Picture of stacked suitcases

Migrating Alfresco from PostgreSQL to MySQL

Want to migrate Alfresco metadata from PostgreSQL to MySQL because you feel better with the latter ? This post should help you get there.

The Alfresco installer is the download of choice for people willing to give the system a try. The interactive procedure just asks a few questions and you are ready to go. It includes the choice to install a PostgreSQL database. If what you want is trying it out, you may not care about the database or other details. Most likely what you want is a working system – quickly. This is all fine, but you should at least keep in mind that you may want to get your data back out.

One scenario I have encountered more than once is that people start using their “try out” installation more seriously the more they start trusting it. All of a sudden they realize they face new requirements. Monitoring and control may be among, but backup is usually these most urgent. :)

Backup : The typical easy way

If you are responsible for the system and realized it may be a good idea to set up a backup you make have googled your way to the Alfresco Wiki Backup and Restore page. At one point you have to decide how to backup the data living in the RDBMS. Googling a solution for PostgreSQL is easy. Use pg_dump and your good to go. At the end of the day you may have ended up with a shell script executing the following steps:

  • Stop tomcat
  • pg_dump alfresco database
  • tar up lucene index and the contentstore
  • Start tomat

There is nothing wrong with this approach, given

  • You can shut down the system for some time
  • You are willing to accept “little” loss of data (you backed up on point in time)

The Wiki Page also outlines the general online/hot procedure, but definitely leaves questions and exercises for the reader on the way to a full custom backup/recovery strategy.

DBA 101 available ?

The average admin should be able to restore from such a backup – even if she never heard of PostgreSQL before. I have the feeling there are quite a few admins of this kind. There surely are other situations requiring more advanced DB(A) skills to recover from. I have to say I am suprised how little some people care about their data.

On the other hand – at least around me – people tend to know MySQL much better than PostgreSQL. If I remember correctly, Alfresco picked PostgreSQL due to licensing issues with MySQL raised by Oracle, so I am not blaming them here.

PostgreSQL to MySQL Mini-Howto

The following worked for me moving from PostgreSQL to MySQL – your mileage may vary. ;)

Export MySQL schema structure from an Alfresco installation running the same version as the PostgreSQL based installation your want migrated:

mysqldump -d --skip-add-drop-table --user=alfresco --password=alfresco alfresco > alfresco-create-tables.sql

Dump PostgreSQL alfresco database

pg_dump -U alfresco alfresco -a --inserts --format p -f alfresco-pg.sql
php pg2mysql_cli.php alfresco-pg.sql alfresco-mysql.sql InnoDB

Apply some case-/binary fixes on the export (Update September, 7th).

perl -ne "
     s/insert into (act_[a-z_]+)/\"insert into \".uc(\$1)/ie;
     s/insert into (jbpm_[a-z_]+)/\"insert into \".uc(\$1)/ie;
     s/'\\\{1,2}x([0-9a-f]+)'/unhex('\$1\')/g;
     print" <alfresco-mysql.sql >alfresco-mysql-fix.sql

Execute initialize new MySQL schema and import the converted data.

source alfresco-create-tables.sql;
SET foreign_key_checks=0;
source alfresco-mysql-fix.sql;
SET foreign_key_checks=1;

Finally, make sure to change the database settings in alfresco-global.properties

The ultimate question now is how to set up a super sophisticated backup with MySQL. Just replacing pg_dump with mysqldump is not a real win. I am working on a solution right now.

If you have experience setting up an advanced solution I’d be interested to hear about it.

Update September, 14th: Beware of charactersets.

Below, there is a download including pg2mysql-1.9.tar.bz2 and some DDL:

Download Alfresco PostgreSQL to MySQL

References

Andreas Steffan
Pragmatic ? Scientist and DevOps Mind @ Contentreich. Believes in Open Source, the Open Web and Linux. Freelancing in DevOps-, Cloud-, Kubernetes, JVM- and Contentland and speaks Clojure, Kotlin, Groovy, Go, Python, JavaScript, Java, Alfresco and WordPress. Built infrastructure before it was cool. ❤️ Emacs.

15 thoughts on “Migrating Alfresco from PostgreSQL to MySQL”

  1. Because using the installer your installation can’t be scaled out in any way. The suggested way to install any ECM system is to have separated components for the storage (file system and database) and a dedicated instance of the application that can be embedded (JSE) or deployed as a standalone app (JEE). In this way you can manage your own database installed in your own machine with your DBA. This because you need a DBA to scale out with your DBMS and it could depend on how you are using Alfresco. That’s why when you start Alfresco for the first time you will find SQL scripts in the temp folder of your application server. These scripts can be taken from DBAs for improving tables in a better way for highend tuning ;).

    Then if you need to install a cluster again with the installer you have many issues here and the best way is to have separated components to achieve your specific goals.

    Another issue is that you have classloader limitations with the installer and you could have issues trying to install your custom AMPs.

    Again I suggest to avoid the usage of the installer for production environments. Please use it only for demo. Hope this helps.

  2. Sorry Piergiorgio, I don’t quite agree. :)

    Scaling out is a whole different story. Given resources I am sure it does not really make a difference what kind of installation you start scaling out from. Sure a bad initial choice of the RDBMS might cause more effort than a good one, but I would not call PostgreSQL a bad choice.

    The installer saves you from various pitfalls – OpenOffice-/swftools version, Tomcat-/alfresco-global.properties configuration and ImageMagick to name a few. These are imho things you want to just work.

    Besides, it is very easy to change the RDBMS after the installer is finished. Changing the filesystem backing the contentstore is even easier. Personally I’d pick the installer even if I were initially installing on another appserver. Regarding the SQL left in the temp folder, I seriously doubt it is meant to be checked by a DBA.

    You have to give people something stable and reliable to start seriously quickly. You cannot tell them they have to worry about “gazillons content node scaling concerns” beforehand. :)

    Finally, if the installer is really meant to be used for demo- and development purposes only, I would strongly suggest to issue a strong warning at the very beginning of the installation procedure. Fact is that people – especially small businesses – do use this kind of installation in production and need support. If I were in their situation, I would get quite upset if you tell me the setup cannot be tweaked and data is effectivly lost in the long run. Thanks god this is not true. ;)

  3. Hello, I tried following your procedures, but upon starting Alfresco, I now get errors on some pages indicating An error has occured in the Share component: /share/service/components/dashlets/wiki.
    It responded with a status of 500 – Internal Error. If I browse the repository everything is empty/shows no content. If I try clicking wiki links, the page loads but has no content in it.

    Any suggestions or ideas to try?

  4. Its impossible to tell what went wrong given just your comment, Josh.

    I would try again looking for errors at each step. Most likely, you have had errors during the initialization of the mysql database. Try putting the last four lines into a script and feed this script into mysql. Capture the output and look for errors like so:

    mysql … <four-stmts.sql >out.log 2>&1

    regards
    Andreas

  5. Hi Andreas!

    Thanks for the quick response! So I was able to debug the mysql inserts like you suggested and everything actually inserts ok.

    So next I looked in catalina.out log for any errors for example when trying to load a wiki page. Here is a link to the pastebin I get when loading a page: http://pastebin.com/RVqdeUCE

    From that it seems, my guess would be that something didn’t translate (BLOB) from PostgreSQL into MySQL properly and is now causing an issue when Alfresco tries to read it.

    Any thoughts/suggestions? Really appreciate your help! Trying to avoid having to recreate all the content and reinstalling just to get onto mysql.

  6. Hi Josh,

    you are right, it indeed fails for binary fields – pg2mysql is not perfect it seems. :) This problem has not hit me so far.

    Anyways, the following worked for me on a small sample:


    perl -ne "s/'\\\x([0-9a-f]+)'/unhex('\$1\')/g;print" <alfresco-mysql-fix.sql >alfresco-mysql-binfix.sql

    Try importing this instead of alfresco-mysql-fix.sql.

    Afterwards, you can compare the output of:


    select node_id,list_index,qname_id,md5(serializable_value) from alf_node_properties where serializable_value is not null order by node_id,list_index,qname_id;

    to see whether values are identical. Please let me know if this helps.

  7. Hi Andreas:

    I tried running your perl code to modify the file but it ends up resulting in the exact same output file as what I input.

    I am wondering if I need to revisit the pg2sql script to see what options are there to make the output more correct.

    From your SQL code above, the output of md5(serializable_value) is a blob but if I view it as Text then I can see the md5 hash.

    But my first guess is the perl snippet you had should be doing something rather than nothing (which would be why I get the same result ;-)

    Thanks for all your help!
    Josh

  8. Ok, so one mail later we figured out postgres exports binary as either ‘\x…’ or ‘\\x…’ and we want unhex(‘…’) for mysql. The perl snippet above is tweaked and should work fine now. Sorry for the inconvenience. ;)

  9. Hi Stephan,
    Would it be possible to get there by exporting stores as ACPs and reimporting them in a different instance, after having copied alf_data as well?

    Thanks
    Claudio

  10. Hi Claudio,

    it may be possible, but I would not even dare to try if what you want is to resume exactly (on MySQL) where you suspended (on postgreSQL) regarding data. Reasons include:

    • ACP Export/Import does not support replication of version history (and security e.g. ACLs)
    • Various behaviours setting metadata usually kick in during the import
    • You need a bootstrapped repo for the import. An ACP import forces you to worry about dealing with uuidBinding (i.e CREATE_NEW, REMOVE_EXISTING, REPLACE_EXISTING, UPDATE_EXISTING, and THROW_ON_COLLISION)

    regards
    Andreas

  11. Hey, is there any guide like this to do the inverse procedure ?
    I have a MySQL database ( from an old 3.x.x installation ), but now I want to use the PostgreSQL DB.
    I’ve tried so far to just dump the data, and introduce it, but I got a lot of errors about the data introduction.

  12. Dear Andreas, thnx very much for this script. What you describe in your intro (trying in out-of-the-box and then relying on it) is exactly what happened to me. But with the mini-howto I managed to get alfresco up-n-running with MySQL.

    kind regards, –Edgar

Schreibe einen Kommentar

Deine E-Mail-Adresse wird nicht veröffentlicht. Erforderliche Felder sind mit * markiert