Published Wednesday 12th July 2023
Almost every time we're migrating a WordPress website from one server to another, the database import initially fails with the following error:
Unknown collation: ’utf8mb4_unicode_520_ci’
The reason for this is that since WordPress 4.2, their database tables have been utf8mb4 encoded by default, because it allows for 4byte characters instead of just 3byte and this is important for internationalisation as it means the full UTF8 unicode charset is available, and thus text in any language can be stored which is obviously important for a large platform like WordPress. The original MySQL UTF8 charset only supported 3byte characters hence the newer 4byte variant being suffixed with mb4. In other words, it's the utf8mb4_unicode part of the utf8mb4_unicode_520_ci collation which the MySQL developers wanted this collation for.
The suffix, _ci, indicates that the collation is case insensitive. Some collations are _cs and thus case sensitive, and this affects how SQL lookup queries work. So the fact that this particular collation is suffixed with _ci is important too.
The remaining part, _520, represents which version of the Unicode Collation Algorithm (UCA), is used. Collations that don't include a number here use UCA version 4, so _520 means that this collation uses UCA version 5.2.0. This is where there's a common problem, because utf8mb4_unicode_520_ci , the 4 byte UTF8 case insensitive collation based on the 5.2.0 version of the UCA, was added in MySQL 5.6 and many Linux servers still run either MySQL 5.5, or MariaDB 5.5 which simply doesn't recognise this collation. CentOS 7.9 for example, doesn't hit EOL until June 30th 2024 so many, many servers still run it, and CentOS 7.9 still ships MariaDB 5.5 as standard.
Fortunately the fix is simple. Since it's the utf8mb4_unicode and _ci parts that are important for WordPress to function properly, it's usually perfectly safe to swap out the utf8mb4_unicode_520_ci collation for the utf8mb4_unicode_ci collation, which is identical in every way except for being UCA version 4, and has been built in to MySQL since version 5.5.3.
In short, even if you don't have utf8mb4_unicode_520_ci collation support, you almost certainly will have utf8mb4_unicode_ci support and this should be a safe switch, so a quick find + replace of the MySQL database dump file before importing to the new server is all you need. On Linux, you can do this with the sed command:
sed -i database-dump.sql -e 's/utf8mb4_unicode_520_ci/utf8mb4_unicode_ci/g'
Or just open the file in your favourite text editor to replace every instance of utf8mb4_unicode_520_ci with utf8mb4_unicode_ci and hit save.
If your version of MySQL is even older than 5.5.3, you won't have any of the utf8mb4 charset collations available and will need to instead replace with utf8_unicode_ci but be aware that this might cause problems if any of your content includes characters that require 4bytes of space. You should at this point, be looking at upgrading your servers ideally.
Blog posts are written by individuals and do not necessarily depict the opinions or beliefs of QWeb Ltd or its current employees. Any information provided here might be biased or subjective, and might become out of date.
Nobody has commented yet.
Your email address is used to notify you of new comments to this thread, and also to pull your Gravatar image. Your name, email address, and message are stored as encrypted text. You won't be added to any mailing list, and your details won't be shared with any third party.