Chinese characters in MySQL: Don’t forget the collation
I recently conquered another oddity in using Chinese characters in MySQL. Apparently, it’s not enough to set the database’s character set to UTF-8. You also need to set the collation to a UTF-8 collation. You might think the collation is only important for sorting, but there’s more to it. The collation also decides whether two strings are equal, and a case-insensitive collation treats characters that differ only in case (and often accent) as the same. If the collation doesn’t understand character boundaries properly, then you run into strange problems. In my case, the database was convinced that two very different Chinese characters were the same, because their UTF-8 encodings, when interpreted as Windows-1252 (code page 1252), produced similar characters, maybe differing only in case or accent.
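To make that failure mode concrete, here is a minimal Python sketch. It is not what MySQL does internally; it only imitates the effect by misreading UTF-8 bytes as Windows-1252 and then comparing case-insensitively. The helper names `as_cp1252` and `equal_case_insensitive_misread` are made up for this illustration, and the two characters (亊, U+4E8A, and 亚, U+4E9A) were picked because their last UTF-8 bytes, 0x8A and 0x9A, happen to be "Š" and "š" in Windows-1252.

```python
def as_cp1252(ch: str) -> str:
    """Return the UTF-8 bytes of ch, misread as Windows-1252 text."""
    return ch.encode("utf-8").decode("cp1252")


def equal_case_insensitive_misread(a: str, b: str) -> bool:
    """Imitate a case-insensitive comparison applied to the misread bytes.

    This is an illustration only, not MySQL's actual collation algorithm.
    """
    return as_cp1252(a).lower() == as_cp1252(b).lower()


first = "\u4e8a"   # U+4E8A, UTF-8 bytes E4 BA 8A
second = "\u4e9a"  # U+4E9A, UTF-8 bytes E4 BA 9A

# Compared as real Unicode characters, the two are different.
print(first == second)

# Misread as Windows-1252, they differ only in the case of the last
# letter (0x8A is an uppercase S with caron, 0x9A the lowercase one),
# so a case-insensitive comparison calls them equal.
print(equal_case_insensitive_misread(first, second))
```

The first comparison prints `False` and the second prints `True`, which is the same kind of false match described above. A binary collation avoids it by comparing the stored values exactly, with no case folding.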
So if you’re having trouble with Unicode characters in MySQL, try running this command:
mysql> alter database chinesedb collate utf8_bin;
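Keep in mind that this changes the database’s default collation, so tables created before the change keep their old collation until they are converted. The next step is figuring out how to put this into an ActiveRecord migration.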
Leo is a professional geek who looks forward to the robots taking over. For more current, less coherent thoughts, follow him on Twitter.