I am Utagawa

Written by @utgwkk

Full-text search with an N-gram index in MySQL 5.7 cannot find a lone emoji

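There are cases where a row matches with LIKE but is not found by a full-text search with MATCH. This has been bothering me for a long time, so if there is a correct way to write the query, or a setting I should change, please let me know. Help me...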

mysql> set names 'utf8mb4' collate 'utf8mb4_bin';
Query OK, 0 rows affected (0.00 sec)

mysql> select * from test_table;
+----+-----------+--------------------------------------------------+
| id | title     | description                                      |
+----+-----------+--------------------------------------------------+
|  1 | test      | test data.                                       |
|  2 | 日本語    | これは日本語のデータです                         |
|  3 | emojis    | there are various emojis such as 😇, 🙆 etc.         |
+----+-----------+--------------------------------------------------+
3 rows in set (0.00 sec)

mysql> select * from test_table where match (description) against ('😇*' in boolean mode);
Empty set (0.00 sec)

mysql> select * from test_table where description like '%😇%';
+----+--------+--------------------------------------------------+
| id | title  | description                                      |
+----+--------+--------------------------------------------------+
|  3 | emojis | there are various emojis such as 😇, 🙆 etc.         |
+----+--------+--------------------------------------------------+
1 row in set (0.00 sec)

mysql> select * from test_table where match (description) against ('😇,' in natural language mode);
+----+--------+--------------------------------------------------+
| id | title  | description                                      |
+----+--------+--------------------------------------------------+
|  3 | emojis | there are various emojis such as 😇, 🙆 etc.         |
+----+--------+--------------------------------------------------+
1 row in set (0.00 sec)

mysql>
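One way to narrow this down might be to look at which tokens the ngram parser actually stored for the row, using the InnoDB full-text tables in INFORMATION_SCHEMA. This is only a sketch: the schema name `test` is an assumption (the post does not name the database), and it presumes the FULLTEXT index on `description` was created WITH PARSER ngram.

```sql
-- Assumption: `test` is a hypothetical schema name; replace it with the real one.
-- Setting this variable needs a privileged account.
SET GLOBAL innodb_ft_aux_table = 'test/test_table';

-- Tokens for recently inserted rows that are still in the in-memory cache.
SELECT word, doc_id, position
FROM INFORMATION_SCHEMA.INNODB_FT_INDEX_CACHE
ORDER BY doc_id, position;

-- Tokens that have already been written to the on-disk index tables.
SELECT word, doc_id, position
FROM INFORMATION_SCHEMA.INNODB_FT_INDEX_TABLE
ORDER BY doc_id, position;
```

If a two-character token such as `😇,` shows up but the emoji never appears as the start of any other token, that would at least be consistent with the natural language query for '😇,' matching above. It would not by itself explain why the boolean-mode wildcard query returns nothing.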

Here are the contents of my.cnf.

[mysqld]
character-set-server = utf8mb4
innodb_ft_min_token_size = 1
innodb_ft_enable_stopword = off
ngram_token_size = 2

query_cache_type = 1
query_cache_size = 64M
query_cache_limit = 8M

innodb_buffer_pool_size = 256M
innodb_flush_log_at_trx_commit = 0
innodb_flush_method=O_DIRECT
innodb_log_buffer_size = 8M
innodb_log_file_size = 128M

table_open_cache = 32
max_connections = 128

[client]

Here is a mysqldump for reproducing the problem.

gist.github.com