Duplicate term-id records in Dictionary

 Problem

For unknown reason Dictionary can contain multiple Terms sharing same ‘term-id’ value. This can cause unpredicted errors during import synchronization, or during editing of Concept.

 Solution

To solve this case, we need to update values stored in DB, to achieve this do next steps:

  1. First double check if Dictionary has duplicates by executing next SQL, don’t to forget to provide actual Dictionary id instead of <dictionary_oid>:

    SELECT a.* FROM tw_term a JOIN (SELECT term_id, dictionary_oid, COUNT(*) FROM tw_term GROUP BY term_id, dictionary_oid HAVING count(term_id) > 1 and dictionary_oid = '<dictionary_oid>') b ON a.term_id = b.term_id AND a.dictionary_oid = b.dictionary_oid ORDER BY a.term_id
  2. If you see duplicate records and want to continue then go to step 3

  3. Stop Tomcat and backup database

  4. To update duplicate records, we need to launch first script which will generate update scripts, don’t to forget to provide actual Dictionary id instead of <dictionary_oid>:

    SELECT (CONCAT('UPDATE tw_term AS a SET a.term_id = CONCAT(\'', CONCAT(REGEXP_REPLACE(a.term_id, '(\\d+-)\\d+', '$1')), '\', (SELECT MAX(CONVERT(REGEXP_REPLACE(d.term_id, ''\\\\d+-(\\\\d+)'', ''$1'') , UNSIGNED INTEGER)) + 1 FROM (SELECT b.* FROM (SELECT c.term_id FROM tw_term AS c WHERE c.concept_oid = \'', a.concept_oid, '\') AS b) AS d)) WHERE a.oid = \'', a.oid, '\';')) AS script FROM tw_term a JOIN (SELECT term_id, dictionary_oid FROM tw_term GROUP BY term_id, dictionary_oid HAVING count(term_id) > 1 and dictionary_oid = '<dictionary_oid>') b ON a.term_id = b.term_id AND a.dictionary_oid = b.dictionary_oid ORDER BY a.term_id;

    In case when there is a lot of records you may save results to file via mysql cli:

    mysql -h localhost -u user -p --skip-column-names -e "SELECT (CONCAT('UPDATE tw_term AS a SET a.term_id = CONCAT(\'', CONCAT(REGEXP_REPLACE(a.term_id, '(\\\\d+-)\\\\d+', '\$1')), '\', (SELECT MAX(CONVERT(REGEXP_REPLACE(d.term_id, ''\\\\d+-(\\\\d+)'', ''\$1'') , UNSIGNED INTEGER)) + 1 FROM (SELECT b.* FROM (SELECT c.term_id FROM tw_term AS c WHERE c.concept_oid = \'', a.concept_oid, '\') AS b) AS d)) WHERE a.oid = \'', a.oid, '\';')) AS script FROM tw_term a JOIN (SELECT term_id, dictionary_oid FROM tw_term GROUP BY term_id, dictionary_oid HAVING count(term_id) > 1 and dictionary_oid = '<dictionary_oid>') b ON a.term_id = b.term_id AND a.dictionary_oid = b.dictionary_oid ORDER BY a.term_id;" <database_name> > scripts.sql

    Don’t forget to change <dictionary_oid> and <database_name>. Please note that for bash script has been added extra escape symbols \.

  5. Copy generated scripts and launch it in your favorite SQL editor program.
    In case of script file you may run it, like this: mysql -h localhost -u user -p <database_name> < scripts.sql

  6. Start Tomcat and verify results.

 

If Dictionary contains a lot of Terms it can take a significant amount time to scan all records for duplicates.

 Related articles