vertical-medical
[RFC][11.0] Translation of Imported Data (ICD-10, etc.)
The current plan in #134 is to outright remove the giant data files that we have in `oe_medical_emr_data` in favor of import systems for dynamic update of data. These XML files are more than 10 MiB each and are updated yearly, so this is essentially the only way to handle these processes in a manner that stays efficient going forward.
This brings up an interesting problem that I will not be solving in that PR: translations.
Most of the code data files are available in other languages, so obtaining translations for the data itself won't be an issue. The issue arises when two languages are needed for the same dataset. For example:
- English ICD-10-CM is imported
- French ICD-10-CM is needed
In order to import the French ICD-10 data, the English data would either need to be replaced or duplicated if using standard record creates/writes. This is because records are unaware of languages.
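To make the problem concrete, here is a minimal sketch in plain Python. It does not use the real ORM; `records` and `naive_import` are hypothetical stand-ins for a table with a single text column and an import that does a plain create/write keyed on the code.

```python
# Hypothetical stand-in for a table with one text column per record.
records = {}

def naive_import(rows):
    """Create or overwrite a record per code, with no notion of language."""
    for code, description in rows:
        records[code] = {"code": code, "name": description}

naive_import([("A00", "Cholera")])   # English import
naive_import([("A00", "Choléra")])   # French import replaces the English text
```

After the second call only the French text remains. The alternative is to key on something other than the code, which duplicates every record per language.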
My only idea is to convert all translatable text in imports into a unique identifier string for the field (uuid-4 or something). The import would then create/update a record of the non-translatable data plus all of the identifier strings. It would then add the actual text into `ir.translation` so that the identifiers are translated by the system.
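A rough sketch of that idea, again in plain Python rather than against the real models. Everything here is an assumption for illustration: `records` stands in for the code table, `translations` stands in for `ir.translation`, and `import_codes` / `display_name` are hypothetical helpers.

```python
import uuid

records = {}       # code -> {"code": ..., "name_key": ...}
translations = {}  # (name_key, lang) -> translated text

def import_codes(rows, lang):
    """Import (code, description) rows for one language."""
    for code, description in rows:
        record = records.get(code)
        if record is None:
            # First time this code is seen: store only an opaque identifier
            # in place of the translatable text.
            record = {"code": code, "name_key": str(uuid.uuid4())}
            records[code] = record
        # The real text lives in the translation store, keyed by language.
        translations[(record["name_key"], lang)] = description

def display_name(code, lang):
    """Resolve the identifier to text, falling back to the raw identifier."""
    key = records[code]["name_key"]
    return translations.get((key, lang), key)

import_codes([("A00", "Cholera")], "en")
import_codes([("A00", "Choléra")], "fr")
```

With this layout a second language adds rows to the translation store without touching or duplicating the record itself. A language that was never imported falls back to the raw identifier, which is unreadable, so a real implementation would likely want a default-language fallback instead.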
The disadvantage here is that translations essentially become impossible to maintain manually, although I'm not sure this was ever an option given the size of the data. There will also likely be data duplication, because the identifier system does not take word lemmas into account.
I think this disadvantage is negligible though, and outweighed by the advantage of not storing and maintaining these giant XML files in source control.
Does anyone have ideas or strategies for the translations that I'm not thinking of?