dom_xml_dataset_generators
Suggestion: Automatically normalise archive names
DoM only supports ASCII/Windows-filesystem-compatible filenames, so archive names should be normalised automatically.
This is gonna be a little bit tricky, especially for Cyrillic and Asian strings. Special characters outside the ASCII range can just be removed or replaced with something else, I guess.
If there's some sort of library or module that handles string romanization, using it would be the best course of action.
Sometimes it's just something like a long dash or a trademark sign; those can be converted to a short dash and removed, respectively. But for Cyrillic and Asian scripts, yeah, a romanisation library would need to be used, and they do exist. Additionally, the user could pass in the No-Intro dat file so that already-romanised titles can be re-used in cases where the base title id already exists in the dat file.
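To make the idea concrete, here is a rough sketch of the ASCII fallback using only the Python standard library. Everything in it is an assumption for illustration: the function name `normalise_name`, the `REPLACEMENTS` table, and the `known_titles` dict (standing in for whatever gets parsed out of the No-Intro dat file, keyed by title id) are hypothetical and not part of this repo. It does not romanise anything; it only swaps a few known symbols, strips accents, and drops whatever is left outside ASCII.

```python
import re
import unicodedata

# Hypothetical replacement table for common symbols; extend as needed.
REPLACEMENTS = {
    "\u2014": "-",  # em dash -> short dash
    "\u2013": "-",  # en dash -> short dash
    "\u2122": "",   # trademark sign -> removed
    "\u00ae": "",   # registered sign -> removed
}

# Characters that Windows does not allow in filenames, plus control characters.
WINDOWS_FORBIDDEN = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def normalise_name(name, title_id=None, known_titles=None):
    """Return an ASCII, Windows-safe version of an archive name.

    known_titles is an assumed dict of {title_id: romanised_name},
    e.g. built from a No-Intro dat file by some other code.
    """
    # Prefer an already-romanised title when the title id is known.
    if known_titles and title_id in known_titles:
        return known_titles[title_id]

    # Swap or drop the symbols listed in the table.
    for src, dst in REPLACEMENTS.items():
        name = name.replace(src, dst)

    # NFKD splits accented letters into a base letter plus a combining mark,
    # so encoding to ASCII with "ignore" keeps the base letter.
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")

    # Remove characters Windows rejects, collapse whitespace,
    # and strip leading/trailing spaces and dots.
    cleaned = WINDOWS_FORBIDDEN.sub("", ascii_only)
    return re.sub(r"\s+", " ", cleaned).strip(" .")
```

Note that a name written entirely in Cyrillic or an Asian script collapses to an empty string with this approach, which is exactly the case where a romanisation library or the dat lookup would be needed instead.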