libchewing
Validate tsi.src and phone.cin
tsi.src and phone.cin were often broken in the past. Besides sorting-order problems, sometimes the syntax is bad (missing frequency, extra spaces, illegal bopomofo, etc.).
We should validate them in CI to keep them in a good state.
Before somebody writes the validation code, checking the sorting order seems like a good start. cc @PeterDaveHello
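As a possible starting point for the sorting-order check, here is a minimal C sketch. It assumes the files should be ordered by plain byte comparison (what `LC_ALL=C sort` produces); the actual ordering rule for tsi.src and phone.cin may differ, and `first_unsorted_line` is a hypothetical helper, not part of libchewing.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return the index of the first line that sorts
 * before its predecessor, or -1 if the lines are in non-decreasing order.
 * Assumption: "sorted" means byte order (strcmp), as with `LC_ALL=C sort`.
 */
long first_unsorted_line(const char *const lines[], size_t n)
{
    for (size_t i = 1; i < n; i++) {
        if (strcmp(lines[i - 1], lines[i]) > 0)
            return (long)i;   /* lines[i] is out of order */
    }
    return -1;                /* everything is in order */
}
```

A CI job could read the file into an array of lines, call this, and fail with the reported line number whenever the result is not -1.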
I think we already have some checks implemented in https://github.com/chewing/libchewing/blob/master/src/tools/init_database.c for phone.cin and tsi.src. Not sure whether any check is missing for these two.
init_database.c is tolerant of errors and more forgiving. For example:
- init_database.c allows delimiters of more than one space, and trailing spaces.
- init_database.c allows illegal bopomofo sequences like ˊ.
- init_database.c allows negative numbers, or even non-decimal ones like 0xab.
- init_database.c allows blank lines.
I'd like to have a stricter validator.
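A stricter validator could start with the whitespace rules from the list above. The sketch below is hypothetical (`has_strict_spacing` is not an existing libchewing function) and assumes the caller has already removed the trailing newline; rejecting tabs and carriage returns is an extra assumption beyond what was listed.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical strict whitespace check for one line of tsi.src.
 * Assumes the trailing '\n' has already been stripped by the caller.
 */
bool has_strict_spacing(const char *line)
{
    size_t len = strlen(line);

    if (len == 0)
        return false;                      /* blank line */
    if (line[0] == ' ' || line[len - 1] == ' ')
        return false;                      /* leading or trailing space */
    if (strpbrk(line, "\t\r") != NULL)
        return false;                      /* tab or CR (extra assumption) */
    if (strstr(line, "  ") != NULL)
        return false;                      /* delimiter longer than one space */
    return true;
}
```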
@kcwu, do you think we can just use a stricter parser in init_database.c, or do we really need a separate validator?
These two should definitely be rejected by init_database.c:
- init_database.c allows illegal bopomofo sequences like ˊ.
- init_database.c allows negative numbers, or even non-decimal ones like 0xab.
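The two cases above could be rejected with checks like the following. This is a sketch under several assumptions, not libchewing's actual rules: frequencies in tsi.src are plain non-negative decimal integers; a bopomofo syllable holds at most one initial (ㄅ to ㄙ), one medial (ㄧㄨㄩ) and one final (ㄚ to ㄦ), in that order; and a tone mark (ˊ ˇ ˋ ˙) may appear at most once, at the end. It does not check which combinations actually occur in Mandarin, and all function names are hypothetical.

```c
#include <stdbool.h>

/* Decode one 2- or 3-byte UTF-8 sequence (enough for Bopomofo and the
 * tone marks) and advance *p; return 0 without advancing otherwise. */
static unsigned decode_utf8(const unsigned char **p)
{
    const unsigned char *s = *p;

    if ((s[0] & 0xE0) == 0xC0 && (s[1] & 0xC0) == 0x80) {
        *p += 2;
        return ((s[0] & 0x1Fu) << 6) | (s[1] & 0x3Fu);
    }
    if ((s[0] & 0xF0) == 0xE0 && (s[1] & 0xC0) == 0x80 &&
        (s[2] & 0xC0) == 0x80) {
        *p += 3;
        return ((s[0] & 0x0Fu) << 12) | ((s[1] & 0x3Fu) << 6) | (s[2] & 0x3Fu);
    }
    return 0;
}

/* Tone marks: ˊ U+02CA, ˇ U+02C7, ˋ U+02CB, ˙ U+02D9. */
static bool is_tone(unsigned cp)
{
    return cp == 0x02CA || cp == 0x02C7 || cp == 0x02CB || cp == 0x02D9;
}

/* 0 = initial (ㄅ..ㄙ), 1 = medial (ㄧㄨㄩ), 2 = final (ㄚ..ㄦ), -1 = other. */
static int slot(unsigned cp)
{
    if (cp >= 0x3105 && cp <= 0x3119) return 0;
    if (cp >= 0x3127 && cp <= 0x3129) return 1;
    if (cp >= 0x311A && cp <= 0x3126) return 2;
    return -1;
}

/* Reject syllables like "ˊ" (tone without a letter) or "ㄜㄘ" (wrong order). */
bool is_valid_syllable(const char *syl)
{
    const unsigned char *p = (const unsigned char *)syl;
    int last = -1;                          /* slot of the previous letter */

    while (*p != '\0') {
        unsigned cp = decode_utf8(&p);
        int s = slot(cp);
        if (s > last) {
            last = s;                       /* letters must go initial < medial < final */
        } else if (is_tone(cp)) {
            return last >= 0 && *p == '\0'; /* tone needs a letter and must be last */
        } else {
            return false;                   /* repeated slot, bad order, or not Bopomofo */
        }
    }
    return last >= 0;                       /* no tone mark: first tone */
}

/* Accept only non-empty runs of decimal digits, so "-5" and "0xab" fail.
 * Very long digit strings still pass; a real check might bound the value. */
bool is_valid_frequency(const char *s)
{
    if (*s == '\0')
        return false;
    for (; *s != '\0'; s++) {
        if (*s < '0' || *s > '9')
            return false;
    }
    return true;
}
```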
For blank lines and extra spaces (and sorting order), I'm not sure whether we should enforce them.
I don't understand the descriptions above of the cases that init_database.c should reject.
For the first case, does "illegal bopomofo sequence" mean a sequence that contains only ˋ, ˊ, or ˇ?
For the second case, where are the negative numbers and the non-decimal numbers?