libchewing

Validate tsi.src and phone.cin

Open kcwu opened this issue 8 years ago • 5 comments

tsi.src and phone.cin have often been broken in the past. It is not only the sorting order; sometimes the syntax is bad as well (missing frequency, extra space, illegal bopomofo, etc.).

We should validate them in CI to keep them in a good state.

Until somebody writes the validation code, checking the sorting order seems like a good start. cc @PeterDaveHello
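A sorting-order check could be as small as the sketch below. The thread does not say which ordering the two files are expected to follow, so the `key` parameter is a placeholder and the function name `first_unsorted_line` is hypothetical; the default simply compares whole lines.

```python
def first_unsorted_line(lines, key=lambda line: line):
    """Return the 1-based number of the first line that sorts before its
    predecessor, or None if the input is already in order.

    `key` is an assumption: replace it with whatever ordering the data
    file is really supposed to follow.
    """
    previous = None
    for number, line in enumerate(lines, start=1):
        current = key(line)
        if previous is not None and current < previous:
            return number
        previous = current
    return None
```

In CI this could be run over each file's lines (with newlines stripped) and the job failed whenever the result is not `None`.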

kcwu, Mar 30 '16 08:03

I think we already have some checks for phone.cin and tsi.src implemented in https://github.com/chewing/libchewing/blob/master/src/tools/init_database.c. I'm not sure whether any check is missing for these two.

czchen, Mar 30 '16 11:03

init_database.c is tolerant of errors; it is robust rather than strict. For example,

  • init_database.c allows a delimiter of more than one space, and trailing spaces.
  • init_database.c allows illegal bopomofo sequences like ˊ.
  • init_database.c allows negative numbers and even non-decimal numbers like 0xab.
  • init_database.c allows blank lines.

I'd like to have a stricter validator.
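As a rough sketch of what "stricter" could mean for one tsi.src line, the Python below rejects each of the cases listed above. It assumes a line layout of `phrase frequency syllable [syllable ...]` separated by exactly one space, and it only checks that each syllable has the shape initial, medial, final, tone (each optional, with at least one non-tone symbol). It does not check that the syllable is a real reading. The name `validate_line` and the error messages are made up for illustration and are not part of libchewing.

```python
import re

# Assumed bopomofo symbol classes; a syllable is checked for shape only.
INITIALS = "ㄅㄆㄇㄈㄉㄊㄋㄌㄍㄎㄏㄐㄑㄒㄓㄔㄕㄖㄗㄘㄙ"
MEDIALS = "ㄧㄨㄩ"
FINALS = "ㄚㄛㄜㄝㄞㄟㄠㄡㄢㄣㄤㄥㄦ"
TONES = "˙ˊˇˋ"

SYLLABLE_RE = re.compile(
    f"(?=[{INITIALS}{MEDIALS}{FINALS}])"  # a tone mark alone is not a syllable
    f"[{INITIALS}]?[{MEDIALS}]?[{FINALS}]?[{TONES}]?"
)
# Plain non-negative decimal only: no sign, no 0x prefix, no leading zeros.
FREQ_RE = re.compile(r"0|[1-9][0-9]*")


def validate_line(line):
    """Return a list of problems found in one line (without its newline)."""
    if not line:
        return ["blank line"]
    fields = line.split(" ")
    if "" in fields:
        return ["leading, trailing, or repeated space"]
    if len(fields) < 3:
        return ["expected phrase, frequency, and at least one syllable"]
    _phrase, freq, *syllables = fields
    errors = []
    if FREQ_RE.fullmatch(freq) is None:
        errors.append(f"bad frequency: {freq!r}")
    for syllable in syllables:
        if SYLLABLE_RE.fullmatch(syllable) is None:
            errors.append(f"illegal bopomofo: {syllable!r}")
    return errors
```

For example, `validate_line("測 1 ˊ")` reports the lone tone mark, and `validate_line("測 0xab ㄘㄜˋ")` reports the frequency. Whether blank lines and extra spaces should really be errors is a policy choice; here they are rejected only to show the strictest variant.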

kcwu, Mar 30 '16 12:03

@kcwu, do you think we can just use a stricter parser in init_database.c, or do we really need a separate validator?

czchen, Mar 30 '16 16:03

These two should definitely be rejected by init_database.c:

  • init_database.c allows illegal bopomofo sequences like ˊ.
  • init_database.c allows negative numbers and even non-decimal numbers like 0xab.

For blank lines and extra spaces (and sorting order), I'm not sure whether we should enforce them or not.

kcwu, Mar 31 '16 16:03

I don't understand the descriptions above of the cases that init_database.c should reject. For the first one, does an illegal bopomofo sequence mean a sequence that contains only tone marks such as ˋ ˊ ˇ? For the second one, where do the negative numbers and the non-decimal numbers appear?

Billy4195, May 20 '17 03:05