punctuator2 icon indicating copy to clipboard operation
punctuator2 copied to clipboard

post punctuator steps?

Open cozec opened this issue 7 years ago • 3 comments

Question about steps after this: cat data.dev.txt | python punctuator.py <model_path> <model_output_path>

We get a text file have the result with ',COMMA' and '.PERIOD' etc inside. To generate final result, we assume following steps:

  1. replace punc with real punc
  2. Capitalize the previous word after .PERIOD'

Is this the right understanding?

cozec avatar Jun 02 '17 21:06 cozec

Yes, that's about correct (?QUESTIONMARK and !EXCLAMATIONMARK should also be taken into account).

I added a conversion script with the last commit. You can use it like this: python convert_to_readable.py <model_output_path> <readable_output_path> <1/0 - add newlines at end-of-sentence>

ottokart avatar Jun 03 '17 14:06 ottokart

Thanks a lot! Just added one more 'period' at the last line.

cozec avatar Jun 07 '17 01:06 cozec

@cozec can you help me train a model with a processed file that I have? Thanks.

acerock6 avatar Aug 04 '18 08:08 acerock6