punctuator2
post punctuator steps?
A question about the steps that come after running: cat data.dev.txt | python punctuator.py <model_path> <model_output_path>
We get a text file containing the result, with tokens such as ',COMMA' and '.PERIOD' inside. To generate the final result, we assume the following steps:
- Replace each punctuation token with the real punctuation mark
- Capitalize the word that follows '.PERIOD'
Is this the right understanding?
Yes, that's about correct (?QUESTIONMARK and !EXCLAMATIONMARK should also be taken into account).
I added a conversion script with the last commit. You can use it like this:
python convert_to_readable.py <model_output_path> <readable_output_path> <1/0 - add newlines at end-of-sentence>
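For readers without the script at hand, the two steps discussed above can be sketched roughly as follows. This is an illustration only, not the contents of convert_to_readable.py: the function name to_readable and its newline_after_sentence parameter are hypothetical, and only the four tokens named in this thread are handled, so a model that emits other tokens would need the pattern extended.

```python
import re

# Only the tokens mentioned in this thread; the model may emit others (assumption).
PUNCT_TOKEN = re.compile(r"^([,.?!])(COMMA|PERIOD|QUESTIONMARK|EXCLAMATIONMARK)$")
SENTENCE_END = {".", "?", "!"}


def to_readable(text, newline_after_sentence=False):
    """Replace punctuation tokens with real marks and capitalize sentence starts."""
    pieces = []
    separator = " "          # what to put before the next word
    capitalize_next = True   # the first word of the text starts a sentence

    for token in text.split():
        match = PUNCT_TOKEN.match(token)
        if match:
            mark = match.group(1)
            pieces.append(mark)  # attach the mark directly to the previous word
            if mark in SENTENCE_END:
                capitalize_next = True
                separator = "\n" if newline_after_sentence else " "
            else:
                separator = " "
        else:
            if capitalize_next:
                token = token[0].upper() + token[1:]
                capitalize_next = False
            if pieces:
                pieces.append(separator)
            pieces.append(token)
            separator = " "

    return "".join(pieces)


if __name__ == "__main__":
    sample = "hello ,COMMA how are you ?QUESTIONMARK i am fine .PERIOD"
    print(to_readable(sample))
```

The newline_after_sentence flag plays the same role as the 1/0 argument in the command above: when set, each sentence ends with a line break instead of a space.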
Thanks a lot! Just added one more 'period' at the last line.
@cozec can you help me train a model with a processed file that I have? Thanks.