Last phrase punctuation

Open • ghost opened this issue 8 years ago • 3 comments

First of all, thank you for putting this project together; it's incredible and incredibly useful.

I've noticed that the last phrase or sentence in a block of text does not get any punctuation added. Do you have any advice on how to fix this?

Thank you

ghost, Nov 03 '17 17:11

Hi @tcollins590, I would personally add a "post-processing" step after applying the model to fix this type of error. If you have a file called punc.txt with the punctuation added, you could do something like cat punc.txt | sed 's/\([a-zA-Z]\)$/\1./g' > new_punc.txt. This adds a period at the end of any line that ends with a letter (lowercase or uppercase). It's obviously not perfect, but it would work in most cases.
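For anyone doing this post-processing from Python instead of the shell, here is a minimal sketch of the same sed substitution, assuming the punctuated output is already held in a string. add_final_period is a hypothetical helper name, not part of punctuator2. It uses re.sub with re.MULTILINE so that $ matches at the end of every line, mirroring sed's line-by-line processing.

```python
import re


def add_final_period(text: str) -> str:
    # Hypothetical helper: Python equivalent of the sed command
    #   s/\([a-zA-Z]\)$/\1./
    # re.MULTILINE makes $ match at the end of each line, so every line
    # that ends in an ASCII letter gets a period appended.
    return re.sub(r"([a-zA-Z])$", r"\1.", text, flags=re.MULTILINE)


if __name__ == "__main__":
    print(add_final_period("see you tomorrow\nok."))
```

For example, add_final_period("see you tomorrow\nok.") returns "see you tomorrow.\nok.". As with the sed version, lines ending in a digit, a quote or a closing bracket are left unchanged.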

migueljette, Nov 07 '17 20:11

Hi! I have an idea for a fix, but I'll only have the free time to implement it in a few months. The idea is to change the part of punctuator.py where the model selects the punctuation with the highest probability:

p_i = np.argmax(y_t.flatten())

by multiplying by a mask that sets the probabilities of the non-end-of-sentence punctuation classes (plus the no-punctuation class) to zero once the end of the input text has been reached:

p_i = np.argmax(y_t.flatten() * eos_mask)

This would force the model to choose between a period, a question mark, or an exclamation mark.
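A minimal sketch of the masking idea, assuming a softmax output y_t over an 8-class punctuation vocabulary. PUNCTUATION_VOCABULARY, EOS_PUNCTUATION, choose_punctuation and is_last_step are hypothetical names for illustration; the real class order and the surrounding decoding loop in punctuator.py are not reproduced here. Because softmax probabilities are strictly positive, multiplying by a 0/1 mask and taking the argmax is the same as taking the argmax over the period, question mark and exclamation classes only.

```python
import numpy as np

# Hypothetical punctuation vocabulary; the actual labels and order in
# punctuator2 may differ. "_SPACE" stands for "no punctuation".
PUNCTUATION_VOCABULARY = ["_SPACE", ",COMMA", ".PERIOD", "?QUESTIONMARK",
                          "!EXCLAMATIONMARK", ":COLON", ";SEMICOLON", "-DASH"]
EOS_PUNCTUATION = {".PERIOD", "?QUESTIONMARK", "!EXCLAMATIONMARK"}

# 1.0 for sentence-ending classes, 0.0 for everything else
# (including the no-punctuation class).
eos_mask = np.array([1.0 if p in EOS_PUNCTUATION else 0.0
                     for p in PUNCTUATION_VOCABULARY])


def choose_punctuation(y_t: np.ndarray, is_last_step: bool) -> int:
    """Return the index of the predicted punctuation class for one step."""
    probs = y_t.flatten()
    if is_last_step:
        # Zero out non-end-of-sentence classes so argmax must pick . ? or !
        probs = probs * eos_mask
    return int(np.argmax(probs))
```

For example, with y_t = np.array([[0.70, 0.20, 0.04, 0.02, 0.01, 0.01, 0.01, 0.01]]), choose_punctuation(y_t, False) returns 0 (no punctuation), while choose_punctuation(y_t, True) returns 2 (.PERIOD in this assumed vocabulary).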

ottokart, Feb 25 '18 17:02

Hello @ottokart, it would be great if you managed to implement this. Thanks!

rhamnett, Apr 16 '19 15:04