Last phrase punctuation
First of all, thank you for putting this project together. It's incredible and incredibly useful.
I've noticed that the last phrase or sentence in a block of text does not get punctuation added. Do you have any advice on how to fix this?
Thank you
Hi @tcollins590, I would personally add a "post-processing" step after applying this model to fix this type of error. If you have a file called punc.txt with the punctuation added, then you could do something like cat punc.txt | sed 's/\([a-zA-Z]\)$/\1./g' > new_punc.txt. This adds a period at the end of any line that ends with a letter (lower or upper case), so it works when each block of text ends at a line break. It's not perfect, obviously, but it would work in most cases.
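For anyone who would rather do this post-processing step in Python than in the shell, here is a minimal sketch of the same substitution as the sed command above. The function name add_final_period is made up for illustration and is not part of the project.

```python
import re


def add_final_period(text):
    """Append a period to every line that ends with an ASCII letter.

    Mirrors the sed expression s/\([a-zA-Z]\)$/\1./ : with re.MULTILINE,
    `$` matches at the end of each line, not only at the end of the string.
    """
    return re.sub(r"([a-zA-Z])$", r"\1.", text, flags=re.MULTILINE)


if __name__ == "__main__":
    sample = "Hello, how are you? I am fine\nThis one is already done."
    print(add_final_period(sample))
```

Lines that already end with punctuation, a digit, or a quote are left untouched, which is the same limitation the sed version has.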
Hi! I have an idea for the fix, but I won't have free time to implement it for a few months.
The idea is to change the part in punctuator.py where the model selects the punctuation with highest probability:
p_i = np.argmax(y_t.flatten())
by adding a mask that sets the probabilities of non-end-of-sentence punctuations (plus the no-punctuation class) to zero if we have reached the end of input text:
p_i = np.argmax(y_t.flatten() * eos_mask)
This would force the model to choose between a period, a question mark, or an exclamation mark.
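To make the masking idea concrete, here is a rough standalone sketch. It is not the actual code from punctuator.py: the class list and its order, and the names build_eos_mask, select_punctuation, and at_end_of_input, are all assumptions for illustration. In the real code the class order would have to come from the model's own punctuation vocabulary.

```python
import numpy as np

# Hypothetical class order; the real order must match the model's output layer.
PUNCTUATIONS = ["_SPACE", ",COMMA", ".PERIOD", "?QUESTIONMARK",
                "!EXCLAMATIONMARK", ":COLON", ";SEMICOLON", "-DASH"]
EOS_PUNCTUATIONS = {".PERIOD", "?QUESTIONMARK", "!EXCLAMATIONMARK"}


def build_eos_mask(punctuations, eos_punctuations):
    """Return a 0/1 vector that keeps only end-of-sentence classes."""
    return np.array([1.0 if p in eos_punctuations else 0.0 for p in punctuations])


def select_punctuation(y_t, eos_mask, at_end_of_input):
    """Pick the most probable class, restricted to EOS classes at the end of input."""
    probs = np.asarray(y_t).flatten()
    if at_end_of_input:
        # Zero out the no-punctuation class and non-final punctuation.
        probs = probs * eos_mask
    return int(np.argmax(probs))


if __name__ == "__main__":
    eos_mask = build_eos_mask(PUNCTUATIONS, EOS_PUNCTUATIONS)
    y_t = np.array([[0.60, 0.20, 0.10, 0.07, 0.03, 0.0, 0.0, 0.0]])
    print(PUNCTUATIONS[select_punctuation(y_t, eos_mask, at_end_of_input=False)])
    print(PUNCTUATIONS[select_punctuation(y_t, eos_mask, at_end_of_input=True)])
```

Multiplying by a 0/1 mask only works as intended when the outputs are non-negative probabilities. If the model produced raw scores instead, the masked entries would need to be set to negative infinity rather than zero.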
Hello @ottokart, it would be great if you could implement this. Thanks!