truecase
Possible Feature: Use lambda function for out_of_vocabulary_token_option
Let me know what you think of allowing users to specify their own lambda function if they aren't satisfied with the built-in out-of-vocabulary options.
I can work on this in my fork and create a PR.
Yes, we can do that.
I would prefer extracting the logic to a member function out_of_vocabulary_handler and adding instructions to the readme on how users can override it with their own custom implementation.
What do you think?
Yes, that's good. I'll work up an implementation.
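A minimal sketch of the override approach discussed above, not the library's actual code. `SketchTrueCaser`, its `vocabulary` argument, and the whitespace tokenization are hypothetical stand-ins; only the method name out_of_vocabulary_handler comes from this thread.

```python
class SketchTrueCaser:
    """Hypothetical stand-in for the caser class; not the real implementation."""

    def __init__(self, vocabulary):
        # Assumed shape: lowercase token -> preferred casing.
        self.vocabulary = dict(vocabulary)

    def out_of_vocabulary_handler(self, token):
        # Default in this sketch: title-case tokens the vocabulary does not know.
        return token.title()

    def get_true_case(self, sentence):
        cased = []
        for token in sentence.split():
            known = self.vocabulary.get(token.lower())
            if known is not None:
                cased.append(known)
            else:
                # All out-of-vocabulary logic lives in one overridable method.
                cased.append(self.out_of_vocabulary_handler(token))
        return " ".join(cased)


class LowercasingTrueCaser(SketchTrueCaser):
    """Example of a user-supplied override: lowercase unknown tokens."""

    def out_of_vocabulary_handler(self, token):
        return token.lower()
```

With this shape, a user who wants different behavior subclasses the caser and overrides one method, rather than passing a lambda through every call.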
In addition to out_of_vocabulary, an out_of_dictionary option could also be useful in a later update. This could be an early-stage way to differentiate between names and words that are simply not in the vocabulary.
For example: "hip-hop" is not in vocabulary, but is certainly a word. I would want it in lowercase.
However, my name (Keshav) is not in the vocabulary and won't be found in a dictionary. I'd want to capitalize "Keshav."
This certainly won't work for all names, as some names are also words in the dictionary, e.g. "Trump".
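A rough sketch of how the dictionary check described above might look. The helper `case_unknown_token` and the `dictionary_words` set are hypothetical; a real version would need an actual word list.

```python
def case_unknown_token(token, dictionary_words):
    """Hypothetical helper for tokens that are not in the vocabulary.

    dictionary_words is assumed to be a set of lowercase dictionary entries.
    """
    if token.lower() in dictionary_words:
        # An ordinary word the vocabulary happens to lack: keep it lowercase.
        return token.lower()
    # Not in the dictionary either: treat it as a likely name.
    return token.title()


dictionary_words = {"hip-hop", "trump"}
print(case_unknown_token("hip-hop", dictionary_words))
print(case_unknown_token("keshav", dictionary_words))
# Limitation noted above: a name that is also a dictionary word stays lowercase.
print(case_unknown_token("trump", dictionary_words))
```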
Another thing to consider:
If the first word is classified as "out_of_vocabulary", should we capitalize it, or just go along with the user's out_of_vocabulary_token_option?
Currently, it is the latter; however, I think we should capitalize it.
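A sketch of the proposed first-word behavior, under assumptions: the function `case_oov_token` is hypothetical, and the option values "title", "lower", and "as-is" are assumed here for illustration.

```python
def case_oov_token(token, option, is_first_word):
    """Hypothetical casing rule for an out-of-vocabulary token."""
    if is_first_word:
        # Proposed change: always capitalize a sentence-initial token,
        # regardless of the user's option.
        return token[:1].upper() + token[1:]
    if option == "title":
        return token.title()
    if option == "lower":
        return token.lower()
    # Assumed "as-is" option: leave the token untouched.
    return token
```

Under this rule, `case_oov_token("keshav", "lower", is_first_word=True)` gives "Keshav", while the same token later in the sentence would stay lowercase.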