python-adagram
python-adagram copied to clipboard
Updated model.py
The disambiguate task was working on a the individual letters of the words of the whole sentence. I have changed it now and it would be running the word2id on the words as a whole rather than the individual letters of the words.
@srijan-mishra ah, thanks for the PR! Actually, I should have made it more clear that context should be already tokenized. I would like to leave tokenization outside of "disambiguate", as it can be different for different languages, and it's important for tokenization to be the same in training and inference.
So instead of doing tokenization via split
, I would like to document that context should already be tokenized.
Awesome!
Great work here :)
@srijan-mishra great work with figuring it out!
Btw, just though that it would be also possible to raise an error if a string is passed instead of a list, so if someone misses the docs, they'll get a clear error message.