python-adagram Updated model.py


Open srijan-mishra opened this issue 7 years ago • 3 comments

The disambiguate task was working on the individual letters of the words in the sentence. I have changed it so that word2id now runs on whole words rather than on the individual letters of each word.
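To illustrate the behaviour described above, here is a minimal sketch of why a raw string ends up being looked up letter by letter. The helper `lookup_ids` and the toy `word2id` dictionary are hypothetical and are not taken from model.py; they only mimic a vocabulary lookup over a context.

```python
def lookup_ids(context, word2id):
    # Hypothetical stand-in for a vocabulary lookup over a context.
    # Iterating over a str yields single characters, not words.
    return [word2id[w] for w in context if w in word2id]


# Toy vocabulary (assumption, for illustration only).
word2id = {"apple": 0, "pie": 1, "a": 2, "p": 3}

ids_from_string = lookup_ids("apple pie", word2id)        # letters are looked up
ids_from_tokens = lookup_ids(["apple", "pie"], word2id)   # whole words are looked up

print(ids_from_string)  # [2, 3, 3, 3]
print(ids_from_tokens)  # [0, 1]
```

With the string input only the letters "a" and "p" match the vocabulary, so the result has nothing to do with the words in the sentence.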

May 09 '17 08:05 srijan-mishra

@srijan-mishra ah, thanks for the PR! Actually, I should have made it clearer that the context should already be tokenized. I would like to keep tokenization outside of "disambiguate", since it can differ between languages, and it's important that tokenization is the same in training and inference.

So instead of doing tokenization via split, I would like to document that context should already be tokenized.
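A rough sketch of what keeping tokenization on the caller's side could look like. The `tokenize` function here is a hypothetical example and not part of python-adagram; the point is only that the same function is applied to the training text and to the context at inference time.

```python
import re


def tokenize(text):
    # Hypothetical tokenizer (assumption): lowercase, then keep runs of
    # word characters. A real project may need a language-specific one.
    return re.findall(r"\w+", text.lower())


# Same tokenizer for the training corpus and for the inference context.
training_tokens = tokenize("Apple pie is tasty.")
context = tokenize("I ate an Apple pie")

print(training_tokens)  # ['apple', 'pie', 'is', 'tasty']
print(context)          # ['i', 'ate', 'an', 'apple', 'pie']
```

The resulting list of tokens, rather than the raw sentence, is what would then be passed as the context.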

May 09 '17 08:05 lopuhin

Awesome!

Great work here :)

May 09 '17 08:05 srijan-mishra

@srijan-mishra great work with figuring it out!

Btw, just thought that it would also be possible to raise an error if a string is passed instead of a list, so if someone misses the docs, they'll get a clear error message.
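Something along these lines could work as the check. This is only a sketch: the function name `check_context` and the error wording are assumptions, not existing code in the library.

```python
def check_context(context):
    # Hypothetical validation helper (assumption, not library code).
    # A str is iterable, so without this check it would silently be
    # treated as a sequence of single letters.
    if isinstance(context, str):
        raise TypeError(
            "context must be a list of tokens, not a string; "
            "tokenize the text before passing it in"
        )
    return list(context)


print(check_context(["apple", "pie"]))  # ['apple', 'pie']

try:
    check_context("apple pie")
except TypeError as exc:
    print("rejected:", exc)
```

Raising `TypeError` seems like the natural choice here, since the problem is the type of the argument rather than its value.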

May 09 '17 08:05 lopuhin
