chars2vec
Begging for the training data
Hi authors, I think you have done very interesting work, but training such a model requires a large set of similar words. How did you construct it? Or could you share your training data with us?
Hi @LuMelon, if you are working with English, there are many good resources you can use as a training dataset. You may check:
- wikidata
- One Billion Word
And many other datasets. If you are targeting a closed-domain application or other languages, I would recommend using
where you can build your own dataset of similar words for more than 100 languages.
If you could elaborate on the exact problem you are solving, maybe someone can help you find the specific dataset you are looking for.
I have the same question. What corpus did you use for training the language model? How did you construct the pairs? Did you manually construct the pairs? Or did you use a context window similar to Word2Vec using Keras' skipgrams function?
Do you have a data generator that builds the pairs from a given corpus?
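In case it helps others with the same question, here is a minimal sketch of one way to generate word pairs from your own vocabulary. This is not the authors' method, which is not described in this thread. It assumes similar pairs can be approximated by a single random character edit and dissimilar pairs by two different vocabulary words. The names `corrupt` and `make_pairs`, and the label convention (0 = similar, 1 = not similar), are hypothetical choices for illustration; check the project README for the convention the training function actually expects.

```python
import random

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def corrupt(word, rng):
    """Return a copy of `word` changed by one random edit (hypothetical helper)."""
    if len(word) < 2:
        # Too short to delete or swap safely, so append a character instead.
        return word + rng.choice(ALPHABET)
    while True:
        op = rng.choice(["delete", "substitute", "swap"])
        i = rng.randrange(len(word) - 1)
        if op == "delete":
            candidate = word[:i] + word[i + 1:]
        elif op == "substitute":
            candidate = word[:i] + rng.choice(ALPHABET) + word[i + 1:]
        else:
            # Swap the characters at positions i and i + 1.
            candidate = word[:i] + word[i + 1] + word[i] + word[i + 2:]
        # Retry when the edit happened to reproduce the original word.
        if candidate != word:
            return candidate


def make_pairs(vocab, n_pairs, seed=0):
    """Build (word, word) pairs with labels: 0 = similar, 1 = not similar (assumed)."""
    rng = random.Random(seed)
    pairs, labels = [], []
    for _ in range(n_pairs):
        word = rng.choice(vocab)
        if rng.random() < 0.5:
            # Similar pair: the word and a one-edit corruption of it.
            pairs.append((word, corrupt(word, rng)))
            labels.append(0)
        else:
            # Dissimilar pair: two different words from the vocabulary.
            other = rng.choice([w for w in vocab if w != word])
            pairs.append((word, other))
            labels.append(1)
    return pairs, labels


if __name__ == "__main__":
    # The vocabulary needs at least two distinct words.
    demo_pairs, demo_labels = make_pairs(["drawing", "number", "reference", "vector"], 6)
    for pair, label in zip(demo_pairs, demo_labels):
        print(pair, label)
```

The vocabulary itself could come from any of the corpora mentioned above, for example the most frequent tokens after lowercasing. A single edit is a crude notion of similarity, so for a real domain you would probably want to mix in genuine spelling variants.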
I am also interested in the training corpus. Why not share it?
Thank you for the model, creators. I apologize if my question is misguided; I am new to this domain. Most models I have seen use a "fit" method during training and a "predict" method when they are called, but this model doesn't. Why is that? If I want to match against a particular word, how do I pass it as an argument for prediction? For example, I have "drawing number" and I need its similarity to "drawing reference". How do I do that? Thank you so much.
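A hedged note on the similarity question: an embedding model of this kind returns a vector per word rather than a prediction, so similarity is computed afterwards by comparing vectors, for example with cosine similarity. As far as I recall from the project README, vectors come from `chars2vec.load_model('eng_50')` followed by `vectorize_words([...])`, but please verify that against the repository. The sketch below does not call chars2vec at all. `toy_embed` is a hypothetical stand-in that counts character bigrams, used only so the example runs on its own.

```python
import math
from collections import Counter


def toy_embed(text):
    """Hypothetical stand-in for a real embedding: character bigram counts."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + 2] for i in range(len(padded) - 1))


def cosine_similarity(u, v):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    # Return 0.0 for an empty vector to avoid dividing by zero.
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


if __name__ == "__main__":
    a = toy_embed("drawing number")
    b = toy_embed("drawing reference")
    print(cosine_similarity(a, b))
```

With real embeddings you would replace `toy_embed` with the model's vectors and compute the same cosine on dense arrays. Since the model embeds single words, a phrase such as "drawing number" would likely need to be split and its word vectors combined, for instance by averaging; that combination step is my assumption, not something the authors state here.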