chars2vec Begging for the training data

Open LuMelon opened this issue 4 years ago • 7 comments

Aug 11 '19 02:08 LuMelon

Hi authors, I think you have done very interesting work, but training such a model requires a large set of similar words. How did you construct it? Or could you share your training data with us?

Aug 11 '19 02:08 LuMelon

Hi @LuMelon, if you are working with English, there are many good resources you can use as a training dataset. You may check:

wikidata
One Billion Word

And many other datasets. If you are targeting a closed-domain application or other languages, I would recommend using

SynNets

There you can build your own dataset of similar words for more than 100 languages.

Sep 04 '19 23:09 SupervisionT

If you could elaborate on exactly what problem you are solving, maybe someone can help you find the specific dataset you are looking for.

Sep 10 '19 12:09 skt7

I have the same question. What corpus did you use for training the language model? How did you construct the pairs? Did you manually construct the pairs? Or did you use a context window similar to Word2Vec using Keras' skipgrams function?

Oct 15 '19 20:10 phongtheha

Do you have a data generator for a given corpus?

Nov 13 '19 13:11 mustfkeskin
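
For anyone looking for a starting point while the original training data is unavailable, here is a minimal sketch of a pair generator. It is not the authors' method: it simply treats two randomly "misspelled" variants of the same word as similar and variants of two different words as not similar. The names `perturb` and `make_pairs` are hypothetical, and the label convention (0 for similar, 1 for not similar) is an assumption based on the example in the project README, so check it before training.

```python
import random
import string


def perturb(word, rng, alphabet=string.ascii_lowercase):
    """Return a copy of `word` with one random character edit (a simulated typo)."""
    op = rng.choice(["insert", "delete", "substitute"])
    if op == "delete" and len(word) > 1:
        i = rng.randrange(len(word))
        return word[:i] + word[i + 1:]
    if op == "insert":
        i = rng.randrange(len(word) + 1)
        return word[:i] + rng.choice(alphabet) + word[i:]
    # Substitute (also the fallback when a one-character word drew "delete").
    i = rng.randrange(len(word))
    replacement = rng.choice([c for c in alphabet if c != word[i]])
    return word[:i] + replacement + word[i + 1:]


def make_pairs(words, n_per_word=2, seed=0):
    """Build (X, y) from a word list.

    X is a list of (word_a, word_b) tuples.
    y holds 0 for a similar pair and 1 for a non-similar pair
    (assumed convention, see the note above).
    """
    rng = random.Random(seed)
    # Deduplicate while keeping order, and drop empty strings.
    words = [w for w in dict.fromkeys(words) if w]
    if len(words) < 2:
        raise ValueError("need at least two distinct words")

    X, y = [], []
    for word in words:
        for _ in range(n_per_word):
            # Similar pair: two independently perturbed variants of one word.
            X.append((perturb(word, rng), perturb(word, rng)))
            y.append(0)
            # Non-similar pair: perturbed variants of two different words.
            other = rng.choice([w for w in words if w != word])
            X.append((perturb(word, rng), perturb(other, rng)))
            y.append(1)
    return X, y


# The vocabulary here is a placeholder; in practice it would come from your corpus.
X_train, y_train = make_pairs(["mechanizing", "discovery", "cirrhosis"])
```

The vocabulary can be taken from any of the corpora mentioned earlier in the thread. A single edit per variant is a design choice for the sketch; applying several edits, or adding domain-specific noise, may suit other use cases better.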

In for the training corpus. Why not share it?

Dec 13 '19 20:12 rjurney

Thank you for the model, creators. I sincerely apologize if my question is misguided, because I'm new to this domain. The models I have seen use a "fit" method during training and a "predict" method when they are called, but this model doesn't. Why is that? If I need to match against a particular word, how do I pass it as an argument for prediction? For example, I have "drawing number" and I need to see its similarity to "drawing reference". How do I do that? Thank you so much.

Mar 19 '21 07:03 jjustinm4
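
A possible way to approach the question above, as a sketch rather than an official answer: going by the project README, embeddings are obtained with `vectorize_words` instead of a `predict` call, and the model produces one vector per word. To compare two phrases, one simple option is to average the word vectors of each phrase and take the cosine similarity. The helpers `cosine_similarity`, `phrase_vector`, and `phrase_similarity` below are hypothetical names, and averaging is an assumption of this sketch, not something the library prescribes.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def phrase_vector(phrase, vectorize_words):
    """Average the per-word vectors of a phrase.

    `vectorize_words` is any callable that takes a list of words and
    returns one vector per word, in the same order.
    """
    words = phrase.lower().split()
    if not words:
        raise ValueError("phrase must contain at least one word")
    vectors = [list(v) for v in vectorize_words(words)]
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def phrase_similarity(a, b, vectorize_words):
    """Cosine similarity between the averaged word vectors of two phrases."""
    return cosine_similarity(
        phrase_vector(a, vectorize_words),
        phrase_vector(b, vectorize_words),
    )


# With chars2vec installed, the pretrained model could supply the vectors
# (calls as shown in the project README):
#   import chars2vec
#   c2v_model = chars2vec.load_model('eng_50')
#   score = phrase_similarity("drawing number", "drawing reference",
#                             c2v_model.vectorize_words)
```

Since chars2vec embeds words by their spelling, a high score here indicates that the phrases are written similarly, which is not the same as being similar in meaning.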
