
Begging for the training data

Open LuMelon opened this issue 4 years ago • 7 comments

LuMelon avatar Aug 11 '19 02:08 LuMelon

Hi authors, I think you have done very interesting work, but training such a model requires a large set of similar words. How did you construct it? Or could you provide your training data?

LuMelon avatar Aug 11 '19 02:08 LuMelon

Hi @LuMelon, if you are working with English, you can find many good resources to use as a training dataset. You may check:

And many other datasets. If you are targeting a closed-domain application or other languages, I would recommend using

where you can build your own dataset of similar words for more than 100 languages.

SupervisionT avatar Sep 04 '19 23:09 SupervisionT

If you could elaborate on exactly what problem you are solving, maybe someone can help you with the specific dataset you are looking for.

skt7 avatar Sep 10 '19 12:09 skt7

I have the same question. What corpus did you use for training the language model? How did you construct the pairs? Did you manually construct the pairs? Or did you use a context window similar to Word2Vec using Keras' skipgrams function?

phongtheha avatar Oct 15 '19 20:10 phongtheha

Do you have a data generator that builds training pairs from a given corpus?

mustfkeskin avatar Nov 13 '19 13:11 mustfkeskin
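
Since the thread never received the authors' actual data, here is a minimal sketch of one way a pair generator could work. It is an assumption for illustration, not the authors' pipeline: each vocabulary word is corrupted with a random one-character edit, two corruptions of the same word form a "similar" pair, and corruptions of two different words form a "not similar" pair. The function names `make_typo` and `make_pairs` are hypothetical. The label convention (0 for similar, 1 for not similar) follows the training example in the chars2vec README as far as I recall it, so verify it before use.

```python
import random
import string


def make_typo(word, rng, alphabet=string.ascii_lowercase):
    """Return `word` with one random character edit (delete, insert, or substitute)."""
    op = rng.choice(["delete", "insert", "substitute"])
    if op == "delete" and len(word) > 1:
        i = rng.randrange(len(word))
        return word[:i] + word[i + 1:]
    if op == "insert":
        i = rng.randrange(len(word) + 1)
        return word[:i] + rng.choice(alphabet) + word[i:]
    # Substitute (also the fallback when a one-letter word cannot be shortened).
    i = rng.randrange(len(word))
    replacement = rng.choice([c for c in alphabet if c != word[i]])
    return word[:i] + replacement + word[i + 1:]


def make_pairs(vocab, n_per_word=2, seed=0):
    """Build (word_a, word_b) pairs and labels from a list of words.

    ASSUMPTION: this is an illustrative scheme, not the authors' data.
    Label 0 = similar (same source word), label 1 = not similar.
    Requires at least two distinct non-empty words in `vocab`.
    """
    rng = random.Random(seed)
    vocab = [w for w in vocab if w]
    pairs, labels = [], []
    for word in vocab:
        others = [w for w in vocab if w != word]
        for _ in range(n_per_word):
            # Two independent corruptions of the same word.
            pairs.append((make_typo(word, rng), make_typo(word, rng)))
            labels.append(0)
            # Corruptions of two different words.
            other = rng.choice(others)
            pairs.append((make_typo(word, rng), make_typo(other, rng)))
            labels.append(1)
    return pairs, labels


X_train, y_train = make_pairs(["mechanizing", "discovery", "protoplasmatic"])
```

Lists shaped like `X_train` and `y_train` are what the README's training example appears to expect, but the vocabulary, the edit operations, and the ratio of similar to dissimilar pairs are all choices you would need to tune for your own domain.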

In for the training corpus. Why not share it?

rjurney avatar Dec 13 '19 20:12 rjurney

Thank you for the model, creators. I apologize if my question is misguided; I'm new to this domain. Most models I have seen use a "fit" method for training and a "predict" method for inference, but this model doesn't. Why is that? Also, how do I pass a particular word as an argument to compare it against another? For example, I have "drawing number" and I need to measure its similarity to "drawing reference". How do I do that? Thank you so much.

jjustinm4 avatar Mar 19 '21 07:03 jjustinm4
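
On the last question: as far as I can tell from the README, chars2vec exposes `load_model` and `vectorize_words` instead of the usual `fit`/`predict` pair, so comparing two strings means embedding both and measuring the distance between the vectors yourself. The sketch below shows that with cosine similarity. The helpers `cosine_similarity`, `phrase_similarity`, and `load_chars2vec_embedder` are hypothetical names introduced here, and the choice of cosine similarity and of the `eng_50` model is an assumption, not something confirmed in this thread.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length numeric vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def phrase_similarity(embed, first, second):
    """Embed two strings with `embed` (list of str -> list of vectors) and compare them."""
    vec_first, vec_second = embed([first, second])
    return cosine_similarity(vec_first, vec_second)


def load_chars2vec_embedder(model_name="eng_50"):
    """Return the embedding function of a pretrained chars2vec model.

    ASSUMPTION: requires `pip install chars2vec`; `load_model` and
    `vectorize_words` are taken from the project's README.
    """
    import chars2vec  # imported lazily so the helpers above work without it

    model = chars2vec.load_model(model_name)
    return model.vectorize_words
```

With the package installed, `phrase_similarity(load_chars2vec_embedder(), "drawing number", "drawing reference")` would return a score where values closer to 1 mean the strings are closer in the embedding space. How the model treats the space inside a phrase depends on its character set, so embedding each word separately and averaging the vectors is an alternative worth trying.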