castor other dataset trained on vdpwi

Open xyx-x opened this issue 6 years ago • 4 comments

I see that dataset loading is fixed to four datasets (sick, msrvid, trecqa, wikiqa). I would like to know how to train VDPWI on other datasets and, more generally, how to organize the dataset properly. I tried copying my dataset into the 'sick' folder and my embeddings into the 'GloVe' folder, but the model trained with 0 loss. Can you give me the correct instructions?

Aug 07 '18 09:08 xyx-x

Hi, please refer to the instructions in #134. That issue is for another model, MP-CNN, but it should work the same way. If you have more questions let us know!

Aug 07 '18 23:08 tuzhucheng

I stored the dataset as a.toks and b.toks because I previously used the Lua version of VDPWI. With this PyTorch version the code runs, but when I train the model the loss I get is 0. That is the problem.

Aug 08 '18 02:08 xyx-x

I found the root of the problem. My sim.txt contains only two values, 0 and 5 (the first 5000 lines are 5 and the remaining lines are 0), and the training loss is 0. When I changed sim.txt so that it contains 0 and 4.5 instead, the loss was no longer 0. How can I train correctly with my sim.txt? Can you give me some instructions?

Aug 08 '18 07:08 xyx-x
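A quick way to confirm which label values a similarity file contains is to count them. The snippet below is only a minimal sketch: `summarize_labels` is a hypothetical helper, not part of Castor, and it assumes one numeric label per line, as in the sim.txt described above. The example list is made-up data that mirrors the description in shortened form.

```python
from collections import Counter


def summarize_labels(lines):
    """Count how often each numeric label occurs, skipping blank lines."""
    return Counter(float(line) for line in lines if line.strip())


# Made-up stand-in for the contents of sim.txt (hypothetical data).
example_lines = ["5\n", "5\n", "5\n", "0\n", "0\n"]
label_counts = summarize_labels(example_lines)

for value, count in sorted(label_counts.items()):
    print(f"label {value}: {count} lines")
```

To run it on a real file, pass an open file handle instead of the list, for example `summarize_labels(open("sim.txt"))`, where the path is whatever your dataset uses.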

It seems that your labels are binary. How about preprocessing your data by converting "5" to "1" so that you only have 0s and 1s? Then you can take a look at https://github.com/castorini/Castor/blob/master/datasets/trecqa.py and set NUM_CLASSES to 2.

Aug 10 '18 19:08 tuzhucheng
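The label conversion suggested above could be sketched roughly as follows. This is an illustration under stated assumptions, not code from the Castor repository: `binarize_labels` and `convert_file` are hypothetical helper names, the file is assumed to hold one numeric label per line, and any value other than 0 or 5 is treated as an error rather than guessed at.

```python
from pathlib import Path


def binarize_labels(lines, positive=5.0):
    """Map each label equal to `positive` to "1" and each 0 to "0".

    Blank lines are skipped; any other value raises ValueError so that
    unexpected labels are not silently converted.
    """
    converted = []
    for line_number, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            continue
        value = float(text)
        if value == positive:
            converted.append("1")
        elif value == 0.0:
            converted.append("0")
        else:
            raise ValueError(f"unexpected label {text!r} on line {line_number}")
    return converted


def convert_file(src, dst):
    """Read labels from `src` and write the binarized labels to `dst`."""
    labels = binarize_labels(Path(src).read_text().splitlines())
    Path(dst).write_text("\n".join(labels) + "\n")


# In-memory example with made-up labels (hypothetical data).
binary = binarize_labels(["5", "5", "0", "", "0"])
print(binary)
```

Writing to a separate output file, for example `convert_file("sim.txt", "sim_binary.txt")` with paths of your choosing, keeps the original labels available for comparison.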
