castor
Training vdpwi on other datasets
I saw that dataset loading is fixed to four datasets (sick, msrvid, trecqa, wikiqa). I would like to know how to train vdpwi on other datasets and how to organize the dataset properly. I tried copying my dataset into the 'sick' directory and my embeddings into the 'GloVe' directory, but the model trained with a loss of 0. Can you give me the correct instructions?
Hi, please refer to the instructions in #134. That issue is for another model, MP-CNN, but it should work the same way. If you have more questions let us know!
I stored the dataset as a.toks and b.toks because I previously used the Lua version of vdpwi. With this PyTorch version the code runs, but when I train the model the loss I get is 0. That is the problem.
I found the cause of the problem. My sim.txt contains only two values, 0 and 5 (the first 5000 lines are 5 and all following lines are 0), and the training loss is 0. When I changed sim.txt so that it contains the values 0 and 4.5 instead, the loss was no longer 0. How can I train correctly with my sim.txt? Can you give me some instructions?
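For anyone hitting the same symptom, a quick way to confirm what the label file contains is to count the distinct values in it. This is only a minimal sketch: it assumes sim.txt holds one numeric similarity value per line, and `label_counts` is a hypothetical helper, not part of Castor.

```python
from collections import Counter


def label_counts(lines):
    """Count how often each similarity value occurs, skipping blank lines."""
    return Counter(float(line) for line in lines if line.strip())


# Hypothetical sample standing in for the lines of sim.txt.
sample = ["5", "5", "0", "0", "0"]
for value, count in sorted(label_counts(sample).items()):
    print(value, count)
```

To inspect a real file, pass the open file object instead of the sample list, e.g. `label_counts(open("sim.txt"))`. If only two distinct values show up, the labels are effectively binary.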
It seems that your labels are binary. How about processing your data by converting "5" to "1" so that you only have 0s and 1s? Then you can take a look at https://github.com/castorini/Castor/blob/master/datasets/trecqa.py and set NUM_CLASSES to 2.
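The label conversion suggested above could be sketched roughly as follows. This assumes sim.txt has one value per line and that every value is either 0 or 5; `binarize_labels` and `convert_file` are hypothetical helper names and not part of Castor.

```python
def binarize_labels(lines, positive_value=5.0):
    """Map each label to "1" (equal to positive_value) or "0" (equal to 0)."""
    converted = []
    for line in lines:
        text = line.strip()
        if not text:
            continue  # ignore blank lines
        value = float(text)
        if value not in (0.0, positive_value):
            # Fail loudly rather than silently mislabel unexpected values.
            raise ValueError(f"unexpected label: {text!r}")
        converted.append("1" if value == positive_value else "0")
    return converted


def convert_file(src_path, dst_path, positive_value=5.0):
    """Read labels from src_path and write the 0/1 version to dst_path."""
    with open(src_path) as src:
        labels = binarize_labels(src, positive_value)
    with open(dst_path, "w") as dst:
        dst.write("\n".join(labels) + "\n")
```

For example, `convert_file("sim.txt", "sim_binary.txt")` would write the converted labels to a new file and leave the original untouched. Writing to a separate file keeps the original scores around in case they are needed later.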