LightXML
Parameters for Amazon-670K
Thank you so much for your amazing project. I'm currently trying to reproduce the results on the Amazon-670K dataset. However, I cannot find the value for the top-K candidates parameter (group_y_candidate_topk in your repo). Could you please kindly give me more information on that one?
Thank you so much in advance.
Thank you for your interest in our work.
You can find the parameter group_y_candidate_topk for Amazon-670k in https://github.com/kongds/LightXML/blob/b9af9443004d3bce8b9116edfe038b702d1b295c/run.sh#L35
Hi, thank you so much for your response! I've tried --group_y_candidate_topk from your config (along with all the other parameters); however, the results seem quite different from the reported ones, in terms of both precision and running time. I wonder which parameters could be the cause of this discrepancy?
Thank you so much in advance.
The results reported in the paper are from an ensemble of three models. Could you provide the results you reproduced?
Moreover, the parameters should be correct, as other papers have successfully reproduced the results we reported :)
This is my configuration: python src/main.py --lr 1e-4 --epoch 15 --dataset amazon670k --swa --swa_warmup 4 --swa_step 3000 --batch 16 --max_len 128 --eval_step 3000 --group_y_candidate_num 2000 --group_y_candidate_topk 75 --valid --hidden_dim 400 --group_y_group 0
The result: Precision p1=0.441, p3=0.398, p5=0.364 (19 epochs, batch size 16)
Neither the training time nor the precision matches the reported results. I wonder if I've missed something? Thank you so much in advance.
It seems you reported the result on the validation set. To evaluate on the test set, please run the evaluation:
python src/main.py --lr 1e-4 --epoch 15 --dataset amazon670k --swa --swa_warmup 4 --swa_step 3000 --batch 16 --max_len 128 --eval_step 3000 --group_y_candidate_num 2000 --group_y_candidate_topk 75 --valid --hidden_dim 400 --group_y_group 0 --eval_model
Oh I see, thank you so much for your response! Running the evaluation mode returns the precision on the validation set along with some files related to the test-set results, i.e. amazon670k_t0-labels.npy and amazon670k_t0-scores.npy. How can I get the results from these files?
You can run the following command to get the results from a single model:
python src/ensemble_direct.py --model1 amazon670k_t0 --dataset amazon670k
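For reference, here is a minimal sketch of how precision@k can be computed from ranked predictions like those saved in the .npy files. This is not LightXML's actual evaluation code, and the real array layout may differ; it assumes each row holds predicted label ids sorted by descending score, with ground truth available as sets of true label ids.

```python
import numpy as np

def precision_at_k(pred, truth, ks=(1, 3, 5)):
    """pred: (N, >=max(ks)) int array of label ids ranked by score.
    truth: list of N sets of true label ids.
    Returns {k: precision@k} averaged over all samples."""
    results = {}
    for k in ks:
        # count how many of the top-k predictions are true labels
        hits = sum(len(set(p[:k]) & t) for p, t in zip(pred, truth))
        results[k] = hits / (k * len(truth))
    return results

# Toy example: 2 samples, 5 ranked predictions each
pred = np.array([[3, 1, 4, 1, 5], [9, 2, 6, 5, 3]])
truth = [{3, 4}, {2, 5}]
print(precision_at_k(pred, truth))  # → {1: 0.5, 3: 0.5, 5: 0.4}
```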
The results in the paper are from a three-model ensemble; you can also reproduce it with our script via bash run.sh amazon670k.
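To illustrate the ensemble idea, the following is a hedged sketch (not LightXML's actual ensemble code) of combining per-model score matrices by summing them over a shared candidate set and re-ranking; it assumes all models score the same candidate label ids per row, which may not hold in the real pipeline.

```python
import numpy as np

def ensemble_topk(score_list, label_list, k=5):
    """score_list/label_list: per-model (N, C) arrays of candidate
    scores and candidate label ids. Assumes every model scores the
    same candidate ids in the same positions for each row."""
    total = np.sum(score_list, axis=0)          # sum scores across models
    order = np.argsort(-total, axis=1)[:, :k]   # re-rank, keep top-k
    # gather the label ids at the re-ranked positions
    return np.take_along_axis(label_list[0], order, axis=1)

# Toy example: 1 sample, 3 candidates, 2 models
scores_a = np.array([[0.2, 0.9, 0.1]])
scores_b = np.array([[0.7, 0.1, 0.3]])
labels = np.array([[10, 20, 30]])
print(ensemble_topk([scores_a, scores_b], [labels, labels], k=2))  # → [[20 10]]
```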
Thanks a lot! And just a small question about training time: running 15 epochs for a single model takes only 15 hours, which is far less than the 28.75 hours reported in the paper. Could you help me with this one?
It may be caused by different hardware; the training times reported in the paper were measured on an older GPU (16GB V100).