Vertical
Train-test split in UNIV scene
Hi @cocoon2wong ,
Thank you very much for making your excellent work and code available!
While testing your code, I discovered that the train-test split for the UNIV scene is different from that used in other studies. Typically, the set ['students001', 'students003'] is used for testing in the UNIV scene, with the remaining scenes as the training set.
https://github.com/zhangpur/SR-LSTM/blob/0d3a0136e302f0b6f607251a2f40277d1cd70b40/utils.py#L37-L38
self.data_dirs = ['./data/eth/univ', './data/eth/hotel',
                  './data/ucy/zara/zara01', './data/ucy/zara/zara02',
                  './data/ucy/univ/students001','data/ucy/univ/students003',
                  './data/ucy/univ/uni_examples','./data/ucy/zara/zara03']
...
if args.test_set==4 or args.test_set==5:
    self.test_set=[4,5]
In your implementation, however, it seems that only ['students001'] is assigned to the test set, and ['students003'] is used in training.
https://github.com/cocoon2wong/Vertical/blob/178866cf547150dc98d18817713e257aff7429f9/datasets/univ.plist#L5-L22
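To make the difference concrete, here is a minimal sketch of the two splits as I understand them. The short scene names and the `split` helper are my own, for illustration only, and are not taken from either codebase. The second split reflects my reading of `univ.plist` and may be wrong.

```python
# Short scene names standing in for the eight data directories quoted above.
SCENES = ["eth_univ", "eth_hotel", "zara01", "zara02",
          "students001", "students003", "uni_examples", "zara03"]


def split(test_scenes):
    """Hypothetical helper: return (train, test) scene lists."""
    test = [s for s in SCENES if s in test_scenes]
    train = [s for s in SCENES if s not in test_scenes]
    return train, test


# UNIV split in SR-LSTM: test_set indices 4 and 5 are both held out.
common_train, common_test = split({"students001", "students003"})

# UNIV split as I read it from datasets/univ.plist (assumption).
vertical_train, vertical_test = split({"students001"})

# Scenes that are trained on in one split but held out in the other.
difference = sorted(set(vertical_train) - set(common_train))
print(difference)
```

If my reading is right, the only difference is that `students003` moves from the test set to the training set.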
Given the complexity and predictive difficulty of the UNIV scene, removing students003 from the training data, as the split used in other papers does, might lower the reported performance. Could you share the results obtained with the train-test split and dataset used in other papers, for an apples-to-apples comparison? Does this issue also affect SocialCircle, which uses the same dataloader?
I appreciate your attention to this matter and look forward to your insights. Thank you again for your contributions to the field!