Train-test split in UNIV scene

Open InhwanBae opened this issue 7 months ago • 3 comments

Hi @cocoon2wong,

Thank you very much for making your excellent work and code available!

While testing your code, I discovered that the train-test split for the UNIV scene is different from that used in other studies. Typically, the set ['students001', 'students003'] is used for testing in the UNIV scene, with the remaining scenes as the training set.

https://github.com/zhangpur/SR-LSTM/blob/0d3a0136e302f0b6f607251a2f40277d1cd70b40/utils.py#L37-L38

self.data_dirs = ['./data/eth/univ', './data/eth/hotel',
                  './data/ucy/zara/zara01', './data/ucy/zara/zara02',
                  './data/ucy/univ/students001','data/ucy/univ/students003',
                  './data/ucy/univ/uni_examples','./data/ucy/zara/zara03']
...
if args.test_set==4 or args.test_set==5:
    self.test_set=[4,5]

In your implementation, however, it seems that only ['students001'] is assigned to the test set, and ['students003'] is used in training.

https://github.com/cocoon2wong/Vertical/blob/178866cf547150dc98d18817713e257aff7429f9/datasets/univ.plist#L5-L22
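To make the difference concrete, here is a minimal sketch of the two splits. The path list is copied from the SR-LSTM snippet above; the helper `split_clips` is a hypothetical name used only for illustration, and the second split reflects my reading of `univ.plist` rather than code taken from this repository.

```python
# Clip paths in the same order as SR-LSTM's data_dirs (copied as written there).
DATA_DIRS = [
    './data/eth/univ', './data/eth/hotel',
    './data/ucy/zara/zara01', './data/ucy/zara/zara02',
    './data/ucy/univ/students001', 'data/ucy/univ/students003',
    './data/ucy/univ/uni_examples', './data/ucy/zara/zara03',
]


def split_clips(test_indices):
    """Hypothetical helper: partition DATA_DIRS into (train, test) by index."""
    test_indices = set(test_indices)
    test = [d for i, d in enumerate(DATA_DIRS) if i in test_indices]
    train = [d for i, d in enumerate(DATA_DIRS) if i not in test_indices]
    return train, test


# Conventional UNIV split: students001 and students003 are both held out.
conventional_train, conventional_test = split_clips([4, 5])

# Split as I understand univ.plist (assumption): only students001 is held out.
alt_train, alt_test = split_clips([4])

# Clips that are trained on in the second split but not in the conventional one.
moved = sorted(set(alt_train) - set(conventional_train))
print(moved)  # ['data/ucy/univ/students003']
```

If this reading is right, the only difference between the two setups is that 'students003' moves from the test set into the training set.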

Given the complexity and predictive challenges of the UNIV scene, excluding 'students003' from the training data, as the conventional split does, might adversely affect performance. Could you share the results obtained when the train-test split and dataset are aligned with those of other papers, for an apples-to-apples comparison? Does this issue also apply to SocialCircle, which uses the same dataloader?

I appreciate your attention to this matter and look forward to your insights. Thank you again for your contributions to the field!

InhwanBae avatar Jul 23 '24 12:07 InhwanBae