pykt-toolkit

Problem about the number of questions in the Algebra2005 dataset

Open • xiangxin-oss opened this issue 1 year ago • 1 comment

Hello, I know from the website https://pykt-toolkit.readthedocs.io/en/latest/datasets.html#algebra2005 that the Algebra2005 dataset has 210,710 questions, but the "num_q" of the Algebra2005 dataset in the "data_config.json" file in the "configs" folder is 173113. Don't these two numbers mean the same thing? If the "num_q" value is correct, how is the "num_q" of the Algebra2005 dataset calculated?

xiangxin-oss • Dec 29 '23 10:12

Thank you for your question! The number 210,710 is the original number of questions in the raw Algebra2005 dataset. After the standard pykt data processing pipeline is applied, only 173,113 valid questions remain, which is why the "num_q" value in the data_config.json file is different.
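The answer above does not spell out which processing steps remove questions. As a rough illustration only, the Python sketch below assumes two filters: dropping interactions with a missing response, and dropping students with fewer than `min_seq_len` interactions. These rules and the names `count_questions` and `min_seq_len` are hypothetical, not taken from pykt's code. The sketch counts distinct question IDs before and after filtering, showing how a processed `num_q` can end up smaller than the raw count.

```python
from collections import defaultdict


def count_questions(rows, min_seq_len=3):
    """Return (raw_num_q, processed_num_q) for (student, question, response) rows.

    Hypothetical filtering rules, for illustration only:
      1. drop interactions whose response is missing (None);
      2. drop students with fewer than min_seq_len remaining interactions.
    """
    # Distinct question IDs in the raw data, before any filtering.
    raw_questions = {question for _, question, _ in rows}

    # Step 1 (assumed): keep only interactions that have a response.
    by_student = defaultdict(list)
    for student, question, response in rows:
        if response is not None:
            by_student[student].append(question)

    # Step 2 (assumed): keep only sequences that are long enough.
    kept_questions = set()
    for questions in by_student.values():
        if len(questions) >= min_seq_len:
            kept_questions.update(questions)

    return len(raw_questions), len(kept_questions)


# Toy interaction log: (student_id, question_id, response).
rows = [
    ("s1", "q1", 1), ("s1", "q2", 0), ("s1", "q3", 1),
    ("s2", "q4", 1), ("s2", "q1", 0),
    ("s3", "q5", None), ("s3", "q1", 1), ("s3", "q2", 1), ("s3", "q3", 0),
]

raw_num_q, num_q = count_questions(rows)
print(raw_num_q, num_q)
```

On this toy data, `q5` disappears because its only interaction has no response, and `q4` disappears because student `s2` has just two interactions, so 5 raw questions shrink to 3.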

I hope this clears up the confusion! Let me know if you have any more questions.

Li-XYi • Oct 03 '24 00:10