Missing Topics Directory

Open jaidevshriram opened this issue 3 years ago • 2 comments

Hey!

Thanks for making this project public. I'm trying to run the pipeline in code/run-pipeline.sh but hit errors when executing run_topic_cv.py. The files it requires on lines 77-78 are not available in the repo or on the project webpage:

../working-dir/topics/citance.doc-topics.txt

Could you share the topics folder if available?

Further, if you have the trained model file on hand and an evaluation script, that would be of great help - I'm currently trying to evaluate the performance of this model.

jaidevshriram, Oct 23 '21 13:10

Hi Jaidev,

run-pipeline.sh will generate those files for you if you uncomment this line:

#python generate_mallet_corpus.py ../working-files/

The pipeline script has a bunch of "big" steps commented out because they're only run once and take a while to do. If you're starting from scratch to replicate everything, you'll want to uncomment the relevant commands, which are indicated in the script's comment notes.
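Before rerunning, a quick check like the one below can show which generated inputs are still missing. This is only a sketch: `missing_inputs` is a hypothetical helper, not part of the repo, and the path is the one quoted in the issue, so adjust it to your checkout.

```python
from pathlib import Path


def missing_inputs(paths):
    """Return the entries of `paths` that do not exist on disk."""
    return [p for p in paths if not Path(p).exists()]


if __name__ == "__main__":
    # Path quoted in the issue; assumed to be relative to the code/ directory.
    required = ["../working-dir/topics/citance.doc-topics.txt"]
    missing = missing_inputs(required)
    if missing:
        print("Missing generated inputs:", ", ".join(missing))
        print("Uncomment the generation steps in run-pipeline.sh and rerun.")
    else:
        print("All generated inputs are present.")
```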

The paper used leave-one-paper-out cross-validation. I don't have an evaluation script handy at the moment, but it should be replicable. I think we later tested it on an 80/20 split and the performance was statistically equivalent.
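For anyone replicating this, here is a minimal sketch of leave-one-paper-out splitting. It is not the script used for the paper. It assumes each citation instance is tagged with the ID of the paper it comes from; `leave_one_paper_out` and the example IDs are hypothetical.

```python
from collections import defaultdict


def leave_one_paper_out(paper_ids):
    """Yield (held_out_paper, train_idx, test_idx), one fold per paper.

    `paper_ids[i]` is the ID of the paper that instance i belongs to.
    """
    by_paper = defaultdict(list)
    for idx, pid in enumerate(paper_ids):
        by_paper[pid].append(idx)

    for pid, test_idx in by_paper.items():
        held_out = set(test_idx)
        train_idx = [i for i in range(len(paper_ids)) if i not in held_out]
        yield pid, train_idx, test_idx


if __name__ == "__main__":
    # Hypothetical paper IDs, one per citation instance.
    paper_ids = ["P1", "P1", "P2", "P3", "P3", "P3"]
    for pid, train_idx, test_idx in leave_one_paper_out(paper_ids):
        print(pid, "train:", train_idx, "test:", test_idx)
```

Each fold holds out every instance from one paper, so no paper contributes to both training and test data within a fold.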

davidjurgens, Oct 23 '21 15:10

Thanks! Will try that out.

jaidevshriram, Oct 23 '21 16:10