NeuralCitationNetwork
                                
                                
                                
                        Dataset configuration criteria
I downloaded the data from "https://psu.app.box.com/v/refseer", which you mentioned, and it contains 110 million records. You used 4,549,267 of them (training: 4,258,383 (up to 2012); validation: 141,957 (2013); test: 148,927 (2014 onward)). Can you tell me the criteria you used to select them? For example, did you drop records without a title or year?
Thanks.
Best regards.
I inserted all the documents into MongoDB and then performed some preprocessing. Here are the notes/commands I used to prepare it.
Thank you for the excellent answer. I have one more question: is the train/valid/test split based on the year of the citing paper or the year of the cited paper?
Thanks. Best regards.
I split the data by the citing paper year.
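For readers following along, here is a minimal sketch of such a split. It is an illustration, not the repository's actual preprocessing code. It assumes each citation context is a dict with a hypothetical `citing_year` field, and it uses the year boundaries quoted earlier in this thread (up to 2012 for training, 2013 for validation, 2014 onward for test). Records with a missing year are assumed to have been filtered out beforehand.

```python
from collections import defaultdict


def split_name(citing_year):
    """Map the citing paper's year to a split name.

    Boundaries follow the numbers quoted in this thread:
    <= 2012 -> train, 2013 -> valid, >= 2014 -> test.
    """
    if citing_year <= 2012:
        return "train"
    if citing_year == 2013:
        return "valid"
    return "test"


def split_by_citing_year(contexts):
    """Group citation contexts by the year of the *citing* paper.

    `contexts` is an iterable of dicts; the `citing_year` key is a
    hypothetical field name chosen for this sketch.
    """
    splits = defaultdict(list)
    for ctx in contexts:
        splits[split_name(ctx["citing_year"])].append(ctx)
    return dict(splits)


# Toy records (made up for illustration only).
examples = [
    {"context_id": 1, "citing_year": 2009, "cited_year": 2001},
    {"context_id": 2, "citing_year": 2013, "cited_year": 2012},
    {"context_id": 3, "citing_year": 2014, "cited_year": 2013},
]
splits = split_by_citing_year(examples)
```

Note that the cited paper's year plays no role here: the third toy record cites a 2013 paper but still lands in the test split because the citing paper is from 2014.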
Thanks for your answer.
However, the data from 'https://psu.app.box.com/v/refseer' has only 112903 citations with year > 2012, so the validation and test set sizes do not match the numbers of citations you gave:
SELECT count(*) FROM kdd2019.citations where kdd2019.citations.year>2012;
Thanks
What do you mean? After preprocessing?
No, I just loaded the .sql file as is. Is that the problem?
I believe there are some problems with the SQL file, so I did some preprocessing and then inserted the data into MongoDB.
Take a look at https://github.com/harrywy/NPM
Hi @tebesu, I'm having trouble obtaining the same training/validation/test sets as described in the paper. Do you maybe have a list of the citation context IDs from the SQL dumps that were used in the experiments?
@zoranmedic
It should be in the dataset I provided.
@tebesu Right, I found it, thanks!