SentenceRepresentation
The corpus is unavailable
http://www.cs.toronto.edu/~mbweb/ seems to be down at the moment. Are there any mirrors?
I managed to get hold of the dataset after mailing the authors of the paper, and I got two files: books_large_p1.txt and books_large_p2.txt. The code, however, refers to a books_large_70m.txt. Is that just the result of concatenating the two files? I'm trying to reproduce the results of the paper...
Aha yes, glad that you managed to find the corpus.
To get the 70m file I concatenated them both and took the first 70m lines of that concatenated file. I was intending to save the rest for some new evaluations but in the end I never did it. Let me know if you need further clarification!
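For anyone reproducing this step, here is a minimal Python sketch of the procedure described above: read the parts in order and keep only the first N lines. It is not the script originally used. The function name `take_first_lines` is hypothetical, and two things are assumed rather than confirmed in this thread: that p1 comes before p2, and that "70m" means 70,000,000 lines.

```python
from itertools import islice


def take_first_lines(part_paths, out_path, n_lines):
    """Write the first n_lines lines of the concatenated parts to out_path.

    Files are handled as raw bytes, so no text encoding is assumed.
    Assumes each part ends with a newline; otherwise the last line of one
    part and the first line of the next are joined in the output.
    Returns the number of lines written.
    """
    written = 0
    with open(out_path, "wb") as out:
        for path in part_paths:
            if written >= n_lines:
                break  # enough lines already, skip the remaining parts
            with open(path, "rb") as part:
                # islice stops reading once the remaining quota is reached
                for line in islice(part, n_lines - written):
                    out.write(line)
                    written += 1
    return written
```

Under those assumptions, the call would be `take_first_lines(["books_large_p1.txt", "books_large_p2.txt"], "books_large_70m.txt", 70_000_000)`. Streaming line by line keeps memory use low for files of this size.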
On 12 February 2017 at 16:22, agentJay [email protected] wrote:
I requested the dataset from the authors, and I got two files: books_large_p1.txt and books_large_p2.txt. The code, however, refers to a books_large_70m.txt. Is that just the result of concatenating the two files? I'm trying to reproduce the results of the paper...
--
Felix Hill University of Cambridge [email protected]
http://www.cl.cam.ac.uk/~fh295/
Gotcha. Thanks!
Could you please share the dataset with me? Thank you. [email protected]