SentenceRepresentation

The corpus is unavailable

Open redreamality opened this issue 7 years ago • 4 comments

http://www.cs.toronto.edu/~mbweb/ seems to be down at the moment. Are there any mirrors?

redreamality • Oct 19 '16 13:10

I managed to get hold of the dataset after emailing the authors of the paper, and I received two files: books_large_p1.txt and books_large_p2.txt. The code, however, refers to books_large_70m.txt. Is that just the result of concatenating the two files? I'm trying to reproduce the results of the paper.

agent-jay • Feb 12 '17 16:02

Aha yes, glad that you managed to find the corpus.

To get the 70m file, I concatenated the two files and took the first 70m lines of the concatenated file. I had intended to save the rest for some new evaluations, but in the end I never did. Let me know if you need further clarification!
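For anyone following along, here is a minimal sketch of that procedure in Python. It is not the author's original script: the function name `build_subset` is hypothetical, and reading "70m" as 70,000,000 lines is an assumption. Files are handled in binary mode, so the result should match a plain concatenation followed by truncation, without any guess about the text encoding.

```python
from itertools import islice


def build_subset(parts, out_path, max_lines):
    """Concatenate `parts` in order and keep only the first `max_lines` lines.

    Streams line by line, so the full corpus never has to fit in memory.
    Returns the number of lines actually written.
    """
    written = 0
    with open(out_path, "wb") as out:
        for part in parts:
            if written >= max_lines:
                break
            with open(part, "rb") as src:
                # Take at most the number of lines still needed from this part.
                # Note: if a part lacks a trailing newline, its last line is
                # joined with the first line of the next part, exactly as a
                # raw byte-level concatenation would do.
                for line in islice(src, max_lines - written):
                    out.write(line)
                    written += 1
    return written


# Hypothetical usage, assuming "70m" means 70,000,000 lines:
# build_subset(
#     ["books_large_p1.txt", "books_large_p2.txt"],
#     "books_large_70m.txt",
#     70_000_000,
# )
```

If the two parts together hold fewer lines than requested, the function simply writes everything and returns the smaller count, which is a quick way to check that the downloaded files are complete.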

--

Felix Hill University of Cambridge [email protected]

http://www.cl.cam.ac.uk/~fh295/

fh295 • Feb 12 '17 19:02

Gotcha. Thanks!

agent-jay • Feb 12 '17 22:02

Could you please share the dataset with me? Thank you. [email protected]

1024er • Nov 29 '18 00:11