
Problem downloading the Reddit data

Open SparkJiao opened this issue 6 years ago • 4 comments

Hi, many thanks for your contribution!

I tried to use python demo.py --data full to download the Reddit data. Since I don't want to train the model right now, I didn't use Docker. I found that the data is linked here: https://convaisharables.blob.core.windows.net/lsp/keys-full.tar but I can't seem to open it, even with a proxy. Do you have any other link to the Reddit data?

Sorry to bother you. Thank you very much!

SparkJiao avatar Nov 05 '19 02:11 SparkJiao

I just checked, and the link works on my side. Could you double-check it?

intersun avatar Nov 05 '19 21:11 intersun

@intersun Hi, thanks for your reply. Indeed, the link is fine and I could download keys-full.tar. But I have run into other problems.

  1. I think the path where keys-full.tar is saved is wrong. In the makefile, it is saved under ./reddit_extractor/, but the make command looks for it under ./reddit_extractor/data/.
  2. I moved keys-full.tar to the directory ./reddit_extractor/data/, commented out the wget command, and re-ran demo.py, and I got the following error report. Is this because the keys-full.tar file was damaged during download, or is there another reason?
11/05/2019 22:20:46 - INFO - __main__ -   Downloading and Extracting Data...
make: *** [data/reddit/RC_2011-02.bz2] Error 4
make: *** Waiting for unfinished jobs....
11/06/2019 01:46:10 - INFO - __main__ -   Preparing Data...
prepro.py --corpus ./data/train.tsv --max_seq_len 128
11/06/2019 01:48:21 - INFO - __main__ -   Done!

11/06/2019 01:48:21 - INFO - __main__ -   Generating training CMD!

Besides, the file ./data/train.tsv doesn't exist.
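
On the question of whether keys-full.tar was damaged during download, one way to check is to read every member of the archive and see whether that fails. The following is only a minimal sketch using Python's standard tarfile module. It is not part of the DialoGPT repository, and tar_is_readable is a hypothetical helper name chosen for illustration.

```python
import tarfile


def tar_is_readable(path, chunk_size=1 << 20):
    """Return True if every regular file in the tar archive can be read in full.

    Hypothetical helper for illustration; not part of the DialoGPT repository.
    """
    try:
        with tarfile.open(path) as archive:
            for member in archive:
                if not member.isfile():
                    continue  # directories, links, etc. carry no data to read
                handle = archive.extractfile(member)
                # Read the member in chunks so large files are not held in memory.
                while handle.read(chunk_size):
                    pass
        return True
    except (tarfile.TarError, OSError, EOFError):
        # Raised for a missing file, a non-tar file, or a truncated archive.
        return False
```

For example, tar_is_readable("./reddit_extractor/data/keys-full.tar") returning False would point to a missing or damaged download, while True would suggest that the make error has another cause.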

Thank you very much for your help!

SparkJiao avatar Nov 06 '19 01:11 SparkJiao

I had a similar problem, but it appeared to make progress after I re-cloned the repository. I think the process does not like running "--data full" after "--data small".

kinoc avatar Dec 24 '19 09:12 kinoc

I have the same problem here (the RC_2011-02.bz2 error), although I am using the latest version of the repository. Did you solve this problem?
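
Since the make error names data/reddit/RC_2011-02.bz2, it may help to check whether that file exists and decompresses cleanly before re-running. This is a rough sketch using Python's standard bz2 module. bz2_is_intact is a hypothetical helper, not something the repository provides, and a passing check would not rule out other causes of the error.

```python
import bz2
import os


def bz2_is_intact(path, chunk_size=1 << 20):
    """Return True if the .bz2 file at `path` decompresses without errors.

    Hypothetical helper for illustration; not part of the DialoGPT repository.
    """
    try:
        if os.path.getsize(path) == 0:
            return False  # a zero-byte file is an incomplete download
        with bz2.open(path, "rb") as stream:
            # Decompress in chunks and discard the output; only errors matter.
            while stream.read(chunk_size):
                pass
        return True
    except (OSError, EOFError, ValueError):
        # Raised for a missing file, invalid data, or a truncated stream.
        return False
```

For example, bz2_is_intact("data/reddit/RC_2011-02.bz2") returning False would suggest the file was never fully downloaded.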

createmomo avatar Feb 03 '21 17:02 createmomo