RDPTools
rdpclassifiertraindata download times out and breaks the build
hello,
While trying to build a Docker image for RDPTools, I have a problem with the download of the classifier training set, which times out.
see:
download-traindata:
[get] Getting: http://rdp.cme.msu.edu/download/rdpclassifiertraindata/data.tgz
[get] To: /local/gensoft2/src/RDPTools/RDPTools-2.0.2/classifier/build/classes/data.tgz
[untar] Expanding: /local/gensoft2/src/RDPTools/RDPTools-2.0.2/classifier/build/classes/data.tgz into /local/gensoft2/src/RDPTools/RDPTools-2.0.2/classifier/build/classes
BUILD FAILED
/local/gensoft2/src/RDPTools/RDPTools-2.0.2/classifier/build.xml:112: Error while expanding /local/gensoft2/src/RDPTools/RDPTools-2.0.2/classifier/build/classes/data.tgz
java.io.EOFException: Unexpected end of ZLIB input stream
at java.util.zip.InflaterInputStream.fill(InflaterInputStream.java:240)
at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:158)
wget of the same URL gives:
wget http://rdp.cme.msu.edu/download/rdpclassifiertraindata/data.tgz
--2016-02-25 11:43:38-- http://rdp.cme.msu.edu/download/rdpclassifiertraindata/data.tgz
Resolving rdp.cme.msu.edu... 35.8.164.79
Connecting to rdp.cme.msu.edu|35.8.164.79|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 149530230 (143M) [application/x-gzip]
Saving to: 'data.tgz'
61% [================================> ] 91,435,408 255KB/s in 3m 30s
2016-02-25 11:47:08 (425 KB/s) - Connection closed at byte 91435408. Retrying.
It seems to me that the get method used does not honour timeouts or retries.
best regards
Eric
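The EOFException in the build log is what one would expect from a truncated archive, and the wget log shows the connection closing at byte 91435408 of 149530230. A minimal sketch of a check that could run before the untar step, assuming `gzip` is available on the build host; the function name `archive_is_complete` is hypothetical and not part of RDPTools:

```shell
#!/bin/sh
# Hypothetical helper (not part of RDPTools).
# Returns 0 if the gzip stream in "$1" is complete, non-zero otherwise.
# gzip -t reads the whole file and verifies it without extracting anything.
archive_is_complete() {
    gzip -t "$1" 2>/dev/null
}

# Possible use before running the build:
#   archive_is_complete classifier/build/classes/data.tgz \
#       || echo "data.tgz is truncated or corrupt" >&2
```

Failing early on a bad archive gives a clearer message than the ZLIB stack trace, but it does not by itself fix the interrupted download.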
Hi, Eric,
We tried and were not able to replicate this problem using computers in different locations. We will look into any adjustments that might remedy this situation. For now, would you mind downloading this file and adding it to the folder if the problem persists? Thank you.
Benli
RDP Staff Ribosomal Database Project Center for Microbial Ecology Michigan State University 567 Wilson Rd. Room 2225 A East Lansing, MI 48824 (517) 353-3842
Currently I was able to build using the following:
- get data.tgz externally (wget)
- host data.tgz on a localhost web server
- patch classifier/build.xml to use localhost instead of rdp.cme.msu.edu
sed -i -e 's,http://rdp.cme.msu.edu/download,http://localhost,' classifier/build.xml
it's more or less what you suggested.
Two suggestions to fix the build process:
- (hard way) check the get method used while building to see whether it can handle timeouts
- (easy way) what you suggested: remove the training data download from the build process and document that users must download the files on their own.
regards
Eric
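For anyone wanting to reproduce the workaround, here is a rough sketch of the patch step as a reusable function. It assumes the build file contains the `http://rdp.cme.msu.edu/download` prefix literally; the function name `point_build_at_mirror` and the mirror URL are hypothetical. It writes to a temporary file instead of using `sed -i`, since `-i` behaves differently between GNU and BSD sed:

```shell
#!/bin/sh
# Hypothetical helper (not part of RDPTools).
# Rewrite the training-data base URL in an Ant build file.
#   $1 = path to build.xml
#   $2 = replacement base URL (must not contain ',' or '&')
point_build_at_mirror() {
    build_file=$1
    mirror=$2
    # ',' is the sed delimiter so the slashes in the URLs need no escaping.
    sed -e "s,http://rdp.cme.msu.edu/download,${mirror}," "$build_file" \
        > "$build_file.tmp" &&
        mv "$build_file.tmp" "$build_file"
}

# Possible use from the RDPTools source directory:
#   point_build_at_mirror classifier/build.xml http://localhost
```

Only the URL prefix is replaced, so the local web server would need to serve the file under the same `rdpclassifiertraindata/data.tgz` path.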
Hi, Eric,
Thank you for the suggestions. We will look into the options to get it fixed.
Benli
Back at this.
I had to do a fresh install of RDPTools. Here is some output from wget:
make[1]: Entering directory `/inst/RDPTools/RDPTools-2.0.2'
# java builder//installer tries to download data file and timeout
# donwload externaly
test -d /src/RDPTools/RDPTools-2.0.2/classifier/build/classes || mkdir -m 2775 -p /src/RDPTools/RDPTools-2.0.2/classifier/build/classes
test -f /src/RDPTools/RDPTools-2.0.2/classifier/build/classes/data.tgz || \
wget --tries=5 -c http://rdp.cme.msu.edu/download/rdpclassifiertraindata/data.tgz -O /src/RDPTools/RDPTools-2.0.2/classifier/build/classes/data.tgz
--2016-07-07 18:02:02-- http://rdp.cme.msu.edu/download/rdpclassifiertraindata/data.tgz
Resolving rdp.cme.msu.edu... 35.8.164.79
Connecting to rdp.cme.msu.edu|35.8.164.79|:80... connected.
HTTP request sent, awaiting response... No data received.
Retrying.
--2016-07-07 18:03:33-- (try: 2) http://rdp.cme.msu.edu/download/rdpclassifiertraindata/data.tgz
Connecting to rdp.cme.msu.edu|35.8.164.79|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 181332714 (173M) [application/x-gzip]
Saving to: `/src/RDPTools/RDPTools-2.0.2/classifier/build/classes/data.tgz'
0% [ ] 302,632 101K/s in 47s
2016-07-07 18:04:22 (6.26 KB/s) - Connection closed at byte 302632. Retrying.
--2016-07-07 18:04:24-- (try: 3) http://rdp.cme.msu.edu/download/rdpclassifiertraindata/data.tgz
Connecting to rdp.cme.msu.edu|35.8.164.79|:80... connected.
HTTP request sent, awaiting response... 206 Partial Content
Length: 181332714 (173M), 181030082 (173M) remaining [application/x-gzip]
Saving to: `/src/RDPTools/RDPTools-2.0.2/classifier/build/classes/data.tgz'
100%[=======================================================================================>] 181,332,714 5.31M/s in 36s
2016-07-07 18:05:00 (4.81 MB/s) - `/src/RDPTools/RDPTools-2.0.2/classifier/build/classes/data.tgz' saved [181332714/181332714]
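In the log above, wget only completes the file on its third try, after resuming with `-c`. A sketch of the same pattern wrapped in a function that also verifies the archive before the build uses it; the function name `fetch_and_verify`, the 60-second timeout and the round count are assumptions, not values taken from the RDPTools build:

```shell
#!/bin/sh
# Hypothetical helper (not part of RDPTools).
# Download "$1" to "$2", resuming partial files, until the gzip stream
# verifies or "$3" rounds (default 5) have been used.
# Returns 0 once the archive passes gzip -t, 1 otherwise.
fetch_and_verify() {
    url=$1
    dest=$2
    rounds=${3:-5}
    i=1
    while [ "$i" -le "$rounds" ]; do
        # --timeout bounds DNS, connect and read stalls; -c resumes a partial file.
        wget --tries=5 --timeout=60 -c -O "$dest" "$url"
        if gzip -t "$dest" 2>/dev/null; then
            return 0
        fi
        i=$((i + 1))
    done
    return 1
}

# Possible use:
#   fetch_and_verify http://rdp.cme.msu.edu/download/rdpclassifiertraindata/data.tgz \
#       classifier/build/classes/data.tgz
```

Because `-c` only appends to a short file, a file that has the full length but corrupt contents would need to be deleted by hand before retrying.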
I cannot download the traindata either. Can you copy the traindata somewhere or fix the URL?
Same problem here as of 14/05/2024