pysradb

Data download is interrupted after a few minutes

Open • sert23 opened this issue 1 year ago • 7 comments

Describe the bug

I'm not sure what's happening, but for the last few days I've been struggling to download data using pysradb. This worked without problems a couple of weeks ago. Here is the error I get:

  File "/home/eap/anaconda/envs/pysradb/lib/python3.10/site-packages/urllib3/response.py", line 444, in _error_catcher
    yield
  File "/home/eap/anaconda/envs/pysradb/lib/python3.10/site-packages/urllib3/response.py", line 567, in read
    data = self._fp_read(amt) if not fp_closed else b""
  File "/home/eap/anaconda/envs/pysradb/lib/python3.10/site-packages/urllib3/response.py", line 533, in _fp_read
    return self._fp.read(amt) if amt is not None else self._fp.read()
  File "/home/eap/anaconda/envs/pysradb/lib/python3.10/http/client.py", line 460, in read
    return self._read_chunked(amt)
  File "/home/eap/anaconda/envs/pysradb/lib/python3.10/http/client.py", line 583, in _read_chunked
    chunk_left = self._get_chunk_left()
  File "/home/eap/anaconda/envs/pysradb/lib/python3.10/http/client.py", line 566, in _get_chunk_left
    chunk_left = self._read_next_chunk_size()
  File "/home/eap/anaconda/envs/pysradb/lib/python3.10/http/client.py", line 526, in _read_next_chunk_size
    line = self.fp.readline(_MAXLINE + 1)
  File "/home/eap/anaconda/envs/pysradb/lib/python3.10/socket.py", line 705, in readinto
    return self._sock.recv_into(b)
  File "/home/eap/anaconda/envs/pysradb/lib/python3.10/ssl.py", line 1274, in recv_into
    return self.read(nbytes, buffer)
  File "/home/eap/anaconda/envs/pysradb/lib/python3.10/ssl.py", line 1130, in read
    return self._sslobj.read(len, buffer)
TimeoutError: The read operation timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/eap/miRexpress/updates/code/run_update.py", line 200, in <module>
    generate_raw_tsv("miRNA-seq", os.path.join(raw_folder, "miRNA-seq.tsv"))
  File "/home/eap/miRexpress/updates/code/run_update.py", line 36, in generate_raw_tsv
    instance.search()
  File "/home/eap/anaconda/envs/pysradb/lib/python3.10/site-packages/pysradb/search.py", line 793, in search
    self._format_response(r.raw)
  File "/home/eap/anaconda/envs/pysradb/lib/python3.10/site-packages/pysradb/search.py", line 861, in _format_response
    for event, elem in Et.iterparse(content):
  File "/home/eap/anaconda/envs/pysradb/lib/python3.10/xml/etree/ElementTree.py", line 1255, in iterator
    data = source.read(16 * 1024)
  File "/home/eap/anaconda/envs/pysradb/lib/python3.10/site-packages/urllib3/response.py", line 566, in read
    with self._error_catcher():
  File "/home/eap/anaconda/envs/pysradb/lib/python3.10/contextlib.py", line 153, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/home/eap/anaconda/envs/pysradb/lib/python3.10/site-packages/urllib3/response.py", line 449, in _error_catcher
    raise ReadTimeoutError(self._pool, None, "Read timed out.")

It seems to get disconnected after a few minutes. Is there a parameter I can change to make it retry, or something similar? Are they blocking my IP? Is this a widespread, recent issue?
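As far as this issue shows, there is no pysradb option for retrying, so one possible workaround is to wrap the whole search in a retry loop with exponential backoff. This is a minimal sketch assuming the timeouts are transient (not an IP block); `retry_call` and its parameters are hypothetical helpers written for illustration, not part of pysradb.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_call(
    func: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (TimeoutError, ConnectionError),
    max_attempts: int = 5,
    base_delay: float = 2.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func() until it succeeds, retrying only on the given exception types.

    Hypothetical helper (not part of pysradb). Between attempts it waits
    base_delay * 2**(attempt - 1) seconds, capped at max_delay, plus up to
    10% random jitter. The last exception is re-raised once max_attempts
    is exhausted; exceptions not listed in retry_on propagate immediately.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on as exc:
            if attempt == max_attempts:
                raise  # out of attempts: surface the original error
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            delay += random.uniform(0, 0.1 * delay)  # jitter
            print(f"Attempt {attempt} failed ({exc!r}); retrying in {delay:.1f}s")
            sleep(delay)
    raise AssertionError("unreachable")
```

With pysradb, `func` would be a small function that builds a fresh `SraSearch`, calls `search()` and returns `get_df()`. A new instance per attempt is the conservative choice, since this sketch does not assume `search()` can safely be re-run on the same object. Note that the exception at the bottom of the second traceback is `urllib3.exceptions.ReadTimeoutError`, which is not a subclass of the built-in `TimeoutError`, so it has to be passed explicitly, e.g. `retry_on=(ReadTimeoutError,)`. Each retry restarts the whole query from the beginning, which can be slow for a request with `return_max` set to 1000000.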

To Reproduce

This now happens with every attempt, failing at a random point after a few minutes. In this example I'm trying to download information about all miRNA-seq samples in SRA:

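from pysradb.search import SraSearch

library_type = "miRNA-seq"
instance = SraSearch(2, 1000000, strategy=library_type)  # verbosity=2, return_max=1000000
print("Downloading samples for " + library_type)
instance.search()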

Thanks a lot for writing this software and the support!!

sert23, Jun 19 '23 11:06