pysradb
[BUG] All arrays must be of the same length
Describe the bug
Unable to run pysradb for GSE198257.

To Reproduce
Steps to reproduce the behavior:
## Installation: pip install git+https://github.com/saketkc/pysradb
pysradb gse-to-srp GSE198257
Traceback (most recent call last):
File "/home/subudhak/miniconda3/bin/pysradb", line 8, in <module>
sys.exit(parse_args())
^^^^^^^^^^^^
File "/home/subudhak/miniconda3/lib/python3.11/site-packages/pysradb/cli.py", line 1206, in parse_args
gse_to_srp(args.gse_ids, args.saveto, args.detailed, args.desc, args.expand)
File "/home/subudhak/miniconda3/lib/python3.11/site-packages/pysradb/cli.py", line 232, in gse_to_srp
df = sradb.gse_to_srp(
^^^^^^^^^^^^^^^^^
File "/home/subudhak/miniconda3/lib/python3.11/site-packages/pysradb/sraweb.py", line 799, in gse_to_srp
new_gse_df = pd.DataFrame(
^^^^^^^^^^^^^
File "/home/subudhak/miniconda3/lib/python3.11/site-packages/pandas/core/frame.py", line 767, in __init__
mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy, typ=manager)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/subudhak/miniconda3/lib/python3.11/site-packages/pandas/core/internals/construction.py", line 503, in dict_to_mgr
return arrays_to_mgr(arrays, columns, index, dtype=dtype, typ=typ, consolidate=copy)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/subudhak/miniconda3/lib/python3.11/site-packages/pandas/core/internals/construction.py", line 114, in arrays_to_mgr
index = _extract_index(arrays)
^^^^^^^^^^^^^^^^^^^^^^
File "/home/subudhak/miniconda3/lib/python3.11/site-packages/pandas/core/internals/construction.py", line 677, in _extract_index
raise ValueError("All arrays must be of the same length")
ValueError: All arrays must be of the same length
Desktop (please complete the following information):
- OS: [ Ubuntu 20.04]
- Python version [Python 3.11.8]
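For readers unfamiliar with the final ValueError in the traceback: pandas raises it when a DataFrame is built from a dict whose columns have different lengths. The snippet below is a minimal sketch of that situation only. The column values are made-up placeholders, and it does not reproduce the actual data pysradb assembles for GSE198257.

```python
import pandas as pd

# Hypothetical column data: two aliases but only one accession,
# so the columns cannot be aligned into rows.
columns = {
    "study_alias": ["GSE_A", "GSE_B"],
    "study_accession": ["SRP_A"],
}

try:
    pd.DataFrame(columns)
    error_message = None
except ValueError as exc:
    # pandas refuses to build a frame from unequal-length columns.
    error_message = str(exc)

print(error_message)
```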
I'm getting the same error for GSE279289. The sradb.gse_to_srp code assumes that every accession returns a DataFrame, but some return None, which causes the error:
def fetch_gds_results(self, gse, **kwargs):
    result = self.get_esummary_response("geo", gse)
    try:
        uids = result["uids"]
    except KeyError:
        print("No results found for {} | Obtained result: {}".format(gse, result))
        return None
    gse_records = []
    for uid in uids:
        record = result[uid]
        del record["uid"]
        if record["extrelations"]:
            extrelations = record["extrelations"]
            for extrelation in extrelations:
                keys = list(extrelation.keys())
                values = list(extrelation.values())
                assert sorted(keys) == sorted(
                    ["relationtype", "targetobject", "targetftplink"]
                )
                assert len(values) == 3
                record[extrelation["relationtype"]] = extrelation["targetobject"]
            del record["extrelations"]
            gse_records.append(record)
    if not len(gse_records):
        print("No results found for {}".format(gse))
        return None
    return pd.DataFrame(gse_records)
The correct type hint for the return is -> Optional[pd.DataFrame].
However, the possible None return is not accounted for:
def gse_to_srp(self, gse, **kwargs):
    if isinstance(gse, str):
        gse = [gse]
    gse_df = self.fetch_gds_results(gse, **kwargs)
    gse_df = gse_df.rename(
        columns={"accession": "study_alias", "SRA": "study_accession"}
    )
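One way to account for the None return is to check for it before using the result. The sketch below illustrates that pattern only and is not the change that was merged into pysradb. FakeSRAweb, the Record alias, and the GSE_KNOWN / GSE_MISSING accessions are hypothetical, and plain lists of dicts stand in for DataFrames so the example has no dependencies.

```python
from typing import Dict, List, Optional

Record = Dict[str, str]


class FakeSRAweb:
    """Hypothetical stand-in for the real client; not pysradb code."""

    def __init__(self, known: Dict[str, List[Record]]):
        self._known = known

    def fetch_gds_results(self, gse: List[str]) -> Optional[List[Record]]:
        # Mirrors the Optional return: None when no records are found.
        records = [r for acc in gse for r in self._known.get(acc, [])]
        return records or None

    def gse_to_srp(self, gse) -> List[Record]:
        if isinstance(gse, str):
            gse = [gse]
        records = self.fetch_gds_results(gse)
        if records is None:
            # Guard: return an empty result instead of using None.
            return []
        # Same renaming as in the snippet above, applied per record.
        return [
            {"study_alias": r["accession"], "study_accession": r["SRA"]}
            for r in records
        ]


client = FakeSRAweb({"GSE_KNOWN": [{"accession": "GSE_KNOWN", "SRA": "SRP_KNOWN"}]})
found = client.gse_to_srp("GSE_KNOWN")
missing = client.gse_to_srp("GSE_MISSING")
print(found, missing)
```

Returning an empty result keeps the caller's return type uniform. Raising a descriptive exception would be an equally reasonable choice if a missing accession should stop the run.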
This is now fixed:
pysradb gse-to-srp GSE198257
study_alias study_accession
GSE198257 SRP363227
GSE198257 SRP363224