scirpy
airr validation fails when converting demo dataset to_airr_cells
Description of the bug
Can't convert the demo dataset with to_airr_cells because of an AIRR validation error.
Minimal reproducible example
import scirpy as ir
adata = ir.datasets.wu2020_3k()
ir.io.to_airr_cells(adata)
The error message produced by the code above
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
File ~/anaconda3/envs/test_awkward/lib/python3.9/site-packages/airr/schema.py:193, in Schema.to_int(self, value, validate)
192 try:
--> 193 return int(value)
194 except ValueError:
ValueError: invalid literal for int() with base 10: '2.0'
During handling of the above exception, another exception occurred:
ValidationError Traceback (most recent call last)
File ~/anaconda3/envs/test_awkward/lib/python3.9/site-packages/airr/schema.py:275, in Schema.validate_row(self, row)
274 if spec == 'boolean': self.to_bool(row[f], validate=True)
--> 275 if spec == 'integer': self.to_int(row[f], validate=True)
276 if spec == 'number': self.to_float(row[f], validate=True)
File ~/anaconda3/envs/test_awkward/lib/python3.9/site-packages/airr/schema.py:196, in Schema.to_int(self, value, validate)
195 if validate:
--> 196 raise ValidationError('invalid int %s'% value)
197 else:
ValidationError: invalid int 2.0
During handling of the above exception, another exception occurred:
ValidationError Traceback (most recent call last)
Input In [76], in <cell line: 1>()
----> 1 airr_cells = ir.io.to_airr_cells(adata)
File ~/anaconda3/envs/test_awkward/lib/python3.9/site-packages/scirpy/io/_util.py:67, in _check_upgrade_schema.<locals>.check_upgrade_schema_decorator.<locals>.check_wrapper(*args, **kwargs)
65 for i in check_args:
66 _check_anndata_upgrade_schema(args[i])
---> 67 return f(*args, **kwargs)
File ~/anaconda3/envs/test_awkward/lib/python3.9/site-packages/scirpy/io/_convert_anndata.py:133, in to_airr_cells(adata)
130 for tmp_chain in chains.values():
131 # Don't add empty chains!
132 if not all([_is_na2(x) for x in tmp_chain.values()]):
--> 133 tmp_ir_cell.add_chain(tmp_chain)
135 try:
136 tmp_ir_cell.add_serialized_chains(row["extra_chains"])
File ~/anaconda3/envs/test_awkward/lib/python3.9/site-packages/scirpy/io/_datastructures.py:134, in AirrCell.add_chain(self, chain)
131 # TODO this should be `.validate_obj` but currently does not work
132 # because of https://github.com/airr-community/airr-standards/issues/508
133 RearrangementSchema.validate_header(chain.keys())
--> 134 RearrangementSchema.validate_row(chain)
136 for tmp_field in self._cell_attribute_fields:
137 # It is ok if a field specified as cell attribute is not present in the chain
138 try:
File ~/anaconda3/envs/test_awkward/lib/python3.9/site-packages/airr/schema.py:278, in Schema.validate_row(self, row)
276 if spec == 'number': self.to_float(row[f], validate=True)
277 except ValidationError as e:
--> 278 raise ValidationError('field %s has %s' %(f, e))
280 return True
ValidationError: field duplicate_count has invalid int 2.0
Version information
-----
anndata 0.8.0rc2.dev27+ge524389
scanpy 1.9.1
-----
Levenshtein NA
PIL 9.1.1
adjustText NA
airr 1.3.1
asttokens NA
awkward 1.8.0
backcall 0.2.0
beta_ufunc NA
binom_ufunc NA
cycler 0.10.0
cython_runtime NA
dateutil 2.8.2
debugpy 1.6.0
decorator 5.1.1
entrypoints 0.4
executing 0.8.3
h5py 3.7.0
hypergeom_ufunc NA
igraph 0.9.11
ipykernel 6.15.0
jedi 0.18.1
joblib 1.1.0
kiwisolver 1.4.3
llvmlite 0.38.1
matplotlib 3.5.2
mpl_toolkits NA
natsort 8.1.0
nbinom_ufunc NA
networkx 2.8.4
numba 0.55.2
numpy 1.22.4
packaging 21.3
pandas 1.4.2
parasail 1.2.4
parso 0.8.3
pexpect 4.8.0
pickleshare 0.7.5
pkg_resources NA
prompt_toolkit 3.0.29
psutil 5.9.1
ptyprocess 0.7.0
pure_eval 0.2.2
pydev_ipython NA
pydevconsole NA
pydevd 2.8.0
pydevd_file_utils NA
pydevd_plugins NA
pydevd_tracing NA
pygments 2.12.0
pyparsing 3.0.9
pytoml NA
pytz 2022.1
scipy 1.8.1
scirpy 0.10.1
seaborn 0.11.2
session_info 1.0.0
setuptools 62.5.0
setuptools_scm NA
six 1.16.0
sklearn 1.1.1
stack_data 0.3.0
statsmodels 0.13.2
tabulate 0.8.9
texttable 1.6.4
threadpoolctl 3.1.0
tornado 6.1
tqdm 4.64.0
tracerlib NA
traitlets 5.3.0
wcwidth 0.2.5
yaml 6.0
yamlordereddictloader NA
zmq 23.1.0
-----
IPython 8.4.0
jupyter_client 7.3.4
jupyter_core 4.10.0
-----
Python 3.9.13 | packaged by conda-forge | (main, May 27 2022, 16:58:50) [GCC 10.3.0]
Linux-5.18.2-arch1-1-x86_64-with-glibc2.35
-----
Session information updated at 2022-06-20 19:01
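The problem is that IR_VDJ_2_duplicate_count is of type str because it contains "None". The counts are therefore stored as strings such as "2.0", which the AIRR integer validation rejects.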
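A minimal sketch of the failure mode, using only the standard library. It shows why a float-formatted string fails integer conversion and how such values could be coerced. Note that normalize_count is a hypothetical helper, not part of scirpy or airr, and the sample values are illustrative rather than taken from the dataset.

```python
from typing import Optional


def normalize_count(value) -> Optional[int]:
    """Hypothetical helper: coerce a stringified count such as "2.0" to int.

    Returns None for missing-value markers like "None", "nan" or "".
    """
    if value is None:
        return None
    text = str(value).strip()
    if text.lower() in {"", "none", "nan"}:
        return None
    number = float(text)
    if not number.is_integer():
        raise ValueError(f"not an integral count: {value!r}")
    return int(number)


# The failure from the traceback: int() rejects a float-formatted string.
try:
    int("2.0")
except ValueError as err:
    print(f"int('2.0') fails: {err}")

# Values as they might appear in a str-typed column (illustrative only).
raw_column = ["2.0", "None", "1.0"]
cleaned = [normalize_count(v) for v in raw_column]
print(cleaned)
```

This has not been tested against the actual wu2020_3k data. It only illustrates the type problem, not a fix inside scirpy.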
Solved with the new data structure and the new example dataset.