souporcell
Exception: int variable contained non-int values
Greetings! I was wondering if you might be able to help resolve an issue we are encountering during the consensus.py step which is generating the following output/error:
INFO:pystan:COMPILING THE C++ CODE FOR MODEL anon_model_c58d6755a445ee1723e096eb7e36ea75 NOW.
14884452 excluded for potential RNA editing
25990 doublets excluded from genotype and ambient RNA estimation
0 not used for soup calculation due to possible RNA edit
Traceback (most recent call last):
File "/opt/souporcell/consensus.py", line 348, in
Our current workflow calls the compile_stan_model.py and consensus.py steps of the pipeline with the following commands:
python3.8 /opt/souporcell/compile_stan_model.py && python3.8 /opt/souporcell/consensus.py -a out_matrix.mtx -c clusters.tsv -r ref_matrix.mtx -v 1000G_acan_hg38_snps_mainchr.vcf --soup_out ambient_rna.txt --vcf_out cluster_genotypes.vcf --output_dir .
This seems to work for the majority of our samples, but there appears to be an edge case that throws this error in a couple of them. Any help you can provide to assist us in determining the cause of this would be highly appreciated.
Did you ever find out what causes this? I'd be very interested to know.
I am seeing a very similar error right now:
29689 doublets excluded from genotype and ambient RNA estimation
0 not used for soup calculation due to possible RNA edit
Traceback (most recent call last):
File "/opt/souporcell/consensus.py", line 348, in <module>
fit = sm.optimizing(data=counts_dat)
File "/opt/conda/lib/python3.6/site-packages/pystan/model.py", line 542, in optimizing
fit = self.fit_class(data, seed)
File "stanfit4anon_model_c58d6755a445ee1723e096eb7e36ea75_355834653533342947.pyx", line 459, in stanfit4anon_model_c58d6755a445ee1723e096eb7e36ea75_355834653533342947.StanFit4Model.__cinit__
RuntimeError: Exception: int variable contained non-int values; processing stage=data initialization; variable name=cluster_allele_counts; base type=int (in 'unknown file name' at line 8)
Do we need to tell PyStan that the cluster_allele_counts variable contains integers?
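For anyone trying to narrow this down: as far as I understand, Stan typically raises this message when a field declared as int receives a value that is not a whole number, such as a fraction, NaN or infinity. Below is a minimal sketch of a check that could be run on the count arrays before they reach sm.optimizing. The helper name find_non_integer_entries is hypothetical, and the assumption that the counts can be walked as a nested list of rows is mine; it is not taken from consensus.py.

```python
import math


def find_non_integer_entries(rows):
    """Return (row, column, value) for every entry that is not a whole number.

    Hypothetical helper for illustration; not part of souporcell.
    """
    bad = []
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            as_float = float(value)
            # NaN and +/-inf are not finite; fractional values fail is_integer().
            if not math.isfinite(as_float) or not as_float.is_integer():
                bad.append((i, j, value))
    return bad


# Toy data only, not real souporcell output.
example_counts = [[3, 0.0, 12], [float("nan"), 7, 2.5]]
print(find_non_integer_entries(example_counts))
```

If this kind of check reports NaN entries for a failing sample, that would point at the input data rather than at a missing type declaration.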
The PyStan version matters here, and PyStan is in turn sensitive to the Python version. If you use my conda environment, I think this should go away.
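To compare environments between working and failing setups, a small sketch like the one below could be run with the same interpreter that runs consensus.py. The function name report_versions is hypothetical, and it assumes PyStan is importable as pystan (the 2.x module name); it only reports versions and does not diagnose the error itself. Note that this thread shows both a python3.8 command and python3.6 paths in tracebacks, so it may be worth confirming which interpreter is actually in use.

```python
import platform


def report_versions():
    """Collect the interpreter version and, if importable, the PyStan version."""
    versions = {"python": platform.python_version()}
    try:
        import pystan  # assumed module name for PyStan 2.x
        versions["pystan"] = getattr(pystan, "__version__", "unknown")
    except ImportError:
        versions["pystan"] = None  # PyStan is not installed for this interpreter
    return versions


print(report_versions())
```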
I see exactly the same issue, running souporcell in the singularity container that I downloaded a couple weeks ago. Several samples have worked fine, now this error:
169910 excluded for potential RNA editing
5971 doublets excluded from genotype and ambient RNA estimation
0 not used for soup calculation due to possible RNA edit
Traceback (most recent call last):
File "/opt/souporcell/consensus.py", line 348, in <module>
fit = sm.optimizing(data=counts_dat)
File "/usr/local/envs/py36/lib/python3.6/site-packages/pystan/model.py", line 472, in optimizing
fit = self.fit_class(data, seed)
File "stanfit4anon_model_c58d6755a445ee1723e096eb7e36ea75_355834653533342947.pyx", line 459, in stanfit4anon_model_c58d6755a445ee1723e096eb7e36ea75_355834653533342947.StanFit4Model.__cinit__
RuntimeError: Exception: int variable contained non-int values; processing stage=data initialization; variable name=cluster_allele_counts_soup; base type=int (in 'unknown file name' at line 9)
I notice in the 'clusters.tsv' file for this sample that basically all cells are 'unassigned':
Count  Status      Assignment
    3  doublet     0/1
    1  singlet     0
    1  status      assignment
 5739  unassigned  0
  104  unassigned  0/1
   86  unassigned  1
   39  unassigned  1/0
Can the absence of a singlet with class=1 be the cause of the error?
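In case it helps others check their own samples, here is a minimal sketch that reproduces this kind of tally and lists which clusters have at least one singlet. It assumes clusters.tsv is tab-separated with columns named status and assignment (inferred from the header row that shows up in the counts above); the function names and the toy barcodes are made up for illustration.

```python
import csv
import io
from collections import Counter


def tally_assignments(handle):
    """Count rows per (status, assignment) pair in a tab-separated clusters file."""
    reader = csv.DictReader(handle, delimiter="\t")
    return Counter((row["status"], row["assignment"]) for row in reader)


def clusters_with_singlets(tally):
    """Return the set of cluster labels that have at least one singlet."""
    return {
        assignment
        for (status, assignment), n in tally.items()
        if status == "singlet" and n > 0
    }


# Toy stand-in for clusters.tsv; the barcodes are invented.
toy = io.StringIO(
    "barcode\tstatus\tassignment\n"
    "AAAC-1\tsinglet\t0\n"
    "AAAG-1\tunassigned\t1\n"
    "AAAT-1\tdoublet\t0/1\n"
)
tally = tally_assignments(toy)
print(tally)
print(clusters_with_singlets(tally))
```

For a real file, the same functions can be given an open handle, for example open("clusters.tsv", newline=""). A cluster missing from the second result has no singlets, which is the situation described above for cluster 1.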