souporcell Exception: int variable contained non-int values

Greetings! I was wondering if you might be able to help resolve an issue we are encountering during the consensus.py step which is generating the following output/error:

INFO:pystan:COMPILING THE C++ CODE FOR MODEL anon_model_c58d6755a445ee1723e096eb7e36ea75 NOW.
14884452 excluded for potential RNA editing
25990 doublets excluded from genotype and ambient RNA estimation
0 not used for soup calculation due to possible RNA edit
Traceback (most recent call last):
  File "/opt/souporcell/consensus.py", line 348, in <module>
    fit = sm.optimizing(data=counts_dat)
  File "/usr/local/lib/python3.8/site-packages/pystan/model.py", line 542, in optimizing
    fit = self.fit_class(data, seed)
  File "pystan_yvxd5ae2/stanfit4anon_model_c58d6755a445ee1723e096eb7e36ea75_7285297220659911018.pyx", line 479, in stanfit4anon_model_c58d6755a445ee1723e096eb7e36ea75_7285297220659911018.StanFit4Model.__cinit__
RuntimeError: Exception: int variable contained non-int values; processing stage=data initialization; variable name=cluster_allele_counts_soup; base type=int (in 'unknown file name' at line 9)

Our current workflow calls the compile_stan_model.py and consensus.py steps of the pipeline with the following commands:

python3.8 /opt/souporcell/compile_stan_model.py && \
python3.8 /opt/souporcell/consensus.py -a out_matrix.mtx -c clusters.tsv -r ref_matrix.mtx -v 1000G_acan_hg38_snps_mainchr.vcf --soup_out ambient_rna.txt --vcf_out cluster_genotypes.vcf --output_dir .

This seems to work for the majority of our samples, but there appears to be an edge case that throws this error in a couple of them. Any help you can provide to assist us in determining the cause of this would be highly appreciated.

Mar 27 '21 01:03 bensesbg

Did you ever find out what causes this? I'd be very interested to know.

Jun 10 '21 10:06 TessaGillett

I am seeing a very similar error right now:

29689 doublets excluded from genotype and ambient RNA estimation
0 not used for soup calculation due to possible RNA edit

Traceback (most recent call last):
  File "/opt/souporcell/consensus.py", line 348, in <module>
    fit = sm.optimizing(data=counts_dat)
  File "/opt/conda/lib/python3.6/site-packages/pystan/model.py", line 542, in optimizing
    fit = self.fit_class(data, seed)
  File "stanfit4anon_model_c58d6755a445ee1723e096eb7e36ea75_355834653533342947.pyx", line 459, in stanfit4anon_model_c58d6755a445ee1723e096eb7e36ea75_355834653533342947.StanFit4Model.__cinit__
RuntimeError: Exception: int variable contained non-int values; processing stage=data initialization; variable name=cluster_allele_counts; base type=int  (in 'unknown file name' at line 8)

Do we need to tell PyStan that the cluster_allele_counts variable contains integers?

Oct 06 '21 13:10 slowkow
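
One way to narrow this down before calling sm.optimizing is to scan the data dictionary for values that cannot be read as integers (NaN, infinite, or fractional). The sketch below is only a diagnostic idea, not souporcell code: find_non_integer_entries is a hypothetical helper, and the counts_dat shown here is a made-up stand-in whose keys are taken from the error messages above. It assumes the real counts_dat is a dict of numeric arrays or scalars.

```python
import numpy as np


def find_non_integer_entries(data, names=None):
    """Return {name: number_of_bad_values} for entries that are not whole numbers.

    `names` optionally restricts the check to the variables declared as int
    in the Stan model; by default every entry in `data` is checked.
    """
    problems = {}
    for name, value in data.items():
        if names is not None and name not in names:
            continue
        arr = np.asarray(value, dtype=float)
        finite = np.isfinite(arr)
        # Replace NaN/inf with 0 before the modulo so only finite values are tested for fractions.
        fractional = np.where(finite, arr, 0.0) % 1 != 0
        n_bad = int((~finite | fractional).sum())
        if n_bad:
            problems[name] = n_bad
    return problems


# Hypothetical stand-in for the dictionary passed to sm.optimizing(data=counts_dat).
counts_dat = {
    "cluster_allele_counts": np.array([[3.0, 1.0], [2.0, 0.0]]),
    "cluster_allele_counts_soup": np.array([[1.0, np.nan], [0.5, 4.0]]),
}

print(find_non_integer_entries(counts_dat))
```

If a variable shows up in the result, inspecting where its NaN or fractional values come from is probably more informative than casting the array to int, since a cast would hide the underlying problem.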

PyStan's behavior changes between versions, and the PyStan version is also sensitive to the Python version. If you use my conda environment, I think this should go away.

Oct 06 '21 16:10 wheaton5

I see exactly the same issue, running souporcell in the singularity container that I downloaded a couple weeks ago. Several samples have worked fine, now this error:

169910 excluded for potential RNA editing
5971 doublets excluded from genotype and ambient RNA estimation
0 not used for soup calculation due to possible RNA edit
Traceback (most recent call last):
  File "/opt/souporcell/consensus.py", line 348, in <module>
    fit = sm.optimizing(data=counts_dat)
  File "/usr/local/envs/py36/lib/python3.6/site-packages/pystan/model.py", line 472, in optimizing
    fit = self.fit_class(data, seed)
  File "stanfit4anon_model_c58d6755a445ee1723e096eb7e36ea75_355834653533342947.pyx", line 459, in stanfit4anon_model_c58d6755a445ee1723e096eb7e36ea75_355834653533342947.StanFit4Model.__cinit__
RuntimeError: Exception: int variable contained non-int values; processing stage=data initialization; variable name=cluster_allele_counts_soup; base type=int  (in 'unknown file name' at line 9)

I notice in the 'clusters.tsv' file for this sample that basically all cells are 'unassigned':

Count   Assignment
      3 doublet	0/1
      1 singlet	0
      1 status	assignment
   5739 unassigned	0
    104 unassigned	0/1
     86 unassigned	1
     39 unassigned	1/0

Can the absence of a singlet with class=1 be the cause of the error?

Feb 24 '23 08:02 pl-ki
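
Whether an empty cluster actually triggers the exception is not confirmed anywhere in this thread, but it is easy to check which clusters have no singlets before running consensus.py. The sketch below assumes clusters.tsv is tab-separated with header columns named status and assignment (as the tally above suggests); the barcode column, the example rows, and both helper functions are hypothetical.

```python
import csv
import io
from collections import Counter


def singlets_per_cluster(handle):
    """Count rows with status == 'singlet' for each assignment value."""
    counts = Counter()
    for row in csv.DictReader(handle, delimiter="\t"):
        if row["status"] == "singlet":
            counts[row["assignment"]] += 1
    return counts


def clusters_without_singlets(counts, n_clusters):
    """List cluster labels 0..n_clusters-1 that have no singlet cells."""
    return [str(k) for k in range(n_clusters) if counts[str(k)] == 0]


# Made-up example rows; a real run would pass open("clusters.tsv") instead.
example = io.StringIO(
    "barcode\tstatus\tassignment\n"
    "AAA\tsinglet\t0\n"
    "BBB\tunassigned\t1\n"
    "CCC\tdoublet\t0/1\n"
    "DDD\tunassigned\t0\n"
)

counts = singlets_per_cluster(example)
print(clusters_without_singlets(counts, n_clusters=2))
```

Comparing this list between samples that fail and samples that succeed would show whether the missing-singlet pattern lines up with the error.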
