"Invalid literal for int() with base 10: '.'" Error

Open adriandrr opened this issue 1 year ago • 10 comments

Hey,

I am desperately looking for a command-line tool that predicts Mycobacterium lineages, to integrate into my own pipeline (for my master's thesis). I thought TBProfiler might be the one, but I have now spent many hours setting it up and keep running into errors.

Error when running:

tb-profiler profile -1 ~/mdr_mtb-master/data/10_S4_R1_001.fastq.gz -2 ~/mdr_mtb-master/data/10_S4_R2_001.fastq.gz --spoligotype

Running command:
set -u pipefail; bcftools query -u -f '%CHROM\t%POS\t%REF\t%ALT\t%ANN\t[%AD]\n' ./7f06ebf9-79b2-44fd-ae9b-670f73ce64b9.targets.csq.vcf.gz

Command Failed! Please Check!
Exception ignored in: <generator object cmd_out at 0x7f0da4113b30>
Traceback (most recent call last):
  File "/homes/adrian/.conda/envs/tb-profiler/lib/python3.9/site-packages/pathogenprofiler/utils.py", line 290, in cmd_out
    raise Exception
Exception:
Traceback (most recent call last):
  File "/homes/adrian/.conda/envs/tb-profiler/bin/tb-profiler", line 671, in <module>
    args.func(args)
  File "/homes/adrian/.conda/envs/tb-profiler/bin/tb-profiler", line 151, in main_profile
    results.update(pp.run_profiler(args))
  File "/homes/adrian/.conda/envs/tb-profiler/lib/python3.9/site-packages/pathogenprofiler/cli.py", line 37, in run_profiler
    results = bam_profiler(
  File "/homes/adrian/.conda/envs/tb-profiler/lib/python3.9/site-packages/pathogenprofiler/profiler.py", line 35, in bam_profiler
    ann = ann_vcf_obj.load_ann(bed_file=conf["bed"],keep_variant_types = ["upstream","synonymous","noncoding"],min_af=min_af)
  File "/homes/adrian/.conda/envs/tb-profiler/lib/python3.9/site-packages/pathogenprofiler/vcf.py", line 138, in load_ann
    ad = [int(x) for x in ad_str.split(",")]
  File "/homes/adrian/.conda/envs/tb-profiler/lib/python3.9/site-packages/pathogenprofiler/vcf.py", line 138, in <listcomp>
    ad = [int(x) for x in ad_str.split(",")]
ValueError: invalid literal for int() with base 10: '.'
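For readers hitting the same traceback: the failing line splits the AD (allelic depth) string on commas and converts every piece with `int()`, which cannot handle the `.` that bcftools prints for a missing value. The following is only a minimal sketch of a tolerant parser, not the fix that was applied in pathogenprofiler; the function name `parse_ad` and the choice to return `None` for missing depths are assumptions made for illustration.

```python
from typing import List, Optional


def parse_ad(ad_str: str) -> Optional[List[int]]:
    """Parse a comma-separated AD string such as "12,3".

    Returns None when any value is missing ("." or empty), instead of
    raising ValueError as a bare int() conversion would.
    """
    parts = ad_str.strip().split(",")
    if any(p in (".", "") for p in parts):
        return None  # hypothetical convention: missing depth -> None
    return [int(p) for p in parts]


if __name__ == "__main__":
    print(parse_ad("12,3"))  # a record with depths for REF and ALT
    print(parse_ad("."))     # a record with no AD value at all
```

A caller would then have to decide what to do with a `None` result, for example skipping the variant or treating its allele frequency as unknown.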

I work in a conda environment with all the required packages installed, on a Linux machine. My data is amplicon-based (focusing on the MTB resistance loci and spoligotyping loci) and was generated with Illumina technology at a read length of 151 bp.

Furthermore, I tried the "tb-profiler lineage" command with BAM files I created on my own, but the lineage info is always empty. The documentation confuses me further; am I missing anything?

adriandrr Dec 02 '22 14:12

Hi @adriandrr ,

Can you let me know what version you are using?

jodyphelan Jan 09 '23 14:01

Hey @jodyphelan

packages in environment at /homes/adrian/.conda/envs/sm:

Name          Version    Build     Channel
tbprofiler    4.4.0      pypi_0    pypi
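Note that the listing above is for the environment `sm`, while the paths in the traceback point into `envs/tb-profiler`. As a rough sketch, the version can also be read from inside whichever Python interpreter actually runs the tool. The distribution name `tbprofiler` is taken from the listing; `pathogenprofiler` as a distribution name is an assumption based on the module path in the traceback, and the helper name `installed_version` is hypothetical.

```python
import sys
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # sys.prefix shows which environment this interpreter belongs to.
    print("environment:", sys.prefix)
    print("tbprofiler:", installed_version("tbprofiler"))
    # Assumed distribution name, inferred from the traceback's module path.
    print("pathogenprofiler:", installed_version("pathogenprofiler"))
```

Running this with the interpreter of each environment in turn shows whether the two environments carry different installs.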

adriandrr Jan 09 '23 14:01

Ok thanks. With amplicon data you might not be able to get lineage information unless your amplicons specifically overlap the lineage-specific SNP sites that tb-profiler uses (https://github.com/jodyphelan/tbdb/blob/master/barcode.bed). This could explain why the lineage information is empty.
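One way to check this is to intersect the barcode positions with the amplicon coordinates. The sketch below assumes both are available as BED-style, tab-separated lines (chromosome, 0-based start, exclusive end in the first three columns) and that an amplicon BED exists at all, which is not a given for a primer mix of unknown composition. The coordinates in the example are made up purely for illustration, and the function names are hypothetical.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), BED-style half-open


def read_bed(lines: Iterable[str]) -> List[Interval]:
    """Read the first three columns of BED-style lines, skipping headers."""
    intervals = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split("\t")[:3]
        intervals.append((chrom, int(start), int(end)))
    return intervals


def covered_sites(sites: List[Interval], amplicons: List[Interval]) -> List[Interval]:
    """Return the sites that overlap at least one amplicon interval."""
    return [
        s for s in sites
        if any(s[0] == a[0] and s[1] < a[2] and a[1] < s[2] for a in amplicons)
    ]


if __name__ == "__main__":
    # Made-up coordinates, for illustration only.
    barcode = read_bed(["Chromosome\t100\t101\tsiteA", "Chromosome\t5000\t5001\tsiteB"])
    amplicons = read_bed(["Chromosome\t50\t300\tamp1"])
    hits = covered_sites(barcode, amplicons)
    print(f"{len(hits)} of {len(barcode)} barcode sites fall inside an amplicon")
```

If few or none of the barcode sites are covered, an empty lineage call from the SNP barcode is the expected outcome rather than a sign of a broken install.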

The error seems to be happening for a different reason though. I haven't encountered that error before. I can help debug if you are allowed to share the data?

jodyphelan Jan 09 '23 14:01

Wow, that is really helpful. I created the data with the DEEPLEX Myc-TB workflow from GenoScreen. They use a primer mix of unknown composition, but state that they use the hsp65 locus for species identification, the CRISPR/DR locus for spoligotyping and the phyloSNPS locus for genotyping. I therefore assumed that data for these loci were present, but hadn't considered that tbprofiler uses so many different sites.

I can gladly share the fastq files I used to test tbprofiler:
forward read: https://drive.google.com/file/d/1tPiQxkfBG0U9-z9jSLsoM9lIZGnip7_x/view?usp=sharing
reverse read: https://drive.google.com/file/d/1dM_6h70QdcohqOY-z3xK84pYdqVKycww/view?usp=sharing

Thanks a lot!

adriandrr Jan 09 '23 14:01

Thanks, I downloaded the first one but might need permissions activated for the second one.

jodyphelan Jan 09 '23 15:01

I am sorry, does it work now?

adriandrr Jan 09 '23 15:01

Yeah that works now. I'll have a look and get back to you.

jodyphelan Jan 09 '23 15:01

Seems to work ok for me. Perhaps the install didn't work correctly? I am going to release a new version of tb-profiler this week so maybe you can try with that version when it comes out? Results should look like this: tbprofiler.results.txt

With regard to the lineage: we can't use the default SNP-based barcode, but you can turn on spoligotyping with --spoligotype and it will analyse those regions and give you a lineage based on that.

jodyphelan Jan 09 '23 15:01

Yeah, the error happened with the spoligotype flag.

However, thanks for the troubleshooting. I will try it with the new version and will be sure to give you information about the outcome.

adriandrr Jan 09 '23 16:01

Ok the new version is on bioconda now:

conda create -n tb-profiler tb-profiler=4.4.1

jodyphelan Jan 10 '23 17:01