list index out of range
Hi,
Thanks for an awesome piece of software. I have used TranscriptClean on large-scale assemblies with high success before.
However, I am now running a metatranscriptomics project in which I am only interested in reads that map to the COI gene of the taxon of interest. I have 6504 reads that map to the full-length reference COI sequence. I have sorted the .sam file with samtools, but when I run TranscriptClean I get the error "list index out of range". When I inspect the mapping in a genome map viewer, it looks good, albeit with some gaps here and there. Still, all my reads are within the reference.
I sort the samples with this command:
samtools sort -O sam -T sample.sort -o sample.sort.sam mapped1.sam
I then run this command:
python ....../TranscriptClean.py --sam sample.sort.sam --genome mygenome.fasta --out outfile
The program then returns
list index out of range
Took 0:00:54 to process transcript batch.
Took 0:00:00 to combine all outputs.
Below is a snippet from the sorted .sam file.
@HD VN:1.0 SO:coordinate @SQ SN:Facetotecta LN:1527 @RG ID:Unpaired_reads_assembled_against_Facetotecta SM: @PG ID:samtools PN:samtools VN:1.14 CL:samtools sort -O sam -T sample.sort -o sample.sort.sam mapped1_sorted_Facetotecta_cut_extraction.sam m54057_190926_040405/25100833/ccs_1 0 Facetotecta 1 255 2M1P2M1P1M5P1M10P1M3P1M2P1M5P1M4P1M1P1M2P1M2P1M1P1M1P1M1P2M3P1M2P1M2P1M1P1M2P1M1P1M2P4M3P1M1P1M1P1M2P2M1P1M2P1M5P2M3P1M1P1M2P2M4P2M1P1M2P1M2P1M8P1M3P1M21P2M4P1M5P1M2P1M2P1M3P1M4P1M5P1M3P1M5P1M1P1M3P2M3P1M1P1M6P1M1P1M3P1M2P1M1P1M10P1M2P1M18P1M1P2M4P1M6P1M1P1M8P2M3P2M11P1M2P1M6P1M2P1M6P1M3P2M2P1M7P1M6P1M1P1M1P1M1P1M1P1M9P1M8P3M5P1M1P1M1P1M1P1M1P2M7P1M3P1M1P1M1P1M2P1M1P1M14P1M1P1M4P1M4P1M1P1M12P3M2P1M6P1M1P1M3P2M2P1M1P1M1P2M3P1M3P2M3P1M3P2M1P1M4P2M23P1M4P1M4P1M8P1M2P1M1P1M1P1M17P1M1P1M5P1M3P1M1P1M16P1M1P1M1P2M3P1M5P1M1P3M1P3M1P1M2P2M4P2M1P1M1P1M5P2M3P1M7P1M5P1M2P1M2P1M1P1M1P1M1P1M1P1M3P2M1P1M30P1M1P1M2P1M1P2M8P1M3P1M8P1M1P1M1P1M8P2M1P2M1P1M1P1M4P2M5P1M1P1M4P1M10P1M5P1M4P1M5P1M2P1M10P1M1P1M1P1M3P1M4P1M4P2M1P2M9P2M4P2M3P2M2P1M1P1M3P2M2P1M2P2M2P3M3P1M17P1M4P1M1P1M3P2M2P1M4P1M8P1M1P1M1P1M2P1M2P3M1P1M3P1M4P1M1P2M2P1M1P1M3P1M3P1M3P1M4P1M3P1M1P1M3P1M1P1M5P2M2P1M1P2M1P1M3P1M1P1M1P1M3P1M8P2M1P2M1P1M2P2M11P1M3P2M8P1M1P2M14P1M14P2M8P1M1P2M3P1M4P1M5P1M1P1M1P1M1P1M2P2M6P1M1P1M1P1M1P1M4P1M2P1M3P1M6P1M2P1M2P2M2P1M1P3M1P1M2P1M6P1M2P1M1P1M2P1M9P1M2P2M4P1M4P1M5P1M6P1M1P1M1P1M3P1M3P1M4P1M7P1M8P1M8P1M9P1M16P1M2P1M18P1M4P1M12P1M6P1M3P1M3P1M2P1M6P1M1P1M12P1M1P1M1P1M12P2M7P1M1P1M3P1M3P1M1P2M1P1M4P1M3P1M3P1M1P1M7P1M3P1M2P2M21P1M6P1M3P1M1P2M29P2M2P1M2P1M1P1M81P2M1P1M4P2M1P1M2P1M2P1M1P2M1P1M1P1M1P2M1P1M4P2M1P1M2P2M10P1M3P1M3P1M1P1M4P1M1P1M1P1M1P1M2P1M1P1M1P2M6P1M9P1M3P1M2P2M3P1M7P1M2P1M3P1M1P3M4P1M6P1M2P1M1P2M1P1M3P1M1P1M9P1M1P1M1P2M3P1M4P2M3P3M1P1M10P1M8P1M4P1M2P1M4P1M2P1M2P1M4P2M5P1M2P1M5P1M1P2M3P1M1P1M1P2M13P1M1P1M1P1M2P1M2P1M12P1M9P1M1P1M1P1M1P1M2P1M3P2M2P1M2P1M8P1M1P2M1P2M1P3M10P2M4P1M2P1M4P1M4P1M1P1M8P1M2P1M1P1M4P2M1P2M2P1M2P1M3P1M9P1M5P2M4P2M17P1M1P1M13P1M2P1M3P1M11P1M2P1M10P1M2P1M22P1M1P1
M19P1M4P1M3P1M14P1M5P1M3P1M2P1M3P1M5P1M12P1M11P1M2P1M2P1M6P1M2P1M10P1M1P1M9P1M3P1M1P1M4P1M2P1M2P1M1P1M12P1M3P1M2P1M1P1M1P1M2P1M2P1M3P1M2P1M4P1M5P2M1P1M2P1M2P1M1P1M2P1M3P1M3P1M6P1M1P1M3P1M2P1M6P1M3P1M6P1M1P1M3P1M1P1M1P1M4P1M4P1M8P1M6P1M1P1M1P1M2P2M * 0 0 ATGAAACGATGATTATTTTCCACTAACCACAAAGACATTGGTACAATGTACTTTATCCTGGGAGCGTGATCAGGTATAATCGGTACTGGTATAAGAATACTTATTCGAAGGGAACTAGGTCAACCCGGTAGACTTATTGGTAATGACCAAATTTACAACGTAATTGTTACAGCTCATGCATTTATCATAATTTTCTTTATAGTTATACCTATTATAATTGGAGGCTTTGGCAATTGGCTTGTTCCTCTTATAATTGGAGCTCCTGATATAGCCTTCCCTCGAATAAACAATATAAGATTTTGACTTCTTCCTCCTTCCCTCTCTCTTCTTTTATCAAGAAGATTAACTGAATCTGGAGTTGGAACAGGATGAACAGTTTACCCTCCTCTTTCAAGTAATATTGCCCACAGTGGTATTTCCGTTGACTTAGCTATCTTCTCACTCCATTTGGCAGGAGCAAGATCAATTTTAGGTGCCATTAATTTCATTACTACTATCATCAATATACGTAATAAAATAATCACAATAGACCGATTACCTCTATTTGTATGATCAGTTTTCATCACAGCGTTTCTCC * RG:Z:Unpaired_reads_assembled_against_Facetotecta m54057_190926_040405/7602703/ccs_2 0 Facetotecta 1 255 2M1P2M1P1M5P1M10P1M3P1M2P1M5P1M4P1M1P1M2P1M2P1M1P1M1P1M1P2M3P1M2P1M2P1M1P1M2P1M1P1M2P4M3P1M1P1M1P1M2P2M1P1M2P1M5P2M3P1M1P1M2P2M4P2M1P1M2P1M2P1M8P1M3P1M21P2M4P1M5P1M2P1M2P1M3P1M4P1M5P1M3P1M5P1M1P1M3P2M3P1M1P1M6P1M1P1M3P1M2P1M1P1M10P1M2P1M18P1M1P2M4P1M6P1M1P1M8P2M3P2M11P1M2P1M6P1M2P1M6P1M3P2M2P1M7P1M6P1M1P1M1P1M1P1M1P1M9P1M8P3M5P1M1P1M1P1M1P1M1P2M7P1M3P1M1P1M1P1M2P1M1P1M14P1M1P1M4P1M4P1M1P1M12P3M2P1M6P1M1P1M3P2M2P1M1P1M1P2M3P1M3P2M3P1M3P2M1P1M4P2M23P1M4P1M4P1M8P1M2P1M1P1M1P1M17P1M1P1M5P1M3P1M1P1M16P1M1P1M1P2M3P1M5P1M1P3M1P3M1P1M2P2M4P2M1P1M1P1M5P2M3P1M7P1M5P1M2P1M2P1M1P1M1P1M1P1M1P1M3P2M1P1M30P1M1P1M2P1M1P2M8P1M3P1M8P1M1P1M1P1M8P2M1P2M1P1M1P1M4P2M5P1M1P1M4P1M10P1M5P1M4P1M5P1M2P1M10P1M1P1M1P1M3P1M4P1M4P2M1P2M9P2M4P2M3P2M2P1M1P1M3P2M2P1M2P2M2P3M3P1M17P1M4P1M1P1M3P2M2P1M4P1M8P1M1P1M1P1M2P1M2P3M1P1M3P1M4P1M1P2M2P1M1P1M3P1M3P1M3P1M4P1M3P1M1P1M3P1M1P1M5P2M2P1M1P2M1P1M3P1M1P1M1P1M3P1M8P2M1P2M1P1M2P2M11P1M3P2M8P1M1P2M14P1M14P2M8P1M1P2M3P1M4P1M5P1M1P1M1P1M1P1M2P2M6P1M1P1M1P1M1P1M4P1M2P1M3P1M6P1M2P1M2P2M2P1D1P3D1P1D2P1M6P1M2P1M1I1M2P1M8P1I1M2P2M4P1M4P1M5P1M
6P1M1P1M1P1M3P1M3P1M4P1M4P3I1M8P1M8P1M9P1M16P1M2P1M18P1M4P1M12P1M6P1M3P1M3P1M2P1M6P1M1P1M12P1M1P1M1P1M12P2M7P1M1P1M3P1M3P1M1P2M1P1M4P1M3P1M3P1M1P1M7P1M3P1M2P2M21P1M6P1M3P1M1P2M29P2M2P1M2P1M1P1M81P2M1P1M4P2M1P1M2P1M2P1M1P2M1P1M1P1M1P2M1P1M4P2M1P1M2P2M10P1M3P1M3P1M1P1M4P1M1P1M1P1M1P1M2P1M1P1M1P2M6P1M9P1M3P1M2P2M3P1M7P1M2P1M3P1M1P3M4P1M6P1M2P1M1P2M1P1M3P1M1P1M9P1M1P1M1P2M3P1M4P2M3P3M1P1M10P1M8P1M4P1M2P1M4P1M2P1M2P1M4P2M5P1M2P1M5P1M1P2M3P1M1P1M1P2M13P1M1P1M1P1M2P1M2P1M12P1M9P1M1P1M1P1M1P1M2P1M3P2M2P1M2P1M8P1M1P2M1P2M1P3M10P2M4P1M2P1M4P1M4P1M1P1M8P1M2P1M1P1M4P2M1P2M2P1M2P1M3P1M9P1M5P2M4P2M17P1M1P1M13P1M2P1M3P1M11P1M2P1M10P1M2P1M22P1M1P1M19P1M4P1M3P1M14P1M5P1M3P1M2P1M3P1M5P1M12P1M11P1M2P1M2P1M6P1M2P1M10P1M1P1M9P1M3P1M1P1M4P1M2P1M2P1M1P1M12P1M3P1M2P1M1P1M1P1M2P1M2P1M3P1M2P1M4P1M5P2M1P1M2P1M2P1M1P1M2P1M3P1M3P1M6P1M1P1M3P1M2P1M6P1M3P1M6P1M1P1M3P1M1P1M1P1M4P1M4P1M8P1M6P1M1P1M1P1M2P2M4P3M2P1M2P1I1M3P1M4P1M20P1D4P1M2P1M1P1M1P2M13P1M6P2M2P1M5P2M2P2M1P1M1P1M2P1M1P1M1P1M1P1M1P1M4P3M1P2M1P1M2P1M1P1M3P6M4P1M1P2M1P2M2P2M2P1M5P1M1P1M13P1M2P1M1P1M1P1M1P1M8P1M8P1M2P1M4P1M3P1M1P2M14P3M1P1M4P1M3P1M2P1M2P1M12P2M1P3M2P2M2P2M11P1M1P1M2P1D6P1D3P1D7P1D2P1D14P1D3P1M2P1M1P1M3P2M1P1M4P2M2P3M1P2M1P1M1P2M1P1M1P1M2P2M2P2M1P2M8P2M1P3M1P1M1P5M2P1M8P2M1P1M1P2M2P1M1P2M1P1M2P1M1P1M1P3M2P2M1P1M1P1M1P2M2P1M8P1M1P1M1P2M3P1M1P3M2P1M4P1M2P2M2P1M1P1M1P1M1P2M3P1M2P2M51P1M1P1M1P1M1P1M2P2M1P2M208P1M1P1M1P1M1P1M1P1M1P1M2P1M1P1M1P2M9P1M1P2M1P2M4P1M1P1M4P1M1P1M1P1M3P1M3P1M1P1M1P1M1P1M1P1M14P1M4P1M * 0 0 
ATGAAACGATGATTATTTTCAACCAATCATAAAGATATTGGAACTATATATATAATATTCGGCGCCTGATCCGGCACTATAGGAGTGGCAATAAGAATAATTATCCGTAGAGAACTAGGGCAACCCGGTTCTCTAATTGGTAACGATCAAATCTATAATGTAATTGTAACTGCCCACGCCTTTATCATAATTTTCTTTATAGTAATACCAATCATAATTGGAGGATTTGGAAACTGACTAATTCCTCTGATATTAGGATCCCCTGATATAGCATTTCCACGGATAAATAACATAAGATTCTGACTACTCCCCCCATCATTAATTCTTTTAATTAGAAGAAGACTAACAGAAAGGGGGGTAGGAACAGGATGAACGGTCTATCCTCCTCTTTCAAGAAATATCTCTCATAGAGGAGTCTCAGTAGACATGGCCATCTTCTCCCTCCACTTAGCTGGAGCAAGATCCATTTTAGGAGCCATTAATTTTATTACTACGATCATTAATATACGCAACAAAAACCTTTCTTTTGACCGTCTACCATTATTAGTATGATCTATCTTTATTACTACTATCCTTTTACTACTTTCTTTACCAGTACTTGCCGGAGCTATTACCATACTATTAACAGATCGAAATATTAATACTTCATTCTTTGATCCAGGTGGGGATCCTGTATTATATCAACATCTATTTTGATTTTTCGGACACCCAGAAGTTTATATTTTAATTCTACCAGGGTTTGGAATAGTTTCCCACATTATTAGACAAGAAAG *
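One thing visible in the snippet is that the CIGAR strings consist largely of alternating M and P operations (P is "padding" in the SAM specification), which is unusual for a typical spliced long-read alignment. Whether this is related to the error is not established here. As a diagnostic aid, the following is a minimal sketch that tallies how often each CIGAR operation occurs in a SAM file and reports the largest reference end position, so it can be compared against the LN value in the @SQ header. The function name `summarize_sam` is hypothetical, and the sketch assumes a tab-delimited SAM file with the standard 11 mandatory columns.

```python
import re
from collections import Counter

# CIGAR operations defined by the SAM specification.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that consume reference bases (per the SAM specification).
REF_CONSUMING = set("MDN=X")


def summarize_sam(lines):
    """Tally CIGAR operation types and find the largest reference end position.

    `lines` is any iterable of SAM text lines (for example an open file).
    Returns (Counter of operation occurrences, largest 1-based reference end).
    """
    op_counts = Counter()
    max_ref_end = 0
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        pos, cigar = int(fields[3]), fields[5]
        if cigar == "*":
            continue  # no alignment information available
        ref_len = 0
        for length, op in CIGAR_RE.findall(cigar):
            op_counts[op] += 1
            if op in REF_CONSUMING:
                ref_len += int(length)
        max_ref_end = max(max_ref_end, pos + ref_len - 1)
    return op_counts, max_ref_end
```

For example, calling `summarize_sam(open("sample.sort.sam"))` would return the operation counts and the largest reference end position for the sorted file used above; a large count for P, or an end position beyond the reference length of 1527, would be worth mentioning when reporting the problem.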
Any ideas about what goes wrong? It looks like TranscriptClean cannot run without a proper genome map or chromosome list, but I wanted to ask in case others are getting the same "error".
I appreciate any help and am open to other solutions.
Niklas