Monopogen
preprocess error
Hey, I always get this error when running preprocess. Could you help me out?
[2023-10-17 19:20:15,607] INFO Monopogen.py Performing data preprocess before variant calling...
[2023-10-17 19:20:15,607] INFO germline.py Parameters in effect:
[2023-10-17 19:20:15,607] INFO germline.py --subcommand = [preProcess]
[2023-10-17 19:20:15,607] INFO germline.py --bamFile = [bam.lst]
[2023-10-17 19:20:15,607] INFO germline.py --out = [s1_out]
[2023-10-17 19:20:15,607] INFO germline.py --app_path = [/home/big/zheng/Monopogen/apps]
[2023-10-17 19:20:15,607] INFO germline.py --max_mismatch = [3]
[2023-10-17 19:20:15,607] INFO germline.py --nthreads = [8]
[2023-10-17 19:20:15,614] DEBUG Monopogen.py PreProcessing sample all_cells
[2023-10-17 19:20:15,809] INFO germline.py The contig chr5 does not contain the prefix 'chr' and we will add 'chr' on it
[2023-10-17 19:20:15,814] INFO germline.py The contig chr1 does not contain the prefix 'chr' and we will add 'chr' on it
[2023-10-17 19:20:15,818] INFO germline.py The contig chr2 does not contain the prefix 'chr' and we will add 'chr' on it
[2023-10-17 19:20:15,820] INFO germline.py The contig chr4 does not contain the prefix 'chr' and we will add 'chr' on it
[2023-10-17 19:20:15,821] INFO germline.py The contig chr6 does not contain the prefix 'chr' and we will add 'chr' on it
[2023-10-17 19:20:15,821] INFO germline.py The contig chr3 does not contain the prefix 'chr' and we will add 'chr' on it
[2023-10-17 19:20:15,823] INFO germline.py The contig chr8 does not contain the prefix 'chr' and we will add 'chr' on it
[2023-10-17 19:20:15,823] INFO germline.py The contig chr7 does not contain the prefix 'chr' and we will add 'chr' on it
[2023-10-17 19:20:15,921] INFO germline.py The contig chr9 does not contain the prefix 'chr' and we will add 'chr' on it
[2023-10-17 19:20:15,929] INFO germline.py The contig chr10 does not contain the prefix 'chr' and we will add 'chr' on it
[2023-10-17 19:20:15,949] INFO germline.py The contig chr11 does not contain the prefix 'chr' and we will add 'chr' on it
[2023-10-17 19:20:15,954] INFO germline.py The contig chr12 does not contain the prefix 'chr' and we will add 'chr' on it
[2023-10-17 19:20:15,954] INFO germline.py The contig chr13 does not contain the prefix 'chr' and we will add 'chr' on it
[2023-10-17 19:20:15,956] INFO germline.py The contig chr15 does not contain the prefix 'chr' and we will add 'chr' on it
[2023-10-17 19:20:15,958] INFO germline.py The contig chr14 does not contain the prefix 'chr' and we will add 'chr' on it
[2023-10-17 19:20:15,959] INFO germline.py The contig chr16 does not contain the prefix 'chr' and we will add 'chr' on it
[2023-10-17 19:20:16,026] INFO germline.py The contig chr17 does not contain the prefix 'chr' and we will add 'chr' on it
[2023-10-17 19:20:16,033] INFO germline.py The contig chr18 does not contain the prefix 'chr' and we will add 'chr' on it
[2023-10-17 19:20:16,055] INFO germline.py The contig chr19 does not contain the prefix 'chr' and we will add 'chr' on it
[2023-10-17 19:20:16,061] INFO germline.py The contig chr20 does not contain the prefix 'chr' and we will add 'chr' on it
[2023-10-17 19:20:16,063] INFO germline.py The contig chr21 does not contain the prefix 'chr' and we will add 'chr' on it
[2023-10-17 19:20:16,065] INFO germline.py The contig chr22 does not contain the prefix 'chr' and we will add 'chr' on it
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/zheng/anaconda3/envs/monopogen/lib/python3.8/multiprocessing/pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "/home/zheng/anaconda3/envs/monopogen/lib/python3.8/multiprocessing/pool.py", line 48, in mapstar
return list(map(*args))
File "/home/big/zheng/Monopogen/src/germline.py", line 200, in BamFilter
for s in infile.fetch(search_chr):
File "pysam/libcalignmentfile.pyx", line 1089, in pysam.libcalignmentfile.AlignmentFile.fetch
File "pysam/libchtslib.pyx", line 683, in pysam.libchtslib.HTSFile.parse_region
ValueError: invalid contig `5`
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/big/zheng/Monopogen/src/Monopogen.py", line 435, in <module>
main()
File "/home/big/zheng/Monopogen/src/Monopogen.py", line 428, in main
args.func(args)
File "/home/big/zheng/Monopogen/src/Monopogen.py", line 313, in preProcess
result = pool.map(BamFilter, para_lst)
File "/home/zheng/anaconda3/envs/monopogen/lib/python3.8/multiprocessing/pool.py", line 364, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/home/zheng/anaconda3/envs/monopogen/lib/python3.8/multiprocessing/pool.py", line 771, in get
raise self._value
ValueError: invalid contig `5`
The output of samtools view -h sublibrary1_chr_sorted.bam | head -n 25 is:
@HD VN:1.4 SO:coordinate
@SQ SN:chr1 LN:248956422
@SQ SN:chr10 LN:133797422
@SQ SN:chr11 LN:135086622
@SQ SN:chr12 LN:133275309
@SQ SN:chr13 LN:114364328
@SQ SN:chr14 LN:107043718
@SQ SN:chr15 LN:101991189
@SQ SN:chr16 LN:90338345
@SQ SN:chr17 LN:83257441
@SQ SN:chr18 LN:80373285
@SQ SN:chr19 LN:58617616
@SQ SN:chr2 LN:242193529
@SQ SN:chr20 LN:64444167
@SQ SN:chr21 LN:46709983
@SQ SN:chr22 LN:50818468
@SQ SN:chr3 LN:198295559
@SQ SN:chr4 LN:190214555
@SQ SN:chr5 LN:181538259
@SQ SN:chr6 LN:170805979
@SQ SN:chr7 LN:159345973
@SQ SN:chr8 LN:145138636
@SQ SN:chr9 LN:138394717
@SQ SN:chrMT LN:16569
@SQ SN:chrX LN:156040895
And the output of samtools view sublibrary1_chr_sorted.bam | head -n 10 is:
63_76_14__R__159_76_14__ACGGACTC_AGATGTAC_AACCGAGA__TCCGGCTAAA__230914Xm_CAGATC 0 chr1 10002 255 108M42S *0 0 AACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCACTAGATTCCGTCCACAGTCTCAAGCACGTGGATGTACAGCTA FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFF::::F,:,FFFFFFF,:,,,F,FFFFF,,,,,,:,F,F,F,F NH:i:1 HI:i:1 AS:i:106 nM:i:0 GX:Z: GN:Z: pN:Z:TCCGGCTAAA CR:Z:ACGGACTC_AGATGTAC_AACCGAGA CB:Z:63_76_14__s1 pB:Z:159_76_14 pS:Z:MRD016_D30 RE:A:N
30_91_44__T__30_91_44__ACTTTACC_CTAAGGTC_CTGAGCCA__ATCCAGAATG__230914Xm_CAGATC 16 chr1 10005 1 10S97M77N43M *0 0 TAAGCCTATTCCTAACAGTATCAATATCACTAACCCGTACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCC ::F,FFFF,,F,,F,,,,F,,F,,,,,,,F,F,FF,,,F,FF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFF NH:i:3 HI:i:1 AS:i:120 nM:i:9 GX:Z: GN:Z: pN:Z:ATCCAGAATG CR:Z:ACTTTACC_CTAAGGTC_CTGAGCCA CB:Z:30_91_44__s1 pB:Z:30_91_44 pS:Z:MRD007_Transplant RE:A:N
04_26_30__R__100_26_30__GCTTATAG_AGCAGGAA_CAACCACA__TATGAAGATT__230914Xm_CAGATC 16 chr1 10534 3 96M2D27M1S *0 0 AGTACCACCGAAATCTGTGCAGAGGAGAACGCAGCTCCGCCCTCGCGGTGCTCTCCGGGTCTGTGCTGAGGAGAACGCAACTCCGCCGTCGCAAAGGCGCCGCGCCGGCGCAGGCGCAGAGAGC FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:2 HI:i:1 AS:i:111 nM:i:2 GX:Z: GN:Z: pN:Z:TATGAAGATT CR:Z:GCTTATAG_AGCAGGAA_CAACCACA CB:Z:04_26_30__s1 pB:Z:100_26_30 pS:Z:MRD002_D30 RE:A:N
04_26_30__R__100_26_30__GCTTATAG_AGCAGGAA_CAACCACA__TATGAAGATT__230914Xm_CAGATC 16 chr1 10534 3 96M2D27M1S *0 0 AGTACCACCGAAATCTGTGCAGAGGAGAACGCAGCTCCGCCCTCGCGGTGCTCTCCGGGTCTGTGCTGAGGAGAACGCAACTCCGCCGTCGCAAAGGCGCCGCGCCGGCGCAGGCGCAGAGAGC FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:2 HI:i:1 AS:i:111 nM:i:2 GX:Z: GN:Z: pN:Z:TATGAAGATT CR:Z:GCTTATAG_AGCAGGAA_CAACCACA CB:Z:04_26_30__s1 pB:Z:100_26_30 pS:Z:MRD002_D30 RE:A:N
04_26_30__R__100_26_30__GCTTATAG_AGCAGGAA_CAACCACA__TATGAAGATT__230914Xm_CAGATC 16 chr1 10534 3 96M2D27M1S *0 0 AGTACCACCGAAATCTGTGCAGAGGAGAACGCAGCTCCGCCCTCGCGGTGCTCTCCGGGTCTGTGCTGAGGAGAACGCAACTCCGCCGTCGCAAAGGCGCCGCGCCGGCGCAGGCGCAGAGAGC FF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:2 HI:i:1 AS:i:111 nM:i:2 GX:Z: GN:Z: pN:Z:TATGAAGATT CR:Z:GCTTATAG_AGCAGGAA_CAACCACA CB:Z:04_26_30__s1 pB:Z:100_26_30 pS:Z:MRD002_D30 RE:A:N
13_81_76__R__109_81_76__TATGTGTC_ATCATTCC_AGATGTAC__GCTTCATTTT__230914Xm_CAGATC 16 chr1 10535 3 95M2D25M *0 0 GTACCACCGAAATCTGTGCAGAGGAGAACGCAGCTCCGCCCTCGCGGTGCTCTCCGGGTCTGTGCTGAGGAGAACGCAACTCCGCCGTCGCAAAGGCGCCGCGCCGGCGCAGGCGCAGAG FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:2 HI:i:1 AS:i:108 nM:i:2 GX:Z: GN:Z: pN:Z:GCTTCATTTT CR:Z:TATGTGTC_ATCATTCC_AGATGTAC CB:Z:13_81_76__s1 pB:Z:109_81_76 pS:Z:MRD004_Transplant RE:A:N
04_26_30__R__100_26_30__GCTTATAG_AGCAGGAA_CAACCACA__TATGAAGATT__230914Xm_CAGATC 16 chr1 10538 3 92M2D27M1S *0 0 CCACCGAAATCTGTGCAGAGGAGAACGCAGCTCCGCCCTCGCGGTGCTCTCCGGGTCTGTGCTGAGGAGAACGCAACTCCGCCGTCGCAAAGGCGCCGCGCCGGCGCAGGCGCAGAGAGC FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:2 HI:i:1 AS:i:107 nM:i:2 GX:Z: GN:Z: pN:Z:TATGAAGATT CR:Z:GCTTATAG_AGCAGGAA_CAACCACA CB:Z:04_26_30__s1 pB:Z:100_26_30 pS:Z:MRD002_D30 RE:A:N
14_32_14__R__110_32_14__CAATTCTC_CAATGGAA_AACCGAGA__GAGGGGCGCG__230914Xm_CAGATC 16 chr1 10538 3 92M2D30M *0 0 CCACCGAAATCTGTGCAGAGGAGAACGCAGCTCCGCCCTCGCGGTGCTCTCCGGGTCTGTGCTGAGGAGAACGCAACTCCGCCGTCGCAAAGGCGCCGCGCCGGCGCAGGCGCAGAGAGGCG FFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:2 HI:i:1 AS:i:110 nM:i:2 GX:Z: GN:Z: pN:Z:GAGGGGCGCG CR:Z:CAATTCTC_CAATGGAA_AACCGAGA CB:Z:14_32_14__s1 pB:Z:110_32_14 pS:Z:MRD004_Transplant RE:A:N
14_32_14__R__110_32_14__CAATTCTC_CAATGGAA_AACCGAGA__GAGGGGCGCG__230914Xm_CAGATC 16 chr1 10538 3 92M2D30M *0 0 CCACCGAAATCTGTGCAGAGGAGAACGCAGCTCCGCCCTCGCGGTGCTCTCCGGGTCTGTGCTGAGGAGAACGCAACTCCGCCGTCGCAAAGGCGCCGCGCCGGCGCAGGCGCAGAGAGGCG FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:2 HI:i:1 AS:i:110 nM:i:2 GX:Z: GN:Z: pN:Z:GAGGGGCGCG CR:Z:CAATTCTC_CAATGGAA_AACCGAGA CB:Z:14_32_14__s1 pB:Z:110_32_14 pS:Z:MRD004_Transplant RE:A:N
14_32_14__R__110_32_14__CAATTCTC_CAATGGAA_AACCGAGA__GAGGGGCGCG__230914Xm_CAGATC 16 chr1 10540 3 90M2D30M *0 0 ACCGAAATCTGTGCAGAGGAGAACGCAGCTCCGCCCTCGCGGTGCTCTCCGGGTCTGTGCTGAGGAGAACGCAACTCCGCCGTCGCAAAGGCGCCGCGCCGGCGCAGGCGCAGAGAGGCG FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:2 HI:i:1 AS:i:108 nM:i:2 GX:Z: GN:Z: pN:Z:GAGGGGCGCG CR:Z:CAATTCTC_CAATGGAA_AACCGAGA CB:Z:14_32_14__s1 pB:Z:110_32_14 pS:Z:MRD004_Transplant RE:A:N
Have you figured this out? I am having the same issue.
Could you check whether the contig names in your input BAM files already have the chr prefix? If they do (as in sublibrary1_chr_sorted.bam), it is weird that Monopogen tries to add it again, as the log shows.
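For anyone who wants to run that check on their own files, here is a minimal sketch of one way to do it. It is not Monopogen's actual code, and the helper names `contig_names` and `resolve_contig` are made up for illustration. It takes the header text printed by samtools view -H, collects the SN: names from the @SQ lines, and looks up which spelling of a chromosome (with or without chr) is really present. In the traceback above, the fetch was attempted with `5` while the header only lists chr5.

```python
def contig_names(header_text):
    """Return the SN: values from the @SQ lines of SAM header text."""
    names = []
    for line in header_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "@SQ":
            for field in fields[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names


def resolve_contig(requested, names):
    """Hypothetical helper: map a chromosome name, given with or without
    the 'chr' prefix, to the spelling that exists in the header.
    Returns None when neither spelling is present."""
    bare = requested[3:] if requested.startswith("chr") else requested
    for candidate in (requested, "chr" + bare, bare):
        if candidate in names:
            return candidate
    return None


# Header lines copied from the samtools output earlier in this thread.
header = (
    "@HD VN:1.4 SO:coordinate\n"
    "@SQ SN:chr1 LN:248956422\n"
    "@SQ SN:chr5 LN:181538259\n"
)
names = contig_names(header)
print(resolve_contig("5", names))  # prints: chr5
```

If the lookup returns None for a chromosome you expect, the BAM header and the region being requested use names that do not match in either form, which would also produce an invalid contig error.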