tbsp
Problem encountered using bam2vcf.py
Hello, I was trying to analyze a 10X dataset. I wanted to see the manual for bam2vcf.py by running python bam2vcf.py --help
However, it returns an error:
File "bam2vcf.py", line 3, in <module>
from File import *
ModuleNotFoundError: No module named 'File'
I'm not sure what this File module is. I tried searching on Google, but the results were all about ordinary files rather than a Python package named File, and pip install file didn't work either.
What should I do to resolve the problem?
@Jiayi-Zheng File.py is provided under the main program folder (tbsp/File.py). You can copy bam2vcf.py to the same folder and run it, or you can add the folder to your PYTHONPATH.
@phoenixding Thank you so much for your reply!
I tried adding:
import os
os.chdir('/home/joyzheng/.conda/envs/py38/lib/python3.8/site-packages/tbsp')
to the top of bam2vcf.py, but it still showed ModuleNotFoundError: No module named 'File'. When I moved bam2vcf.py to the same directory, though, it worked! Gonna go try out your amazing clonal construction tool.
Thank you so much for your help!
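A note on why the os.chdir attempt did not help: when a script is started as python bam2vcf.py, Python puts the script's own directory at the front of sys.path, not the current working directory, so changing directory inside the script does not make File.py importable. A minimal sketch of the "add the folder to your path" alternative, assuming File.py sits in the tbsp folder quoted above; the helper name add_module_dir is made up for illustration and is not part of tbsp:

```python
import os
import sys


def add_module_dir(module_dir):
    """Put module_dir at the front of sys.path so modules in it can be imported.

    Hypothetical helper for illustration; not part of tbsp.
    """
    module_dir = os.path.abspath(module_dir)
    if module_dir not in sys.path:
        sys.path.insert(0, module_dir)
    return module_dir


if __name__ == "__main__":
    # Assumed location, taken from the path quoted in this thread.
    add_module_dir("/home/joyzheng/.conda/envs/py38/lib/python3.8/site-packages/tbsp")
    # After this call, `from File import *` would also search that folder.
```

Setting the PYTHONPATH environment variable to the same folder before launching the script should have an equivalent effect.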
@phoenixding Hello, I was trying the program and did:
cd /home/user/.conda/envs/py38/lib/python3.8/site-packages/tbsp
#where bam2vcf.py is placed
python bam2vcf.py -i /usersdata/user/GW15_Trachea/GW15-Trachea/outs/cells/1.bam -r /home/user/picard/GRCh38.primary_assembly.genome.fa -o /usersdata/user/GW15_Trachea/GW15-Trachea/outs/1.vcf
#I also tried using this command with -o /usersdata/user/GW15_Trachea/GW15-Trachea/outs/
However, it returned:
Traceback (most recent call last):
File "bam2vcf.py", line 115, in <module>
main()
File "bam2vcf.py", line 48, in main
shutil.copy2(sampleID,"%s/%s"%(outputdir,sampleID))
File "/home/user/.conda/envs/py38/lib/python3.8/shutil.py", line 435, in copy2
copyfile(src, dst, follow_symlinks=follow_symlinks)
File "/home/user/.conda/envs/py38/lib/python3.8/shutil.py", line 264, in copyfile
with open(src, 'rb') as fsrc, open(dst, 'wb') as fdst:
FileNotFoundError: [Errno 2] No such file or directory: '/usersdata/user/GW15_Trachea/GW15-Trachea/outs/1.vcf//usersdata/user/GW15_Trachea/GW15-Trachea/outs/cells/1.bam'
I can see why the said directory does not exist; however, I am not sure why the input and output paths are suddenly combined, or what I could do to fix it.
Thank you so much!
@Jiayi-Zheng
The output folder should be just a name, not the complete path to the directory.
In this case it should be just "outs"; it will be stored under the same directory as your input file by default.
Try something like the following:
cd /home/user/.conda/envs/py38/lib/python3.8/site-packages/tbsp
#where bam2vcf.py is placed
python bam2vcf.py -i /usersdata/user/GW15_Trachea/GW15-Trachea/outs/cells/1.bam -r /home/user/picard/GRCh38.primary_assembly.genome.fa -o outs
@phoenixding I tried:
cd /home/user/.conda/envs/py38/lib/python3.8/site-packages/tbsp
#where bam2vcf.py is placed
python bam2vcf.py -i /usersdata/user/GW15_Trachea/GW15-Trachea/outs/mycell/2.bam -r /home/user/picard/GRCh38.primary_assembly.genome.fa -o outs
However, it's still showing the same error:
Traceback (most recent call last):
File "bam2vcf.py", line 115, in <module>
main()
File "bam2vcf.py", line 48, in main
shutil.copy2(sampleID,"%s/%s"%(outputdir,sampleID))
File "/home/user/.conda/envs/py38/lib/python3.8/shutil.py", line 435, in copy2
copyfile(src, dst, follow_symlinks=follow_symlinks)
File "/home/user/.conda/envs/py38/lib/python3.8/shutil.py", line 264, in copyfile
with open(src, 'rb') as fsrc, open(dst, 'wb') as fdst:
FileNotFoundError: [Errno 2] No such file or directory: 'outs//usersdata/user/GW15_Trachea/GW15-Trachea/outs/mycell/2.bam'
@JingtaoWang22 Can you help @Jiayi-Zheng fix this problem? It's a prefixing issue. You can clearly see that "outs//" should be "outs/" as the directory. The method should work with a relative path and seems to add an extra "/" for an absolute path. Can you fix this issue? Thanks.
@Jiayi-Zheng Hello Jiayi, I'm Jun's student and will try to help you with the issue. Firstly, could you confirm that you didn't add a '/' after this part of your command, "-o outs"? That is, you didn't type '-o outs/'? Secondly, if this is the case, could you try adding the following right before line 48 of bam2vcf.py?
if outputdir[-1]=='/':
    outputdir=outputdir[:-1]
You can add them at lines 46 and 47.
Please let us know if this works. We will fix this in future versions.
@JingtaoWang22 Thank you for your help.
It didn't fix the problem; the same error still occurs (I tried print(outputdir) as well and it prints outs).
Meanwhile, since the error mentioned shutil.py, I thought I could look into it. It turns out that if I print(dst) right before line 435 in shutil.py, it outputs outs//usersdata/user/GW15_Trachea/GW15-Trachea/outs/mycell/2.bam. I guess that's how the problem arises.
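The printed dst can be reproduced without running the pipeline. A small sketch that reuses the format string shown in the traceback (bam2vcf.py, line 48), with the variable values taken from the command in this thread:

```python
# Values taken from the command in this thread.
outputdir = "outs"
sampleID = "/usersdata/user/GW15_Trachea/GW15-Trachea/outs/mycell/2.bam"

# Same format string as in the traceback (bam2vcf.py, line 48).
dst = "%s/%s" % (outputdir, sampleID)
print(dst)  # outs//usersdata/user/GW15_Trachea/GW15-Trachea/outs/mycell/2.bam
```

So the doubled slash comes from the leading "/" of the absolute input path, not from a trailing slash on "outs".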
@Jiayi-Zheng I looked into it and I guess the actual problem might not be the extra '/'. The problem might be with the input directory.
The function (shutil.copy2) at line 48 of bam2vcf.py is trying to copy the source file 'sampleID' to a target path built from 'outputdir' and 'sampleID'. And according to this post shutil.copy2 gives an extra slash when you specify the wrong directory. 'sampleID' is the input (-i) file that you specified and 'outputdir' is the output directory (-o) that you specified in the command. Could you double check that they exist and are correct? I was able to reproduce your error by specifying a non-existent input directory.
Specifically, is '/usersdata/user/GW15_Trachea/GW15-Trachea/outs/mycell/2.bam' the location of your input data? Is 'usersdata' a folder under the root directory? If your OS is Windows, then I would guess there should be a "C:" drive at the front. If you are using Linux, then usually 'usersdata' is not a folder under the root directory. It will be something like '/home/YourUserName/SomeFolder/...'. An alternative might be to put the input file directly in your working directory and use a relative path (only the file name) instead of an absolute path.
Could you please check this and let us know?
@JingtaoWang22
I am using a Linux-based virtual server account. When I use cd /usersdata/user/GW15_Trachea/GW15-Trachea/outs/mycell
it works fine. So I don't think it's a path problem, unless things work differently on a virtual server account...? (Sorry, my Python and Linux are very intro level...)
@Jiayi-Zheng Thanks for the information. Then another possibility is that the problem is with the output directory. Probably the code is trying to copy the whole '/usersdata/user/GW15_Trachea/GW15-Trachea/outs/mycell/2.bam' path into the 'outs' folder, and '/usersdata/user/GW15_Trachea/GW15-Trachea/outs/mycell/' does not exist inside the 'outs' folder.
Could you try putting '2.bam' in your working directory (i.e. the folder where you run the code, which is the same place you put 'outs') and running 'python bam2vcf.py -i 2.bam -r /home/user/picard/GRCh38.primary_assembly.genome.fa -o outs'? You could probably put 'GRCh38.primary' in the same directory as well. Alternatively, you could move the tbsp package into '/usersdata/user/GW15_Trachea/GW15-Trachea/outs/mycell/', whichever is easier for you.
Hope this helps.
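One possible workaround for the copy step, sketched under the assumption that only the file name of the input should end up under the output folder. The helper name copy_into_outputdir is hypothetical, and this is not necessarily the fix the authors will ship:

```python
import os
import shutil


def copy_into_outputdir(sample_path, outputdir):
    """Copy sample_path into outputdir, keeping only its file name.

    Hypothetical helper; it drops the directory part of an absolute input
    path so the destination stays inside outputdir.
    """
    os.makedirs(outputdir, exist_ok=True)
    dst = os.path.join(outputdir, os.path.basename(sample_path))
    shutil.copy2(sample_path, dst)
    return dst
```

With the command from this thread, the destination would then be outs/2.bam rather than a path that nests the whole input directory under outs.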
@JingtaoWang22 Thanks for your help! I am currently installing GATK (after I tried setting my output file directory directly in the Python files, I realized I hadn't yet installed GATK in the conda env). Sorry for the late reply; I have been struggling with its installation lately. May I confirm the versions of the conda environment software with you?
Python packages dependencies:
-- scikit-learn
-- scipy
-- numpy
-- matplotlib
-- networkx
-- pyBigWig
-- Biopython
-- decorator
Should I just get the newest versions from conda, or are there specific requirements?
Meanwhile, I see that GATK has been updated quite a few times in recent years, from 3.x to currently 4.3. May I confirm which version of GATK is preferable?
In addition, because of the problems I ran into when setting up a suitable environment for GATK, I'm also thinking about converting BAM to VCF with other software and then feeding the result into the tbsp pipeline. If I may, the pre-processing before the tbsp pipeline should be:
samtools faidx xxx.bam
samtools sort xxx.bam
bam2vcf
Then the files should be ready for the tbsp pipeline...? Or are there other steps in between that I should be aware of and would need to work around without GATK (in case I still fail to get GATK working properly)?
Thank you very much for your help.
Hi @Jiayi-Zheng , no need to install GATK since it's already included if you downloaded the bam2vcf file. Could you try to put everything into the bam2vcf folder as I previously described and try again? I suspect this is a directory problem since I was able to reproduce the error by providing the wrong directory.
Hi @JingtaoWang22
I moved a BAM file, 2762.bam, into the tbsp package directory, changed my working directory to the tbsp package, and then ran:
(tbsp) [joyzheng@hpc02 mycell]$ cd tbsp
(tbsp) [joyzheng@hpc02 tbsp]$ pwd
/usersdata/joyzheng/GW15_Trachea/GW15-Trachea/outs/mycell/tbsp
(tbsp) [joyzheng@hpc02 tbsp]$ ls
2762.bam File.py GRCh38.primary_assembly.genome.fa outs
bam2vcf.py gatk GRCh38.primary_assembly.genome.fa.fai __pycache__
BioUtils.py GRCh38.dict __init__.py tbsp.py
(tbsp) [joyzheng@hpc02 tbsp]$ python bam2vcf.py -i 2762.bam -r GRCh38.primary_assembly.genome.fa -o outs
The following was returned:
`14:51:49.269 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/usersdata/joyzheng/GW15_Trachea/GW15-Trachea/outs/mycell/tbsp/gatk/picard.jar!/com/intel/gkl/native/libgkl_compression.so [Tue Nov 01 14:51:49 HKT 2022] CreateSequenceDictionary OUTPUT=GRCh38.dict REFERENCE=GRCh38.primary_assembly.genome.fa TRUNCATE_NAMES_AT_WHITESPACE=true NUM_SEQUENCES=2147483647 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false [Tue Nov 01 14:51:49 HKT 2022] Executing as [email protected] on Linux 4.18.0-305.7.1.el8_4.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_292-b10; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.18.1-SNAPSHOT [Tue Nov 01 14:52:01 HKT 2022] picard.sam.CreateSequenceDictionary done. Elapsed time: 0.21 minutes. Runtime.totalMemory()=3670540288 [bam_sort] Use -T PREFIX / -o FILE to specify temporary and final output files Usage: samtools sort [options...] [in.bam] Options: -l INT Set compression level, from 0 (uncompressed) to 9 (best) -m INT Set maximum memory per thread; suffix K/M/G recognized [768M] -n Sort by read name -o FILE Write final output to FILE rather than standard output -T PREFIX Write temporary files to PREFIX.nnnn.bam -@, --threads INT Set number of sorting and compression threads [1] --input-fmt-option OPT[=VAL] Specify a single input file format option in the form of OPTION or OPTION=VALUE -O, --output-fmt FORMAT[,OPT[=VAL]]... 
Specify output format (SAM, BAM, CRAM) --output-fmt-option OPT[=VAL] Specify a single output file format option in the form of OPTION or OPTION=VALUE --reference FILE Reference sequence FASTA FILE [null] 14:52:05.144 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/usersdata/joyzheng/GW15_Trachea/GW15-Trachea/outs/mycell/tbsp/gatk/gatk-package-4.0.3.0-local.jar!/com/intel/gkl/native/libgkl_compression.so [Tue Nov 01 14:52:05 HKT 2022] AddOrReplaceReadGroups --INPUT outs/2762.bam_sort.bam --OUTPUT outs/2762.bam_sort_addgroup --SORT_ORDER coordinate --RGLB lib1 --RGPL illumina --RGPU unit1 --RGSM 20 --RGID 1 --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 2 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false [Tue Nov 01 14:52:05 HKT 2022] Executing as [email protected] on Linux 4.18.0-305.7.1.el8_4.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_292-b10; Deflater: Intel; Inflater: Intel; Picard version: Version:4.0.3.0 [Tue Nov 01 14:52:05 HKT 2022] picard.sam.AddOrReplaceReadGroups done. Elapsed time: 0.00 minutes. 
Runtime.totalMemory()=1472200704 To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp htsjdk.samtools.SAMException: Cannot read non-existent file: file:///usersdata/joyzheng/GW15_Trachea/GW15-Trachea/outs/mycell/tbsp/outs/2762.bam_sort.bam at htsjdk.samtools.util.IOUtil.assertFileIsReadable(IOUtil.java:426) at htsjdk.samtools.util.IOUtil.assertFileIsReadable(IOUtil.java:413) at htsjdk.samtools.util.IOUtil.assertInputIsValid(IOUtil.java:389) at picard.sam.AddOrReplaceReadGroups.doWork(AddOrReplaceReadGroups.java:147) at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:269) at org.broadinstitute.hellbender.cmdline.PicardCommandLineProgramExecutor.instanceMain(PicardCommandLineProgramExecutor.java:25) at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160) at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203) at org.broadinstitute.hellbender.Main.main(Main.java:289) 14:52:06.771 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/usersdata/joyzheng/GW15_Trachea/GW15-Trachea/outs/mycell/tbsp/gatk/gatk-package-4.0.3.0-local.jar!/com/intel/gkl/native/libgkl_compression.so [Tue Nov 01 14:52:06 HKT 2022] MarkDuplicates --INPUT outs/2762.bam_sort_addgroup --OUTPUT outs/2762.bam_sort_addgroup_rawdedupped --METRICS_FILE outs/2762.bam_out.metrics --MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP 50000 --MAX_FILE_HANDLES_FOR_READ_ENDS_MAP 8000 --SORTING_COLLECTION_SIZE_RATIO 0.25 --TAG_DUPLICATE_SET_MEMBERS false --REMOVE_SEQUENCING_DUPLICATES false --TAGGING_POLICY DontTag --CLEAR_DT true --ADD_PG_TAG_TO_READS true --REMOVE_DUPLICATES false --ASSUME_SORTED false --DUPLICATE_SCORING_STRATEGY SUM_OF_BASE_QUALITIES --PROGRAM_RECORD_ID MarkDuplicates --PROGRAM_GROUP_NAME MarkDuplicates --READ_NAME_REGEX <optimized capture of last three ':' separated fields as numeric values> --OPTICAL_DUPLICATE_PIXEL_DISTANCE 100 --MAX_OPTICAL_DUPLICATE_SET_SIZE 300000 --VERBOSITY INFO --QUIET false 
--VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 2 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false [Tue Nov 01 14:52:06 HKT 2022] Executing as [email protected] on Linux 4.18.0-305.7.1.el8_4.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_292-b10; Deflater: Intel; Inflater: Intel; Picard version: Version:4.0.3.0 [Tue Nov 01 14:52:06 HKT 2022] picard.sam.markduplicates.MarkDuplicates done. Elapsed time: 0.00 minutes. Runtime.totalMemory()=1465909248 To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp htsjdk.samtools.SAMException: Cannot read non-existent file: file:///usersdata/joyzheng/GW15_Trachea/GW15-Trachea/outs/mycell/tbsp/outs/2762.bam_sort_addgroup at htsjdk.samtools.util.IOUtil.assertFileIsReadable(IOUtil.java:426) at htsjdk.samtools.util.IOUtil.assertFileIsReadable(IOUtil.java:413) at htsjdk.samtools.util.IOUtil.assertInputIsValid(IOUtil.java:389) at htsjdk.samtools.util.IOUtil.assertInputsAreValid(IOUtil.java:465) at picard.sam.markduplicates.MarkDuplicates.doWork(MarkDuplicates.java:224) at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:269) at org.broadinstitute.hellbender.cmdline.PicardCommandLineProgramExecutor.instanceMain(PicardCommandLineProgramExecutor.java:25) at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160) at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203) at org.broadinstitute.hellbender.Main.main(Main.java:289) 14:52:07.392 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/usersdata/joyzheng/GW15_Trachea/GW15-Trachea/outs/mycell/tbsp/gatk/picard.jar!/com/intel/gkl/native/libgkl_compression.so [Tue Nov 01 14:52:07 HKT 2022] ReorderSam INPUT=outs/2762.bam_sort_addgroup_rawdedupped OUTPUT=outs/2762.bam_sort_addgroup_dedupped ALLOW_INCOMPLETE_DICT_CONCORDANCE=true 
REFERENCE=GRCh38.primary_assembly.genome.fa ALLOW_CONTIG_LENGTH_DISCORDANCE=false VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false [Tue Nov 01 14:52:07 HKT 2022] Executing as [email protected] on Linux 4.18.0-305.7.1.el8_4.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_292-b10; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.18.1-SNAPSHOT [Tue Nov 01 14:52:07 HKT 2022] picard.sam.ReorderSam done. Elapsed time: 0.00 minutes. Runtime.totalMemory()=2058354688 To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp Exception in thread "main" htsjdk.samtools.SAMException: Cannot read non-existent file: file:///usersdata/joyzheng/GW15_Trachea/GW15-Trachea/outs/mycell/tbsp/outs/2762.bam_sort_addgroup_rawdedupped at htsjdk.samtools.util.IOUtil.assertFileIsReadable(IOUtil.java:426) at htsjdk.samtools.util.IOUtil.assertFileIsReadable(IOUtil.java:413) at picard.sam.ReorderSam.doWork(ReorderSam.java:129) at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:282) at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:98) at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:108) [bam_sort] Use -T PREFIX / -o FILE to specify temporary and final output files Usage: samtools sort [options...] [in.bam] Options: -l INT Set compression level, from 0 (uncompressed) to 9 (best) -m INT Set maximum memory per thread; suffix K/M/G recognized [768M] -n Sort by read name -o FILE Write final output to FILE rather than standard output -T PREFIX Write temporary files to PREFIX.nnnn.bam -@, --threads INT Set number of sorting and compression threads [1] --input-fmt-option OPT[=VAL] Specify a single input file format option in the form of OPTION or OPTION=VALUE -O, --output-fmt FORMAT[,OPT[=VAL]]... 
Specify output format (SAM, BAM, CRAM) --output-fmt-option OPT[=VAL] Specify a single output file format option in the form of OPTION or OPTION=VALUE --reference FILE Reference sequence FASTA FILE [null] [E::hts_open_format] fail to open file 'outs/2762.bam_sort_addgroup_dedupped_sort.bam' samtools index: failed to open "outs/2762.bam_sort_addgroup_dedupped_sort.bam": No such file or directory INFO 14:52:08,771 HelpFormatter - ------------------------------------------------------------------------------------ INFO 14:52:08,772 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.8-1-0-gf15c1c3ef, Compiled 2018/02/19 05:43:50 INFO 14:52:08,773 HelpFormatter - Copyright (c) 2010-2016 The Broad Institute INFO 14:52:08,773 HelpFormatter - For support and documentation go to https://software.broadinstitute.org/gatk INFO 14:52:08,773 HelpFormatter - [Tue Nov 01 14:52:08 HKT 2022] Executing on Linux 4.18.0-305.7.1.el8_4.x86_64 amd64 INFO 14:52:08,773 HelpFormatter - OpenJDK 64-Bit Server VM 1.8.0_292-b10 INFO 14:52:08,775 HelpFormatter - Program Args: -T SplitNCigarReads -R GRCh38.primary_assembly.genome.fa -I outs/2762.bam_sort_addgroup_dedupped_sort.bam -o outs/2762.bam_sort_addgroup_dedupped_splitN.bam -rf ReassignOneMappingQuality -RMQF 255 -RMQT 60 -U ALLOW_N_CIGAR_READS INFO 14:52:08,784 HelpFormatter - Executing as [email protected] on Linux 4.18.0-305.7.1.el8_4.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_292-b10. 
INFO 14:52:08,785 HelpFormatter - Date/Time: 2022/11/01 14:52:08 INFO 14:52:08,785 HelpFormatter - ------------------------------------------------------------------------------------ INFO 14:52:08,785 HelpFormatter - ------------------------------------------------------------------------------------ INFO 14:52:08,842 NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/usersdata/joyzheng/GW15_Trachea/GW15-Trachea/outs/mycell/tbsp/gatk/GenomeAnalysisTK.jar!/com/intel/gkl/native/libgkl_compression.so INFO 14:52:08,863 GenomeAnalysisEngine - Deflater: IntelDeflater INFO 14:52:08,863 GenomeAnalysisEngine - Inflater: IntelInflater INFO 14:52:08,863 GenomeAnalysisEngine - Strictness is SILENT
ERROR ------------------------------------------------------------------------------------------
ERROR A USER ERROR has occurred (version 3.8-1-0-gf15c1c3ef):
ERROR
ERROR This means that one or more arguments or inputs in your command are incorrect.
ERROR The error message below tells you what is the problem.
ERROR
ERROR If the problem is an invalid argument, please check the online documentation guide
ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
ERROR
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions https://software.broadinstitute.org/gatk
ERROR
ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
ERROR
ERROR MESSAGE: Fasta dict file /usersdata/joyzheng/GW15_Trachea/GW15-Trachea/outs/mycell/tbsp/GRCh38.primary_assembly.genome.dict for reference /usersdata/joyzheng/GW15_Trachea/GW15-Trachea/outs/mycell/tbsp/GRCh38.primary_assembly.genome.fa does not exist. Please see https://software.broadinstitute.org/gatk/documentation/article?id=1601 for help creating it.
ERROR ------------------------------------------------------------------------------------------
14:52:10.672 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/usersdata/joyzheng/GW15_Trachea/GW15-Trachea/outs/mycell/tbsp/gatk/gatk-package-4.0.3.0-local.jar!/com/intel/gkl/native/libgkl_compression.so 14:52:10.982 INFO HaplotypeCaller - ------------------------------------------------------------ 14:52:10.983 INFO HaplotypeCaller - The Genome Analysis Toolkit (GATK) v4.0.3.0 14:52:10.983 INFO HaplotypeCaller - For support and documentation go to https://software.broadinstitute.org/gatk/ 14:52:10.983 INFO HaplotypeCaller - Executing as [email protected] on Linux v4.18.0-305.7.1.el8_4.x86_64 amd64 14:52:10.983 INFO HaplotypeCaller - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_292-b10 14:52:10.984 INFO HaplotypeCaller - Start Date/Time: November 1, 2022 2:52:10 PM HKT 14:52:10.984 INFO HaplotypeCaller - ------------------------------------------------------------ 14:52:10.984 INFO HaplotypeCaller - ------------------------------------------------------------ 14:52:10.985 INFO HaplotypeCaller - HTSJDK Version: 2.14.3 14:52:10.985 INFO HaplotypeCaller - Picard Version: 2.17.2 14:52:10.985 INFO HaplotypeCaller - HTSJDK Defaults.COMPRESSION_LEVEL : 2 14:52:10.985 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false 14:52:10.985 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true 14:52:10.985 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false 14:52:10.985 INFO HaplotypeCaller - Deflater: IntelDeflater 14:52:10.986 INFO HaplotypeCaller - Inflater: IntelInflater 14:52:10.986 INFO HaplotypeCaller - GCS max retries/reopens: 20 14:52:10.986 INFO HaplotypeCaller - Using google-cloud-java patch 6d11bef1c81f885c26b2b56c8616b7a705171e4f from https://github.com/droazen/google-cloud-java/tree/dr_all_nio_fixes 14:52:10.986 INFO HaplotypeCaller - Initializing engine 14:52:10.995 INFO HaplotypeCaller - Shutting down engine [November 1, 2022 2:52:10 PM HKT] 
org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller done. Elapsed time: 0.01 minutes. Runtime.totalMemory()=2115502080
A USER ERROR has occurred: Fasta dict file file:///usersdata/joyzheng/GW15_Trachea/GW15-Trachea/outs/mycell/tbsp/GRCh38.primary_assembly.genome.dict for reference file:///usersdata/joyzheng/GW15_Trachea/GW15-Trachea/outs/mycell/tbsp/GRCh38.primary_assembly.genome.fa does not exist. Please see http://gatkforums.broadinstitute.org/discussion/1601/how-can-i-prepare-a-fasta-file-to-use-as-reference for help creating it.
Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace. INFO 14:52:12,178 HelpFormatter - ------------------------------------------------------------------------------------ INFO 14:52:12,179 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.8-1-0-gf15c1c3ef, Compiled 2018/02/19 05:43:50 INFO 14:52:12,179 HelpFormatter - Copyright (c) 2010-2016 The Broad Institute INFO 14:52:12,180 HelpFormatter - For support and documentation go to https://software.broadinstitute.org/gatk INFO 14:52:12,180 HelpFormatter - [Tue Nov 01 14:52:12 HKT 2022] Executing on Linux 4.18.0-305.7.1.el8_4.x86_64 amd64 INFO 14:52:12,180 HelpFormatter - OpenJDK 64-Bit Server VM 1.8.0_292-b10 INFO 14:52:12,182 HelpFormatter - Program Args: -T VariantFiltration -R GRCh38.primary_assembly.genome.fa -V outs/2762.bam_sort_addgroup_dedupped_splitN_baseRecab_Haplocaller.vcf -window 35 -cluster 3 -filterName FS -filter FS > 30.0 -filterName QD -filter QD < 2.0 -o outs/2762.vcf INFO 14:52:12,192 HelpFormatter - Executing as [email protected] on Linux 4.18.0-305.7.1.el8_4.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_292-b10. INFO 14:52:12,192 HelpFormatter - Date/Time: 2022/11/01 14:52:12 INFO 14:52:12,193 HelpFormatter - ------------------------------------------------------------------------------------ INFO 14:52:12,193 HelpFormatter - ------------------------------------------------------------------------------------
ERROR ------------------------------------------------------------------------------------------
ERROR A USER ERROR has occurred (version 3.8-1-0-gf15c1c3ef):
ERROR
ERROR This means that one or more arguments or inputs in your command are incorrect.
ERROR The error message below tells you what is the problem.
ERROR
ERROR If the problem is an invalid argument, please check the online documentation guide
ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
ERROR
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions https://software.broadinstitute.org/gatk
ERROR
ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
ERROR
ERROR MESSAGE: Could not read file /usersdata/joyzheng/GW15_Trachea/GW15-Trachea/outs/mycell/tbsp/outs/2762.bam_sort_addgroup_dedupped_splitN_baseRecab_Haplocaller.vcf because file 'outs/2762.bam_sort_addgroup_dedupped_splitN_baseRecab_Haplocaller.vcf' does not exist
ERROR ------------------------------------------------------------------------------------------
`
Overall it seems like some problem with directory writing...
htsjdk.samtools.SAMException: Cannot read non-existent file: file:///usersdata/joyzheng/GW15_Trachea/GW15-Trachea/outs/mycell/tbsp/outs/2762.bam_sort.bam
Looks like there's an additional / added before the directory?
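Reading the log above, two details stand out, though neither has been confirmed by the authors. First, samtools prints its usage text with "Use -T PREFIX / -o FILE", which suggests the installed samtools wants the output file passed via -o, so outs/2762.bam_sort.bam is never written and each later step fails on a missing input. Second, Picard wrote GRCh38.dict, while GATK reports looking for GRCh38.primary_assembly.genome.dict. A minimal sketch of both points; the helper names are hypothetical and the samtools command is only built here, not executed:

```python
import os


def expected_dict_path(reference_fasta):
    """Return the .dict name GATK reports looking for in the log:
    the FASTA path with only its last extension replaced by .dict."""
    root, _ext = os.path.splitext(reference_fasta)
    return root + ".dict"


def samtools_sort_command(in_bam, out_bam):
    """Build a samtools sort call that names the output with -o,
    following the usage text printed in the log."""
    return ["samtools", "sort", "-o", out_bam, in_bam]
```

The file:/// prefix in the exception is just the URI form of an absolute path (file:// followed by /usersdata/...), so the third slash is probably not the problem.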
@Jiayi-Zheng Hi, sorry about the late reply. Could you send me the data so that I can take a look? My email is [email protected]. If the BAM file is too large, I can tell you how to upload it to our lab server.