
Compatibility with gatk4 MarkDuplicates

Open guandailu opened this issue 2 years ago • 1 comment

My bwa command:
bwa-mem2 mem -M -t 24 -p -R '@RG\tID:SAMN10471711\tSM:SAMN10471711\tLB:SAMN10471711\tPL:ILLUMINA' /group/zhougrp/zhoulab/GENOMES/Chicken/EMBL/version107/GRCg7b/Gallus_gallus.bGalGal1.mat.broiler.GRCg7b.dna_sm.toplevel.chr.fa 04_trimmed_reads/SAMN10471711_val_1.fq.gz 04_trimmed_reads/SAMN10471711_val_2.fq.gz | samtools view -bS - > 05_aligned_reads/SAMN10471711.aligned.bam

MarkDuplicates command:
gatk --java-options "-Xmx24G -XX:ParallelGCThreads=12 -Djava.io.tmpdir=Temp" MarkDuplicates -I 05_aligned_reads/SAMN10471711.aligned.bam -M 07_dedup_bam/SAMN10471711.metrics.txt -O 07_dedup_bam/SAMN10471711.dedup.bam --COMPRESSION_LEVEL 9 --CREATE_INDEX true --REMOVE_DUPLICATES true --TMP_DIR ./Temp

Error info:
gatk --java-options "-Xmx24G -XX:ParallelGCThreads=12 -Djava.io.tmpdir=Temp" MarkDuplicates -I 05_aligned_reads/SAMN10471711.aligned.bam -M 07_dedup_bam/SAMN10471711.metrics.txt -O 07_dedup_bam/SAMN10471711.dedup.bam --COMPRESSION_LEVEL 9 --CREATE_INDEX true --REMOVE_DUPLICATES true --TMP_DIR ./Temp
Using GATK jar /home/dguan/anaconda3/envs/gatk4/share/gatk4-4.2.0.0-1/gatk-package-4.2.0.0-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx24G -XX:ParallelGCThreads=12 -Djava.io.tmpdir=Temp -jar /home/dguan/anaconda3/envs/gatk4/share/gatk4-4.2.0.0-1/gatk-package-4.2.0.0-local.jar MarkDuplicates -I 05_aligned_reads/SAMN10471711.aligned.bam -M 07_dedup_bam/SAMN10471711.metrics.txt -O 07_dedup_bam/SAMN10471711.dedup.bam --COMPRESSION_LEVEL 9 --CREATE_INDEX true --REMOVE_DUPLICATES true --TMP_DIR ./Temp
10:18:49.097 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/dguan/anaconda3/envs/gatk4/share/gatk4-4.2.0.0-1/gatk-package-4.2.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
[Tue Aug 23 10:18:49 PDT 2022] MarkDuplicates --INPUT 05_aligned_reads/SAMN10471711.aligned.bam --OUTPUT 07_dedup_bam/SAMN10471711.dedup.bam --METRICS_FILE 07_dedup_bam/SAMN10471711.metrics.txt --REMOVE_DUPLICATES true --TMP_DIR ./Temp --COMPRESSION_LEVEL 9 --CREATE_INDEX true --MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP 50000 --MAX_FILE_HANDLES_FOR_READ_ENDS_MAP 8000 --SORTING_COLLECTION_SIZE_RATIO 0.25 --TAG_DUPLICATE_SET_MEMBERS false --REMOVE_SEQUENCING_DUPLICATES false --TAGGING_POLICY DontTag --CLEAR_DT true --DUPLEX_UMI false --ADD_PG_TAG_TO_READS true --ASSUME_SORTED false --DUPLICATE_SCORING_STRATEGY SUM_OF_BASE_QUALITIES --PROGRAM_RECORD_ID MarkDuplicates --PROGRAM_GROUP_NAME MarkDuplicates --READ_NAME_REGEX <optimized capture of last three ':' separated fields as numeric values> --OPTICAL_DUPLICATE_PIXEL_DISTANCE 100 --MAX_OPTICAL_DUPLICATE_SET_SIZE 300000 --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --MAX_RECORDS_IN_RAM 500000 --CREATE_MD5_FILE false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
Aug 23, 2022 10:18:49 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
[Tue Aug 23 10:18:49 PDT 2022] Executing as dguan@c8-63 on Linux 5.4.0-124-generic amd64; OpenJDK 64-Bit Server VM 1.8.0_152-release-1056-b12; Deflater: Intel; Inflater: Intel; Provider GCS is available; Picard version: Version:4.2.0.0
INFO 2022-08-23 10:18:49 MarkDuplicates Start of doWork freeMemory: 1217306456; totalMemory: 1237319680; maxMemory: 22906667008
INFO 2022-08-23 10:18:49 MarkDuplicates Reading input file and constructing read end information.
INFO 2022-08-23 10:18:49 MarkDuplicates Will retain up to 82995170 data points before spilling to disk.
[Tue Aug 23 10:18:49 PDT 2022] picard.sam.markduplicates.MarkDuplicates done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=1237319680
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
htsjdk.samtools.SAMFormatException: Error parsing SAM header. Problem parsing @PG key:value pair ID:SAMN10471711 clashes with ID:bwa-mem2.
Line: @PG ID:bwa-mem2 PN:bwa-mem2 VN:2.2.1 CL:bwa-mem2 mem -M -t 12 -p -R @RG ID:SAMN10471711 SM:SAMN10471711 LB:SAMN10471711 PL:ILLUMINA /group/zhougrp/zhoulab/GENOMES/Chicken/EMBL/version107/GRCg7b/Gallus_gallus.bGalGal1.mat.broiler.GRCg7b.dna_sm.toplevel.chr.fa 04_trimmed_reads/SAMN10471711_val_1.fq.gz 04_trimmed_reads/SAMN10471711_val_2.fq.gz; File /group/zhougrp2/dguan/ChickenSV/05_aligned_reads/SAMN10471711.aligned.bam; Line number 44
    at htsjdk.samtools.SAMTextHeaderCodec.reportErrorParsingLine(SAMTextHeaderCodec.java:258)
    at htsjdk.samtools.SAMTextHeaderCodec.access$200(SAMTextHeaderCodec.java:46)
    at htsjdk.samtools.SAMTextHeaderCodec$ParsedHeaderLine.<init>(SAMTextHeaderCodec.java:313)
    at htsjdk.samtools.SAMTextHeaderCodec.decode(SAMTextHeaderCodec.java:97)
    at htsjdk.samtools.BAMFileReader.readHeader(BAMFileReader.java:704)
    at htsjdk.samtools.BAMFileReader.<init>(BAMFileReader.java:298)
    at htsjdk.samtools.BAMFileReader.<init>(BAMFileReader.java:176)
    at htsjdk.samtools.SamReaderFactory$SamReaderFactoryImpl.open(SamReaderFactory.java:406)
    at picard.sam.markduplicates.util.AbstractMarkDuplicatesCommandLineProgram.openInputs(AbstractMarkDuplicatesCommandLineProgram.java:262)
    at picard.sam.markduplicates.MarkDuplicates.buildSortedReadEndLists(MarkDuplicates.java:508)
    at picard.sam.markduplicates.MarkDuplicates.doWork(MarkDuplicates.java:257)
    at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:308)
    at org.broadinstitute.hellbender.cmdline.PicardCommandLineProgramExecutor.instanceMain(PicardCommandLineProgramExecutor.java:37)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
    at org.broadinstitute.hellbender.Main.main(Main.java:289)

guandailu, Aug 23 '22 17:08

Not the author, but you could try using @RG instead of @rg in your alignment command. Also try entering different information in the LB: field, e.g. LB:Library_1

kvn95ss, Nov 26 '22 05:11
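A possible workaround, judging only from the error message above: htsjdk reports that ID:SAMN10471711 clashes with ID:bwa-mem2 on the @PG line, which suggests the tabs from the -R read-group string were written literally into the CL: field, so the parser sees the read-group tags as extra @PG fields. The sketch below rewrites the header so that everything after CL: on a @PG line is joined with spaces. The function name fix_pg_cl and the output file names are hypothetical, and it assumes CL: is the last real field on the @PG line (as it is in the line quoted in the error); this is not an official fix from bwa-mem2 or GATK.

```shell
# fix_pg_cl (hypothetical helper): read a SAM header on stdin and, on @PG
# lines only, replace every tab that follows the start of the CL: field
# with a space. All other header lines pass through unchanged.
# Assumption: CL: is the last real field of the @PG line.
fix_pg_cl() {
  awk 'BEGIN { FS = "\t" }
       /^@PG/ {
         out = $1; in_cl = 0
         for (i = 2; i <= NF; i++) {
           if (in_cl) {
             out = out " " $i            # inside CL: join with a space
           } else {
             out = out "\t" $i           # before CL: keep the tab
             if ($i ~ /^CL:/) in_cl = 1
           }
         }
         print out
         next
       }
       { print }'
}

# Possible usage (output file names are placeholders):
#   samtools view -H 05_aligned_reads/SAMN10471711.aligned.bam | fix_pg_cl > header.fixed.sam
#   samtools reheader header.fixed.sam 05_aligned_reads/SAMN10471711.aligned.bam > 05_aligned_reads/SAMN10471711.reheader.bam
```

After reheadering, MarkDuplicates can be pointed at the new BAM. Check the rewritten header with samtools view -H before running the rest of the pipeline, since the helper only touches @PG lines and does not validate anything else.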