fgbio
GroupReadsByUmi Error java.nio.file.NoSuchFileException: /dev/stdin
I am using fgbio Version: 0.9.0-0dda145-SNAPSHOT to run GroupReadsByUmi on a sorted, mapped BAM with duplex RX tags and am running into the following error:
...
[2019/09/10 15:04:29 | GroupReadsByUmi | Info] Sorted 32,000,000 records. Elapsed time: 00:07:25s. Time for last 1,000,000: 14s. Last read position: X:73,269,181
[2019/09/10 15:04:38 | GroupReadsByUmi | Info] Accepted 32,618,100 reads for grouping.
[2019/09/10 15:04:38 | GroupReadsByUmi | Info] Filtered out 3,782,104 reads that were not part of a high confidence mapped read pair.
[2019/09/10 15:04:38 | GroupReadsByUmi | Info] Filtered out 36,554 reads that contained one or more Ns in their UMIs.
[2019/09/10 15:04:38 | GroupReadsByUmi | Info] Assigning reads to UMIs and outputting.
[2019/09/10 15:04:40 | FgBioMain | Info] GroupReadsByUmi failed. Elapsed time: 7.69 minutes.
Exception in thread "main" java.nio.file.NoSuchFileException: /dev/stdin
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at sun.nio.fs.UnixFileSystemProvider.isSameFile(UnixFileSystemProvider.java:338)
at java.nio.file.Files.isSameFile(Files.java:1504)
at com.fulcrumgenomics.commons.io.IoUtil.toInputStream(Io.scala:51)
at com.fulcrumgenomics.commons.io.IoUtil.toInputStream$(Io.scala:50)
at com.fulcrumgenomics.util.Io.toInputStream(Io.scala:48)
at com.fulcrumgenomics.util.Sorter.$anonfun$iterator$1(Sorter.scala:219)
at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:237)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at scala.collection.TraversableLike.map(TraversableLike.scala:237)
at scala.collection.TraversableLike.map$(TraversableLike.scala:230)
at scala.collection.AbstractTraversable.map(Traversable.scala:108)
at com.fulcrumgenomics.util.Sorter.iterator(Sorter.scala:219)
at com.fulcrumgenomics.umi.GroupReadsByUmi.execute(GroupReadsByUmi.scala:479)
at com.fulcrumgenomics.cmdline.FgBioMain.makeItSo(FgBioMain.scala:141)
at com.fulcrumgenomics.cmdline.FgBioMain.makeItSoAndExit(FgBioMain.scala:117)
at com.fulcrumgenomics.cmdline.FgBioMain$.main(FgBioMain.scala:82)
at com.fulcrumgenomics.cmdline.FgBioMain.main(FgBioMain.scala)
Here is the command:
java -Xms3g -Xmx4g -Djava.io.tmpdir=/scratch/sf181750 -jar /wittelab/data2/software/bin/fgbio.jar GroupReadsByUmi \
-e 1 \
--raw-tag RX \
-i S083-cfDNA-merged.bam \
-o S083-cfDNA-groupbyumi.bam \
--strategy paired \
--family-size-histogram /wittelab/data2/emmalyn/results/panel-cfdna/fgbio-workflow/S083-output/S083-cfDNA-groupbyumi-histogram
Two reads from the input BAM, each containing two hyphen-separated UMIs in the RX tag:
A00269:70:HCHNFDMXX:2:1285:4562:25551 99 1 10078 0 33M100S = 10166 124
CTAACCCTAACCCTAACCCTAACCCTAACCCAAACACTAACCATATCCCTAACCAAAAACATAAAGCAAAAAACAACACTATACATAACGCACGTCTAAAAAGTATATAAATTATAAGGACAAGGCATTAGTA
,,:F,::FFF:,:,FF,F:FFFF,,F,,,F,,,,,,FF,,,,,FF,,FF,::F,,,F,,:,FF,F,,,::,::,F,,,F,,:FFF,,,F,,,,,,,,,,:,F,F,,FF::,,:,,,,:,,,,,,,,,:,,,,,
MC:Z:96S9M1I27M MD:Z:33 PG:Z:bwa RG:Z:A NM:i:0 MQ:i:0 UQ:i:0 AS:i:33 RX:Z:CTCGTT-ATACTT
A00269:70:HCHNFDMXX:2:1285:4562:25551 147 1 10166 0 96S9M1I27M = 10078 -124
CCTCAAGACCCCAACCATTTCATTACCCTGCTGCTTCCCCTCGTTCCTACCAATCCGTTATACGAATATATTGATTAGATGATCATCCAATCATATCCCTAACCCTTAACCTAACCCTACCCCTAACCCTAAC
,:,,,,,F,,F:,,,:::,F,,,,,,,,F,,F,,:,,:,,F,,,:,,,,,,,,,,F,,F,F,,,:,F,,,,,,,::,F,,,,F,F,,,,F,,F,:,,:,:::,:,F,F,:FFF,F,:FF,F,:F,F,:F:F:F
MC:Z:33M100S MD:Z:22A13 PG:Z:bwa RG:Z:A NM:i:2 MQ:i:0 UQ:i:11 AS:i:24 RX:Z:CTCGTT-ATACTT
Here are the upstream processes:
- I started with a pair of R1/R2.fastq.gz files that have two UMIs and generated an unmapped BAM with RX tags using fgbio FastqToBam and --read-structures 6M11S+T 6M11S+T based on the library prep
- Sorted the unmapped BAM with fgbio SortSam by queryname
- Generated a mapped BAM with RX tag by
  - reverting back to two fastq files with gatk SamToFastq
  - indexing fasta with bwa index -a bwtsw
  - aligning both fastq files with bwa mem
  - and merging the mapped and unmapped bams with gatk MergeBamAlignment to preserve RX tags
Not sure if this is relevant, but I'm using Nextflow to run the pipeline on a cluster, and am setting vmem=150G and scratch = 300G. Is GroupReadsByUmi doing multi-threading? Wondering if I need to think about setting java -jar -XX:ParallelGCThreads or nodes=3:ppn=2 to manage garbage collection and limit the number of GC threads so they aren't competing with the main process.
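One way to ground the GC-thread question is to ask the JVM what it sees on the compute node. The sketch below is not part of fgbio; the class name JvmResources is made up for illustration. It prints the processor count visible to the JVM, the maximum heap, and the garbage collectors in use. It does not say whether GroupReadsByUmi itself is multi-threaded.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of fgbio: reports what the JVM sees on this node.
class JvmResources {
    /** Number of processors the JVM believes it can use. */
    static int visibleProcessors() {
        return Runtime.getRuntime().availableProcessors();
    }

    /** Names of the garbage collectors active in this JVM. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("Processors visible to the JVM: " + visibleProcessors());
        System.out.println("Max heap (MiB): " + Runtime.getRuntime().maxMemory() / (1024 * 1024));
        System.out.println("Garbage collectors: " + collectorNames());
    }
}
```

Compiling this with javac and running it inside the same Nextflow job, with the same -Xms/-Xmx flags as the fgbio command, should show whether the JVM sees every core on the node or only the ones requested from the scheduler.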
@Emmalynchen
- What operating system are you running this on? It's really odd that /dev/stdin doesn't exist. Maybe you're running MinGW on Windows, which doesn't have /dev?
- Less likely, but does /scratch/sf181750 not exist? You're setting -Djava.io.tmpdir=/scratch/sf181750, and since the stack trace threads back through Sorter, where it is trying to open a temporary file, I wonder if the issue is that the temporary directory doesn't exist?
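Both questions can be checked from inside a JVM on the node where the job runs. This is a minimal sketch, assuming it is launched with the same -Djava.io.tmpdir setting as the failing command; the class name TmpDirCheck and the describe helper are invented for illustration and are not fgbio code.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical diagnostic, not part of fgbio.
class TmpDirCheck {
    /** Classifies a directory as "missing", "not a directory", "not writable" or "ok". */
    static String describe(Path dir) {
        if (!Files.exists(dir)) return "missing";
        if (!Files.isDirectory(dir)) return "not a directory";
        if (!Files.isWritable(dir)) return "not writable";
        return "ok";
    }

    public static void main(String[] args) {
        // The directory the JVM will use for temporary files.
        Path tmp = Paths.get(System.getProperty("java.io.tmpdir"));
        System.out.println(tmp + ": " + describe(tmp));

        // Files.exists follows symbolic links, so this is false when /dev/stdin
        // is a link whose target cannot be resolved.
        Path stdin = Paths.get("/dev/stdin");
        System.out.println(stdin + " exists: " + Files.exists(stdin));
    }
}
```

Running it under the scheduler rather than in an interactive shell matters, because standard input may be set up differently for batch jobs.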
- I'm running this on a Linux compute cluster which should have /dev
LSB Version: :base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
Distributor ID: RedHatEnterpriseServer
Description: Red Hat Enterprise Linux Server release 6.6 (Santiago)
Release: 6.6
Codename: Santiago
- /scratch/sf181750 exists on all the nodes that my jobs are submitted to. I removed -Djava.io.tmpdir=/scratch/sf181750, allowing the temporary files to be generated at a default(?) location, but ran into the same error. Is it unable to find the sorted temporary file?
Is it possible to skip sorting at the GroupReadsByUmi step since I sort the aligned BAM before merging?
No, the tool will add MI tags which are needed for the consensus caller. There’s definitely something strange going on. Paging @tfenne and @jacarey
Ah, I forgot about the MI tags. I wanted to check what temp files were in /scratch/sf181750 when the tool is running, and it looks like they are at least being generated during the sorter step:
libgkl_compression765515595767538568.so
snappy-1.1.4-a501da58-f44b-413a-9c74-dc59b9355d86-libsnappyjava.so
sorter.192749154984195478.tmp
sorter.2356060981973793409.tmp
sorter.3003506415139729124.tmp
sorter.3106057423944075150.tmp
sorter.4298964366227425557.tmp
sorter.4836110113879852156.tmp
sorter.5359483764166645815.tmp
sorter.6737433063849702130.tmp
sorter.7324846156820917057.tmp
sorter.7377516346410739602.tmp
sorter.7451897835714716618.tmp
sorter.7640727467896049814.tmp
sorter.7895096692907105099.tmp
sorter.7945726512341326355.tmp
sorter.8073855302226253429.tmp
sorter.8213919087908878618.tmp
sorter.8581409292089938562.tmp
sorter.8590045759821848752.tmp
sorter.9191781062006182509.tmp
Hi @Emmalynchen. Can you let me know what shell you are using? There is a check that we do when creating an inputstream to determine if the path you are passing us is standard in. For some reason when we ask the OS to run fstatat on /dev/stdin it is telling us that it doesn't exist. I'd like to set up a node using the same OS and shell to see if I can reproduce it. Thanks!
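The check described above can be approximated outside fgbio. The sketch below is an assumption-laden reproduction, not fgbio's actual code: it calls Files.isSameFile, the JDK method named in the stack trace, to compare a freshly created temporary file with /dev/stdin, and reports a NoSuchFileException instead of crashing. The class name StdinCheck and the method compareWithStdin are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical reproduction, not fgbio's implementation.
class StdinCheck {
    /** Returns a one-line description of how the given path compares with /dev/stdin. */
    static String compareWithStdin(Path path) {
        Path stdin = Paths.get("/dev/stdin");
        try {
            // Files.isSameFile is the call that appears in the stack trace.
            return "isSameFile=" + Files.isSameFile(path, stdin);
        } catch (NoSuchFileException e) {
            // getFile() names the path the file system could not find.
            return "NoSuchFileException: " + e.getFile();
        } catch (IOException e) {
            return "IOException: " + e;
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for one of the sorter.*.tmp files, created in java.io.tmpdir.
        Path tmp = Files.createTempFile("sorter.", ".tmp");
        try {
            System.out.println(compareWithStdin(tmp));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

If this prints a NoSuchFileException for /dev/stdin when run as a cluster job but not in an interactive shell, that would point at how the scheduler sets up standard input rather than at the temporary directory.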
Hi @jacarey, I'm using bash. Thank you for looking into this!
$ echo $0
-bash
$ echo ${BASH_VERSION}
4.1.2(1)-release
$ bash --version
GNU bash, version 4.1.2(1)-release (x86_64-redhat-linux-gnu)
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
Following up to see if there are any suggestions to help address the error I'm seeing.
Hey @Emmalynchen, we are currently prioritizing client work, but this is on the list of issues to get done. We appreciate your patience.