GroupReadsByUmi Error java.nio.file.NoSuchFileException: /dev/stdin

Open Emmalynchen opened this issue 5 years ago • 8 comments

I am using fgbio Version: 0.9.0-0dda145-SNAPSHOT to run GroupReadsByUmi on a sorted, mapped BAM with duplex RX tags and am running into the following error:

...
[2019/09/10 15:04:29 | GroupReadsByUmi | Info] Sorted    32,000,000 records.  Elapsed time: 00:07:25s.  Time for last 1,000,000:   14s.  Last read position: X:73,269,181
[2019/09/10 15:04:38 | GroupReadsByUmi | Info] Accepted 32,618,100 reads for grouping.
[2019/09/10 15:04:38 | GroupReadsByUmi | Info] Filtered out 3,782,104 reads that were not part of a high confidence mapped read pair.
[2019/09/10 15:04:38 | GroupReadsByUmi | Info] Filtered out 36,554 reads that contained one or more Ns in their UMIs.
[2019/09/10 15:04:38 | GroupReadsByUmi | Info] Assigning reads to UMIs and outputting.
[2019/09/10 15:04:40 | FgBioMain | Info] GroupReadsByUmi failed. Elapsed time: 7.69 minutes.
Exception in thread "main" java.nio.file.NoSuchFileException: /dev/stdin
	at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
	at sun.nio.fs.UnixFileSystemProvider.isSameFile(UnixFileSystemProvider.java:338)
	at java.nio.file.Files.isSameFile(Files.java:1504)
	at com.fulcrumgenomics.commons.io.IoUtil.toInputStream(Io.scala:51)
	at com.fulcrumgenomics.commons.io.IoUtil.toInputStream$(Io.scala:50)
	at com.fulcrumgenomics.util.Io.toInputStream(Io.scala:48)
	at com.fulcrumgenomics.util.Sorter.$anonfun$iterator$1(Sorter.scala:219)
	at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:237)
	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
	at scala.collection.TraversableLike.map(TraversableLike.scala:237)
	at scala.collection.TraversableLike.map$(TraversableLike.scala:230)
	at scala.collection.AbstractTraversable.map(Traversable.scala:108)
	at com.fulcrumgenomics.util.Sorter.iterator(Sorter.scala:219)
	at com.fulcrumgenomics.umi.GroupReadsByUmi.execute(GroupReadsByUmi.scala:479)
	at com.fulcrumgenomics.cmdline.FgBioMain.makeItSo(FgBioMain.scala:141)
	at com.fulcrumgenomics.cmdline.FgBioMain.makeItSoAndExit(FgBioMain.scala:117)
	at com.fulcrumgenomics.cmdline.FgBioMain$.main(FgBioMain.scala:82)
	at com.fulcrumgenomics.cmdline.FgBioMain.main(FgBioMain.scala)

Here is the command:

java -Xms3g -Xmx4g -Djava.io.tmpdir=/scratch/sf181750 -jar /wittelab/data2/software/bin/fgbio.jar GroupReadsByUmi \
-e 1 \
--raw-tag RX \
-i S083-cfDNA-merged.bam \
-o S083-cfDNA-groupbyumi.bam \
--strategy paired \
--family-size-histogram /wittelab/data2/emmalyn/results/panel-cfdna/fgbio-workflow/S083-output/S083-cfDNA-groupbyumi-histogram

Two reads from the input bam containing two hyphen separated UMIs in the RX tag:

A00269:70:HCHNFDMXX:2:1285:4562:25551	99	1	10078	0	33M100S	=	10166	124
CTAACCCTAACCCTAACCCTAACCCTAACCCAAACACTAACCATATCCCTAACCAAAAACATAAAGCAAAAAACAACACTATACATAACGCACGTCTAAAAAGTATATAAATTATAAGGACAAGGCATTAGTA
,,:F,::FFF:,:,FF,F:FFFF,,F,,,F,,,,,,FF,,,,,FF,,FF,::F,,,F,,:,FF,F,,,::,::,F,,,F,,:FFF,,,F,,,,,,,,,,:,F,F,,FF::,,:,,,,:,,,,,,,,,:,,,,,	
MC:Z:96S9M1I27M	MD:Z:33	PG:Z:bwa	RG:Z:A	NM:i:0	MQ:i:0	UQ:i:0	AS:i:33	RX:Z:CTCGTT-ATACTT

A00269:70:HCHNFDMXX:2:1285:4562:25551	147	1	10166	0	96S9M1I27M	=	10078	-124
CCTCAAGACCCCAACCATTTCATTACCCTGCTGCTTCCCCTCGTTCCTACCAATCCGTTATACGAATATATTGATTAGATGATCATCCAATCATATCCCTAACCCTTAACCTAACCCTACCCCTAACCCTAAC
,:,,,,,F,,F:,,,:::,F,,,,,,,,F,,F,,:,,:,,F,,,:,,,,,,,,,,F,,F,F,,,:,F,,,,,,,::,F,,,,F,F,,,,F,,F,:,,:,:::,:,F,F,:FFF,F,:FF,F,:F,F,:F:F:F
MC:Z:33M100S	MD:Z:22A13	PG:Z:bwa	RG:Z:A	NM:i:2	MQ:i:0	UQ:i:11	AS:i:24	RX:Z:CTCGTT-ATACTT

Here are the upstream processes:

  1. I started with a pair of R1/R2.fastq.gz files that have two UMIs and generated an unmapped BAM with RX tags using fgbio FastqToBam and --read-structures 6M11S+T 6M11S+T based on the library prep image
  2. Sorted the unmapped BAM with fgbio SortSam by queryname
  3. Generated a mapped BAM with RX tag by
    • reverting back to two fastq files with gatk SamToFastq
    • indexing fasta with bwa index -a bwtsw
    • aligning both fastq files with bwa mem
    • and merging the mapped and unmapped bams with gatk MergeBamAlignment to preserve RX tags

Not sure if this is relevant, but I'm using Nextflow to run the pipeline on a cluster, with vmem=150G and scratch=300G. Is GroupReadsByUmi multi-threaded? I'm wondering whether I need to set -XX:ParallelGCThreads on the java command line, or request nodes=3:ppn=2, to limit the number of GC threads so they aren't competing with the main process.
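For reference, the HotSpot option takes an integer value and has to appear before -jar. The sketch below only prints the resulting command line rather than running it. The thread count of 2 is an illustrative assumption, not a recommendation, `build_cmd` is a hypothetical helper, and the fgbio arguments are abbreviated from the command above.

```shell
#!/bin/sh
# Illustrative value only (assumption): cap the JVM's parallel GC worker threads.
GC_THREADS=2

# build_cmd is a hypothetical helper. JVM options must precede -jar;
# everything after the jar path is passed to fgbio itself.
build_cmd() {
    echo "java -Xms3g -Xmx4g -XX:ParallelGCThreads=$1 -jar /wittelab/data2/software/bin/fgbio.jar GroupReadsByUmi"
}

# Print rather than execute, so the line can be reviewed before use.
build_cmd "$GC_THREADS"
```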

Emmalynchen avatar Sep 11 '19 16:09 Emmalynchen

@Emmalynchen

  1. What operating system are you running this on? It's really odd that /dev/stdin doesn't exist. Maybe you're running MinGW on Windows which doesn't have /dev?

  2. Less likely, but does /scratch/sf181750 exist? You're setting -Djava.io.tmpdir=/scratch/sf181750, and since the stack trace threads back through Sorter, where it is trying to open a temporary file, I wonder if the issue is that the temporary directory doesn't exist.
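The second hypothesis is quick to rule in or out from inside the job itself. A minimal sketch, assuming a POSIX shell on the compute node; `check_tmpdir` is a hypothetical helper name, and the path is the one passed to -Djava.io.tmpdir above:

```shell
#!/bin/sh
# check_tmpdir is a hypothetical helper (not part of fgbio): it reports whether
# a directory exists and is writable by the current user.
check_tmpdir() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
        return 1
    fi
    if [ ! -w "$dir" ]; then
        echo "not writable: $dir"
        return 2
    fi
    echo "ok: $dir"
}

# Path taken from -Djava.io.tmpdir in the command above. Run this on the same
# node, and inside the same job, as the failing GroupReadsByUmi call.
check_tmpdir /scratch/sf181750 || true
```

Running it from a login node is not enough, since scratch directories are often node-local.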

nh13 avatar Sep 16 '19 23:09 nh13

  1. I'm running this on a Linux compute cluster which should have /dev
LSB Version:	:base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
Distributor ID:	RedHatEnterpriseServer
Description:	Red Hat Enterprise Linux Server release 6.6 (Santiago)
Release:	6.6
Codename:	Santiago
  2. /scratch/sf181750 exists on all the nodes that my jobs are submitted to. I removed -Djava.io.tmpdir=/scratch/sf181750, allowing the temporary files to be generated at a default(?) location, but ran into the same error. Is it unable to find the sorted temporary file?

Is it possible to skip sorting at the GroupReadsByUmi step since I sort the aligned BAM before merging?

Emmalynchen avatar Sep 17 '19 02:09 Emmalynchen

No, the tool will add MI tags which are needed for the consensus caller. There’s definitely something strange going on. Paging @tfenne and @jacarey

nh13 avatar Sep 17 '19 04:09 nh13

Ah, I forgot about the MI tags. I wanted to check what temp files were in /scratch/sf181750 when the tool is running, and it looks like they are at least being generated during the sorter step:

libgkl_compression765515595767538568.so
snappy-1.1.4-a501da58-f44b-413a-9c74-dc59b9355d86-libsnappyjava.so
sorter.192749154984195478.tmp
sorter.2356060981973793409.tmp
sorter.3003506415139729124.tmp
sorter.3106057423944075150.tmp
sorter.4298964366227425557.tmp
sorter.4836110113879852156.tmp
sorter.5359483764166645815.tmp
sorter.6737433063849702130.tmp
sorter.7324846156820917057.tmp
sorter.7377516346410739602.tmp
sorter.7451897835714716618.tmp
sorter.7640727467896049814.tmp
sorter.7895096692907105099.tmp
sorter.7945726512341326355.tmp
sorter.8073855302226253429.tmp
sorter.8213919087908878618.tmp
sorter.8581409292089938562.tmp
sorter.8590045759821848752.tmp
sorter.9191781062006182509.tmp

Emmalynchen avatar Sep 17 '19 17:09 Emmalynchen

Hi @Emmalynchen. Can you let me know what shell you are using? When creating an input stream, we check whether the path you passed us is standard input. For some reason, when we ask the OS to run fstatat on /dev/stdin, it tells us that the file doesn't exist. I'd like to set up a node using the same OS and shell to see if I can reproduce it. Thanks!
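One way this can arise on Linux, offered as an assumption to test rather than a diagnosis: /dev/stdin is normally a symlink to /proc/self/fd/0, so if a process is started with file descriptor 0 closed, the link dangles and a stat of /dev/stdin reports that the file does not exist. The sketch below (POSIX shell; `stdin_visible` is a hypothetical helper) runs the same existence test twice, once with stdin as inherited and once with it explicitly closed.

```shell
#!/bin/sh
# stdin_visible is a hypothetical helper: it succeeds only if /dev/stdin
# resolves to an existing file for the current process.
stdin_visible() {
    [ -e /dev/stdin ]
}

# Report for the environment as inherited (for example, inside a scheduler job).
if stdin_visible; then
    echo "as inherited: /dev/stdin resolves"
else
    echo "as inherited: /dev/stdin does NOT resolve"
fi

# Same test with file descriptor 0 explicitly closed via <&-.
if stdin_visible <&-; then
    echo "stdin closed: /dev/stdin resolves"
else
    echo "stdin closed: /dev/stdin does NOT resolve"
fi
```

If the first line also reports that /dev/stdin does not resolve when run inside the Nextflow task, that would point at how the job launches the process rather than at the shell version.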

jacarey avatar Sep 17 '19 18:09 jacarey

Hi @jacarey, I'm using bash. Thank you for looking into this!

$ echo $0
-bash
$ echo ${BASH_VERSION}
4.1.2(1)-release
$ bash --version
GNU bash, version 4.1.2(1)-release (x86_64-redhat-linux-gnu)
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>

Emmalynchen avatar Sep 17 '19 18:09 Emmalynchen

Following up to see if there are any suggestions to help address the error I'm seeing.

Emmalynchen avatar Sep 26 '19 23:09 Emmalynchen

Hey @Emmalynchen, we are currently prioritizing client work, but this is on the list of issues to get done. We appreciate your patience.

nh13 avatar Sep 27 '19 16:09 nh13