gatk
GenomicsDBImport datastore format folder permissions | cause for ERROR: Couldn't create GenomicsDBFeatureReader
Bug Report
Affected tool(s) or class(es)
GenomicsDBImport / GenotypeGVCFs
Affected version(s)
4.3.0.0
Description
When creating a GenomicsDB datastore, the created folder has its permissions set to 700 (recursively). As a result, when trying to jointly call genotypes using GenotypeGVCFs, one encounters the error: ERROR: Couldn't create GenomicsDBFeatureReader
Steps to reproduce
- Create a datastore using GenomicsDBImport, e.g. gatk ... --genomicsdb-workspace-path IWANNAKILLYOU
- Recursively change the access permissions of the GenomicsDB workspace thus created: chmod 700 -R ./IWANNAKILLYOU
- Run GenotypeGVCFs: gatk ... --variant gendb://IWANNAKILLYOU
Expected behavior
GenotypeGVCFs should initialize the engine normally and start processing the intervals as expected
Actual behavior
GenotypeGVCFs initializes the engine and throws an error: ERROR: Couldn't create GenomicsDBFeatureReader
Proposed solution
Mention somewhere in the docs that the GenomicsDB datastore should be made readable to other users, i.e., that permissions should be changed to at least 744, if not 766. Or just make sure that ./IWANNAKILLYOU has proper permissions from the get-go.
Much obliged
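Editorial aside on the proposed modes: a numeric mode such as 744 or 766 applies the same bits to files and directories alike, and a directory without the execute bit cannot be traversed. A minimal sketch of a symbolic alternative follows. The function name and the example path are placeholders, and the thread does not confirm that widening permissions is the right fix.

```shell
# Sketch: open a workspace for reading by group/other without
# marking ordinary data files as executable.
# Assumption: make_readable and the example path are hypothetical names.
make_readable() {
    # $1: path to the GenomicsDB workspace directory
    # u+rwX : owner gets read/write; X adds execute only to directories
    #         (and to files that already carry an execute bit)
    # go+rX : group/other get read, plus traverse permission on directories
    chmod -R u+rwX,go+rX "$1"
}

# Example (hypothetical path):
# make_readable ./my_workspace
```

With this form, a directory that was 700 ends up 755 and a data file that was 600 ends up 644.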
@vidprijatelj Thanks for the report! Can you check the UMASK value in your shell? You can do this by simply typing the command umask. If it's set to something like 0077, that could explain what you're seeing.
GATK does not, in general, require permissions for users other than the owner of the file/directory, so it's a bit surprising that this is causing issues for you. Could you paste the full stack trace for the exception you're getting? You may need to set GATK_STACKTRACE_ON_USER_EXCEPTION=true in your environment in order to get GATK to print the stack trace.
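To illustrate the umask question, here is a small sketch showing how the umask determines the mode of a newly created directory. It assumes GNU coreutils (stat -c) on Linux. The function name and the demo_workspace directory are hypothetical, and everything is created inside a throwaway temp directory.

```shell
# Sketch: print the octal mode a new directory receives under a given umask.
# Assumption: GNU stat; mode_under_umask and demo_workspace are made-up names.
mode_under_umask() {
    # $1: umask value to apply, e.g. 0077 or 0022
    (
        # Subshell, so the umask change does not leak into the caller.
        tmp=$(mktemp -d) || exit 1
        umask "$1"
        mkdir "$tmp/demo_workspace"
        stat -c '%a' "$tmp/demo_workspace"
        rm -rf "$tmp"
    )
}

mode_under_umask 0077   # owner-only directory (700)
mode_under_umask 0022   # owner rwx, group/other r-x (755)

# As suggested above, ask GATK to print the full stack trace:
export GATK_STACKTRACE_ON_USER_EXCEPTION=true
```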
@droazen Thanks for the reply!
Certainly. umask returns 0022. As such, I reckon that is not the issue.
Stack trace at the bottom.
The folder permission of the datastore folder is as follows:
drwx--S---+ 26 vidprijatelj group 4096 Mar 14 15:29 Vid_database
When changing to 766, the error disappears.
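For readers decoding the listing above: the trailing + in an ls -l mode string indicates that an ACL is attached, and a capital S in the group position means the setgid bit is set without group execute. A minimal sketch for collecting the same information in one go follows, assuming GNU stat. The function name is hypothetical, and getfacl is only called if it is installed.

```shell
# Sketch: print the symbolic mode, octal mode, owner:group and name of a
# directory, followed by its ACL entries when getfacl is available.
# Assumption: show_perms is a made-up helper name; GNU stat is required.
show_perms() {
    # $1: directory to inspect (e.g. the GenomicsDB workspace)
    stat -c '%A %a %U:%G %n' "$1"
    if command -v getfacl >/dev/null 2>&1; then
        # -p keeps absolute path names as given
        getfacl -p "$1"
    fi
}

# Example (workspace name taken from the listing above):
# show_perms ./Vid_database
```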
Tue Mar 14 15:37:57 CET 2023
Using GATK jar /appl/tools/versions/gatk-4.3.0.0/gatk-package-4.3.0.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Djava.io.tmpdir=zzz_tmpdir -Xmx128G -DGATK_STACKTRACE_ON_USER_EXCEPTION=true -jar /appl/tools/versions/gatk-4.3.0.0/gatk-package-4.3.0.0-local.jar GenotypeGVCFs --reference /data/Scratch/References/ucsc.hg38.fa --variant gendb://Vid_database --output Step05_MultiSampleCalling/Vid.vcf.gz --intervals /data/Scratch/References/hg38_exome_v2.0.2_merged_probes_sorted_validated.annotated.bed --genomicsdb-shared-posixfs-optimizations True --merge-input-intervals
15:37:59.895 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/appl/tools/versions/gatk-4.3.0.0/gatk-package-4.3.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
15:38:00.018 INFO GenotypeGVCFs - ------------------------------------------------------------
15:38:00.018 INFO GenotypeGVCFs - The Genome Analysis Toolkit (GATK) v4.3.0.0
15:38:00.018 INFO GenotypeGVCFs - For support and documentation go to https://software.broadinstitute.org/gatk/
15:38:00.018 INFO GenotypeGVCFs - Executing as user@server
15:38:00.018 INFO GenotypeGVCFs - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_362-b08
15:38:00.019 INFO GenotypeGVCFs - Start Date/Time: March 14, 2023 3:37:59 PM CET
15:38:00.019 INFO GenotypeGVCFs - ------------------------------------------------------------
15:38:00.019 INFO GenotypeGVCFs - ------------------------------------------------------------
15:38:00.019 INFO GenotypeGVCFs - HTSJDK Version: 3.0.1
15:38:00.019 INFO GenotypeGVCFs - Picard Version: 2.27.5
15:38:00.019 INFO GenotypeGVCFs - Built for Spark Version: 2.4.5
15:38:00.019 INFO GenotypeGVCFs - HTSJDK Defaults.COMPRESSION_LEVEL : 2
15:38:00.019 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
15:38:00.020 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
15:38:00.020 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
15:38:00.020 INFO GenotypeGVCFs - Deflater: IntelDeflater
15:38:00.020 INFO GenotypeGVCFs - Inflater: IntelInflater
15:38:00.020 INFO GenotypeGVCFs - GCS max retries/reopens: 20
15:38:00.020 INFO GenotypeGVCFs - Requester pays: disabled
15:38:00.020 INFO GenotypeGVCFs - Initializing engine
15:38:00.590 INFO GenomicsDBLibLoader - GenomicsDB native library version : 1.4.3-6069e4a
15:38:00.652 INFO GenotypeGVCFs - Shutting down engine
[March 14, 2023 3:38:00 PM CET] org.broadinstitute.hellbender.tools.walkers.GenotypeGVCFs done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=2326265856
***********************************************************************
A USER ERROR has occurred: Couldn't create GenomicsDBFeatureReader
***********************************************************************
org.broadinstitute.hellbender.exceptions.UserException: Couldn't create GenomicsDBFeatureReader
at org.broadinstitute.hellbender.engine.FeatureDataSource.getGenomicsDBFeatureReader(FeatureDataSource.java:463)
at org.broadinstitute.hellbender.engine.FeatureDataSource.getFeatureReader(FeatureDataSource.java:365)
at org.broadinstitute.hellbender.engine.FeatureDataSource.<init>(FeatureDataSource.java:319)
at org.broadinstitute.hellbender.engine.FeatureDataSource.<init>(FeatureDataSource.java:291)
at org.broadinstitute.hellbender.engine.VariantLocusWalker.initializeDrivingVariants(VariantLocusWalker.java:76)
at org.broadinstitute.hellbender.engine.VariantWalkerBase.initializeFeatures(VariantWalkerBase.java:67)
at org.broadinstitute.hellbender.engine.GATKTool.onStartup(GATKTool.java:726)
at org.broadinstitute.hellbender.engine.VariantLocusWalker.onStartup(VariantLocusWalker.java:63)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:138)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
Caused by: java.io.IOException: GenomicsDB JNI Error: vector::_M_default_append
at org.genomicsdb.reader.GenomicsDBQueryStream.jniGenomicsDBInit(Native Method)
at org.genomicsdb.reader.GenomicsDBQueryStream.<init>(GenomicsDBQueryStream.java:209)
at org.genomicsdb.reader.GenomicsDBQueryStream.<init>(GenomicsDBQueryStream.java:182)
at org.genomicsdb.reader.GenomicsDBQueryStream.<init>(GenomicsDBQueryStream.java:91)
at org.genomicsdb.reader.GenomicsDBFeatureReader.generateHeadersForQuery(GenomicsDBFeatureReader.java:200)
at org.genomicsdb.reader.GenomicsDBFeatureReader.<init>(GenomicsDBFeatureReader.java:85)
at org.broadinstitute.hellbender.engine.FeatureDataSource.getGenomicsDBFeatureReader(FeatureDataSource.java:460)
... 13 more
@nalinigans / @mlathara, any insight into this GenomicsDB JNI Error: vector::_M_default_append error that apparently is related to permissions on the GenomicsDB directory?
With --genomicsdb-shared-posixfs-optimizations, the storage system should only require read access. @droazen, will work towards a fix for this.
@vidprijatelj, I can't reproduce the issue on MacOS and Centos 7. Can you provide us with more information about the system you are on? What is the OS? Are there any access control lists set up?
@nalinigans
me@server:~$ cat /etc/os-release
NAME="CentOS Stream"
VERSION="8"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="8"
PLATFORM_ID="platform:el8"
PRETTY_NAME="CentOS Stream 8"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:8"
HOME_URL="https://centos.org/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux 8"
REDHAT_SUPPORT_PRODUCT_VERSION="CentOS Stream"
me@server:/data/Scratch/Exo-Seq/221108_PracticeVid$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 128
On-line CPU(s) list: 0-127
Thread(s) per core: 2
Core(s) per socket: 32
Socket(s): 2
NUMA node(s): 8
Vendor ID: AuthenticAMD
CPU family: 23
Model: 1
Model name: AMD EPYC 7601 32-Core Processor
SNIPPED BELOW TLDR
The project dir
me@server:/data/Scratch/Exo-Seq/221108_PracticeVid$ getfacl .
# file: .
# owner: me
# group: groupours
# flags: -s-
user::rwx
group::rwx
other::rwx
GenomicsDB dir // do note I changed permissions as outlined up top.
me@server:/data/Scratch/Exo-Seq/221108_PracticeVid$ getfacl -d ./Vid_database/
# file: Vid_database/
# owner: me
# group: ourgroup
# flags: -s-
Hopefully this helps.
Thanks @vidprijatelj. I see the sticky bit being used for groups for the workspace - # flags: -s-. That, by itself, seems to be OK, that is, I am not able to reproduce the issue. But it looks like std::vector is not able to resize - Caused by: java.io.IOException: GenomicsDB JNI Error: vector::_M_default_append. What are the permissions to your tmp directory? Does it also have the sticky bit set? Even if the workspace only requires read permissions, GenomicsDB and probably the underlying standard C++ runtime may require write access to tmp and the sticky bit may be affecting the execution.
Also, can you please confirm that the user creating the workspace and the user reading from the workspace are the same?
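The two checks asked for here can be sketched as below: whether the temp directory is usable by the current user, and whether a path is owned by the current user. This assumes GNU stat. Both function names are hypothetical, and the example paths are the ones that appear in this thread.

```shell
# Sketch: sanity checks for the temp directory and the workspace owner.
# Assumption: check_tmpdir and same_owner are made-up helper names.
check_tmpdir() {
    # $1: directory passed to the JVM via -Djava.io.tmpdir
    if [ -d "$1" ] && [ -w "$1" ] && [ -x "$1" ]; then
        echo "writable: $1"
    else
        echo "NOT writable: $1"
        return 1
    fi
}

same_owner() {
    # $1: path to compare; succeeds if its owner is the current user
    [ "$(stat -c '%U' "$1")" = "$(id -un)" ]
}

# Example (paths from the thread):
# check_tmpdir ./zzz_tmpdir
# same_owner ./Vid_database && echo "workspace is owned by $(id -un)"
```

Note that these only test ordinary permission bits as seen by the calling user; they say nothing about what the native library does internally.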
@nalinigans Hi, apologies for the late reply. The temp dir has the sticky bit as well. Permissions are expanded compared to the workspace - below is the default output without me playing around or changing anything.
me@server:/data/Scratch/Exo-Seq/221108_PracticeVid$ getfacl ./zzz_tmpdir/
# file: zzz_tmpdir/
# owner: me
# group: groupours
# flags: -s-
user::rwx
group::rwx
other::rwx
The user creating the workspace and the user reading from it are identical.
Hi all, Is this problem solved yet? I have the same error "A USER ERROR has occurred: Couldn't create GenomicsDBFeatureReader".
@CHENG-KH, are you having GenomicsDBImport datastore folder permission issues as well? Can you follow https://github.com/broadinstitute/gatk/issues/8233#issuecomment-1466807447 and attach the stack trace please?
@nalinigans Hi, apologies for the late reply.
A USER ERROR has occurred: Couldn't create GenomicsDBFeatureReader
org.broadinstitute.hellbender.exceptions.UserException: Couldn't create GenomicsDBFeatureReader
at org.broadinstitute.hellbender.engine.FeatureDataSource.getGenomicsDBFeatureReader(FeatureDataSource.java:463)
at org.broadinstitute.hellbender.engine.FeatureDataSource.getFeatureReader(FeatureDataSource.java:365)
at org.broadinstitute.hellbender.engine.FeatureDataSource.<init>(FeatureDataSource.java:319)
at org.broadinstitute.hellbender.engine.FeatureDataSource.<init>(FeatureDataSource.java:291)
at org.broadinstitute.hellbender.engine.VariantLocusWalker.initializeDrivingVariants(VariantLocusWalker.java:76)
at org.broadinstitute.hellbender.engine.VariantWalkerBase.initializeFeatures(VariantWalkerBase.java:67)
at org.broadinstitute.hellbender.engine.GATKTool.onStartup(GATKTool.java:726)
at org.broadinstitute.hellbender.engine.VariantLocusWalker.onStartup(VariantLocusWalker.java:63)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:138)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
Caused by: java.io.IOException: GenomicsDB JNI Error: Broad combine GVCFs exception : No sample/CallSet name specified in JSON file/Protobuf object for TileDB row 1381
at org.genomicsdb.reader.GenomicsDBQueryStream.jniGenomicsDBInit(Native Method)
at org.genomicsdb.reader.GenomicsDBQueryStream.<init>(GenomicsDBQueryStream.java:209)
at org.genomicsdb.reader.GenomicsDBQueryStream.<init>(GenomicsDBQueryStream.java:182)
at org.genomicsdb.reader.GenomicsDBQueryStream.<init>(GenomicsDBQueryStream.java:91)
at org.genomicsdb.reader.GenomicsDBFeatureReader.generateHeadersForQuery(GenomicsDBFeatureReader.java:200)
at org.genomicsdb.reader.GenomicsDBFeatureReader.<init>(GenomicsDBFeatureReader.java:85)
at org.broadinstitute.hellbender.engine.FeatureDataSource.getGenomicsDBFeatureReader(FeatureDataSource.java:460)
Any update on this? I just ran into a form of this problem in the context of some pipeline unit tests. I have a task that runs the following:
gatk GenomicsDBImport \
--sample-name-map ${sample_map} \
--genomicsdb-workspace-path ${cohort_name}_gdb \
--genomicsdb-shared-posixfs-optimizations \
-L ${interval_list}
gatk GenotypeGVCFs \
-R ${ref_fasta} \
-V gendb://${cohort_name}_gdb \
-O ${cohort_name}.joint.vcf \
-L ${interval_list}
Which runs fine, but if I re-run the test suite the system complains it can't delete the gdb workspace. I have to manually sudo rm which is gross. I can work around this by adding either chmod 777 -R ${cohort_name}_gdb or rm -r ${cohort_name}_gdb as a cleanup step, but that seems gross too.
My use case is just a toy example for training purposes, but I worry about what this could mean for a production environment.
Am I missing something?
Is the task using docker as execution environment? If so how is the user and group set for that?
Yes it is. Honestly not sure on the u/g config, as an end user I'd really rather not have to care about that 😅 This is the only tool causing this kind of issue so it's got to be the tool itself, no?
I believe I've had this issue before, with other tools as well. If you are using Nextflow, there is a setting in the docker config scope:
docker.fixOwnership
Fix ownership of files created by the docker container.
There is also another setting in the same scope that could be used if there is only a single user:
docker.runOptions
This attribute can be used to provide any extra command line options supported by the docker run command. See the [Docker documentation](https://docs.docker.com/engine/reference/run/) for details.
This one lets you pass the -u parameter to docker directly.
If neither of them is set in your Nextflow config, I would suggest trying these options first. If that doesn't help, we can escalate this with the team.
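As a rough illustration of the two settings named above, the sketch below writes a small Nextflow config fragment from the shell. The file name docker_ownership.config and the function name are hypothetical. The -u $(id -u):$(id -g) value is a commonly used idiom, not something confirmed in this thread, so treat it as an assumption.

```shell
# Sketch: write a Nextflow config fragment using the docker-scope settings
# mentioned above. write_docker_config and the output file name are made up.
write_docker_config() {
    # $1: output path for the config fragment
    cat > "$1" <<'EOF'
// Option 1: let Nextflow fix ownership of files created by the container
docker.fixOwnership = true

// Option 2 (alternative, assumption): run the container as the calling user
// docker.runOptions = '-u $(id -u):$(id -g)'
EOF
}

write_docker_config docker_ownership.config
# The fragment could then be passed with: nextflow run <pipeline> -c docker_ownership.config
```

Only one of the two options is normally needed, which is why the second is left commented out in the generated file.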
Oh interesting, thank you. Yes this is a nextflow pipeline. Thanks for the tip! Will report back.
That worked! TIL. Thank you very much!!