AnnotSV
Conda (and container) version of AnnotSV runs into error when run with -hpo arg
When I ran AnnotSV with Singularity, it failed with the error: couldn't open "/usr/local/etc/AnnotSV/application.properties": no such file or directory. I manually checked the container, and the file application.properties is indeed missing. I suspect this affects the Docker container as well, since the file is also missing from the Docker image.
Here is what I have debugged so far:
Singularity
- Container used: docker://quay.io/biocontainers/annotsv:3.3.4--py311hdfd78af_1
- Singularity version 3.5.2, run on an HPC cluster.
- Ran AnnotSV without -hpo, and the run was successful
singularity exec --bind $ANNOTATIONS_DIR:/annotations/ $CONTAINER AnnotSV -SVinputFile data/test.bed \
    -outputFile ./data/test.annotated.tsv \
    -svtBEDcol 4 \
    -annotationsDir /annotations/
stdout/stderr:
AnnotSV 3.3.4
Copyright (C) 2017-2023 GEOFFROY Veronique
Please feel free to contact me for any suggestions or bug reports email: [email protected]
Tcl/Tk version: 8.6
Application name used: /usr/local
...downloading the configuration data (September 06 2023 - 14:08) ...configuration data by default ...configuration data from /usr/local/etc/AnnotSV/configfile ...configuration data given in arguments ...checking all these configuration data
...checking the annotation data sources (September 06 2023 - 14:08)
WARNING: No GeneHancer annotations available. (Please, see in the README file how to add these annotations. Users need to contact the GeneCards team.)
...listing arguments ****************************************** AnnotSV has been run with these arguments: ****************************************** -REreport 0 -REselect1 1 -REselect2 1 -SVinputFile data/test.bed -SVinputInfo 1 -SVminSize 50 -annotationMode both -annotationsDir /annotations -bcftools bcftools -bedtools bedtools -benignAF 0.01 -candidateGenesFiltering 0 -cytoband 1 -genomeBuild GRCh38 -includeCI 1 -metrics us -miRNAann 1 -minTotalNumber 500 -organism Human -outputDir ./data -outputFile test.annotated.tsv -overlap 100 -overwrite 1 -promoterSize 500 -rankFiltering 1 2 3 4 5 NA -reciprocal 0 -samplesidBEDcol 7 -snvIndelPASS 0 -svtBEDcol 4 -tx RefSeq -variantconvertDir /usr/local/share/python3/variantconvert/ -vcf 0 ******************************************
...searching for SV overlaps with a gene or a regulatory elements ...461 genes overlapped with an SV ...3773 genes regulated by a regulatory element which is overlapped with an SV
...listing of the annotations to be realized (September 06 2023 - 14:08) ...CytoBand annotation ...Genes annotation ...RefSeq annotation ...Regulatory elements annotations ...Promoter annotations ...EnhancerAtlas annotations ...Annotations with pathogenic genes or genomic regions ...dbVar annotation ...ClinVar annotation ...ClinGen annotation ...Annotations with pathogenic snv/indel ...Annotations with benign genes or genomic regions ...gnomAD annotation ...ClinVar annotation ...ClinGen annotation ...DGV annotation ...DDD annotation ...1000g annotation ...Ira M. Hall's lab annotation ...Children’s Mercy Research Institute ...Annotations with features overlapped with the SV (100 %) ...TAD annotation ...Annotations with features sharing any overlap with the SV ...Breakpoints annotations ...GC content annotation ...Repeat annotation ...Gap annotation ...Segmental duplication annotation ...ENCODE blacklist annotation ...Gene-based annotations ...20220617_ACMG.tsv (78 gene identifiers and 1 annotations columns: ACMG) ...20220906_ClinGenAnnotations.tsv (1480 gene identifiers and 2 annotations columns: HI, TS) ...20200713_HI.tsv.gz (19124 gene identifiers and 1 annotations columns: DDD_HI_percent) ...20191219_ExAC.CNV-Zscore.annotations.tsv.gz (15673 gene identifiers and 3 annotations columns: ExAC_delZ, ExAC_dupZ, ExAC_cnvZ) ...20201023_GeneIntolerance-Zscore.annotations.tsv.gz (18241 gene identifiers and 2 annotations columns: ExAC_synZ, ExAC_misZ) ...20220902_GenCC.tsv (4615 gene identifiers and 4 annotations columns: GenCC_disease, GenCC_moi, GenCC_classification, GenCC_pmid) ...20220905_OMIM-1-annotations.tsv.gz (16250 gene identifiers and 1 annotations columns: OMIM_ID) ...20220905_OMIM-2-annotations.tsv.gz (16250 gene identifiers and 2 annotations columns: OMIM_phenotype, OMIM_inheritance) ...20220905_morbid.tsv.gz (12998 gene identifiers and 1 annotations columns: OMIM_morbid) ...20220905_morbidCandidate.tsv.gz (3467 gene identifiers and 1 annotations columns: 
OMIM_morbid_candidate) ...20201106_gnomAD.LOEUF.pLI.annotations.tsv.gz (19451 gene identifiers and 3 annotations columns: LOEUF_bin, GnomAD_pLI, ExAC_pLI)
...annotation in progress (September 06 2023 - 14:08) -- GCcontentAnnotation, nuc -- bedtools nuc -fi /annotations/Annotations_Human/BreakpointsAnnotations/GCcontent/GRCh38/GRCh38_chromFa.fasta -bed ./data/test.NA.formatted.sorted.breakpoints.bed > ./data/test.NA.formatted.sorted.GCcontent.txt Feature (14:107151992-107152192) beyond the length of 14 size (107043718 bp). Skipping. Feature (14:107179995-107180195) beyond the length of 14 size (107043718 bp). Skipping. Feature (2:242865820-242866020) beyond the length of 2 size (242193529 bp). Skipping. Feature (2:243028352-243028552) beyond the length of 2 size (242193529 bp). Skipping.
...writing of ./data/test.annotated.tsv (September 06 2023 - 14:09)
...output columns annotation (September 06 2023 - 14:09): AnnotSV_ID;SV_chrom;SV_start;SV_end;SV_length;SV_type;Biologist_annotation;Biologist_ranking;Samples_ID;Annotation_mode;CytoBand;Gene_name;Gene_count;Tx;Tx_start;Tx_end;Overlapped_tx_length;Overlapped_CDS_length;Overlapped_CDS_percent;Frameshift;Exon_count;Location;Location2;Dist_nearest_SS;Nearest_SS_type;Intersect_start;Intersect_end;RE_gene;P_gain_phen;P_gain_hpo;P_gain_source;P_gain_coord;P_loss_phen;P_loss_hpo;P_loss_source;P_loss_coord;P_ins_phen;P_ins_hpo;P_ins_source;P_ins_coord;po_P_gain_phen;po_P_gain_hpo;po_P_gain_source;po_P_gain_coord;po_P_gain_percent;po_P_loss_phen;po_P_loss_hpo;po_P_loss_source;po_P_loss_coord;po_P_loss_percent;P_snvindel_nb;P_snvindel_phen;B_gain_source;B_gain_coord;B_gain_AFmax;B_loss_source;B_loss_coord;B_loss_AFmax;B_ins_source;B_ins_coord;B_ins_AFmax;B_inv_source;B_inv_coord;B_inv_AFmax;po_B_gain_allG_source;po_B_gain_allG_coord;po_B_gain_someG_source;po_B_gain_someG_coord;po_B_loss_allG_source;po_B_loss_allG_coord;po_B_loss_someG_source;po_B_loss_someG_coord;TAD_coordinate;ENCODE_experiment;GC_content_left;GC_content_right;Repeat_coord_left;Repeat_type_left;Repeat_coord_right;Repeat_type_right;Gap_left;Gap_right;SegDup_left;SegDup_right;ENCODE_blacklist_left;ENCODE_blacklist_characteristics_left;ENCODE_blacklist_right;ENCODE_blacklist_characteristics_right;ACMG;HI;TS;DDD_HI_percent;ExAC_delZ;ExAC_dupZ;ExAC_cnvZ;ExAC_synZ;ExAC_misZ;GenCC_disease;GenCC_moi;GenCC_classification;GenCC_pmid;OMIM_ID;OMIM_phenotype;OMIM_inheritance;OMIM_morbid;OMIM_morbid_candidate;LOEUF_bin;GnomAD_pLI;ExAC_pLI;AnnotSV_ranking_score;AnnotSV_ranking_criteria;ACMG_class
...AnnotSV is done with the analysis (September 06 2023 - 14:09)
- Confirmed that data/test.annotated.tsv was created, then deleted this file before proceeding to the next step.
- Ran the same AnnotSV command again, this time appending the -hpo arg
singularity exec --bind $ANNOTATIONS_DIR:/annotations/ $CONTAINER AnnotSV -SVinputFile data/test.bed -outputFile ./data/test.annotated.tsv -svtBEDcol 4 -annotationsDir /annotations/ -hpo "HP:0001156,HP:0001363,HP:0011304"
stdout/stderr:
AnnotSV 3.3.4
Copyright (C) 2017-2023 GEOFFROY Veronique
Please feel free to contact me for any suggestions or bug reports email: [email protected]
Tcl/Tk version: 8.6
Application name used: /usr/local
...downloading the configuration data (September 06 2023 - 14:09) ...configuration data by default ...configuration data from /usr/local/etc/AnnotSV/configfile ...configuration data given in arguments ...checking all these configuration data
...checking the annotation data sources (September 06 2023 - 14:09) INFO: AnnotSV takes use of Exomiser (Smedley et al., 2015) for the phenotype-driven analysis. INFO: AnnotSV is using the Human Phenotype Ontology (version 2202). Find out more at http://www.human-phenotype-ontology.org
WARNING: No GeneHancer annotations available. (Please, see in the README file how to add these annotations. Users need to contact the GeneCards team.)
...listing arguments ****************************************** AnnotSV has been run with these arguments: ****************************************** -REreport 0 -REselect1 1 -REselect2 1 -SVinputFile data/test.bed -SVinputInfo 1 -SVminSize 50 -annotationMode both -annotationsDir /annotations -bcftools bcftools -bedtools bedtools -benignAF 0.01 -candidateGenesFiltering 0 -cytoband 1 -genomeBuild GRCh38 -hpo HP:0001156,HP:0001363,HP:0011304 -includeCI 1 -metrics us -miRNAann 1 -minTotalNumber 500 -organism Human -outputDir ./data -outputFile test.annotated.tsv -overlap 100 -overwrite 1 -promoterSize 500 -rankFiltering 1 2 3 4 5 NA -reciprocal 0 -samplesidBEDcol 7 -snvIndelPASS 0 -svtBEDcol 4 -tx RefSeq -variantconvertDir /usr/local/share/python3/variantconvert/ -vcf 0 ******************************************
...searching for SV overlaps with a gene or a regulatory elements ...461 genes overlapped with an SV ...3773 genes regulated by a regulatory element which is overlapped with an SV
...running Exomiser on 3780 gene names (September 06 2023 - 14:09)
10000
/usr/local/share/bash/AnnotSV/searchForAFreePortNumber.bash: line 19: ss: command not found
WARNING: port is defined to 50000
...on port 50000
couldn't open "/usr/local/etc/AnnotSV/application.properties": no such file or directory
while executing
"open $File r"
(procedure "ContentFromFile" line 3)
invoked from within
"ContentFromFile $g_AnnotSV(etcDir)/application.properties"
(procedure "runExomiser" line 21)
invoked from within
"runExomiser "$L_allGenes" "$g_AnnotSV(hpo)" "
(procedure "regulatoryElementsAnnotation" line 90)
invoked from within
"regulatoryElementsAnnotation $L_allGenesOverlapped"
(procedure "genesAnnotation" line 394)
invoked from within
"genesAnnotation"
(file "/usr/local/bin/AnnotSV" line 274)
- Note that two error messages were present in the output:
  - couldn't open "/usr/local/etc/AnnotSV/application.properties": no such file or directory (the major error)
  - /usr/local/share/bash/AnnotSV/searchForAFreePortNumber.bash: line 19: ss: command not found (this appears to be more of a warning than an error, since the port is then defined to 50000)
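To confirm which command-line tools are actually absent from an environment, a small check like the one below may help. It is only a sketch: missing_tools is a hypothetical helper name, not part of AnnotSV, and the tool names passed to it are whatever you want to probe.

```shell
# Hypothetical helper (not part of AnnotSV): print every tool from the
# argument list that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        # command -v is the POSIX way to ask whether a command is available
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}
```

For example, running missing_tools ss bedtools bcftools inside the container shell would list whichever of those three are not installed there.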
- Checked whether the Singularity container has the file /usr/local/etc/AnnotSV/application.properties. It doesn't: configfile was present in that directory, but application.properties was not.
$ singularity shell --bind $ANNOTATIONS_DIR:/annotations/ $CONTAINER AnnotSV -SVinputFile data/test.bed
Singularity> ls /usr/local/etc/AnnotSV/
configfile
Singularity> exit
exit
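The same check can be scripted so that it works unchanged in the container or in a conda env. This is a minimal sketch: missing_annotsv_config_files is a hypothetical helper, and it only looks for the two files discussed in this issue (configfile and application.properties), which is an assumption about what the directory should contain.

```shell
# Hypothetical helper: print the expected files that are absent from an
# AnnotSV etc directory. Only the two files mentioned in this issue are
# checked; the real directory may be expected to hold more.
# $1: etc directory, e.g. /usr/local/etc/AnnotSV in the container
missing_annotsv_config_files() {
    etc_dir="$1"
    for name in configfile application.properties; do
        if [ ! -f "$etc_dir/$name" ]; then
            printf '%s\n' "$name"
        fi
    done
}
```

Inside the container shell, missing_annotsv_config_files /usr/local/etc/AnnotSV prints nothing when both files are present and otherwise names the missing ones.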
Docker
I checked the Docker container on a Mac to see whether /usr/local/etc/AnnotSV/application.properties is present in the image. Both v3.3.4 and v3.3.6 were tested. I did not run AnnotSV itself, though, as I didn't have a chance to download the annotation files on this machine.
$ docker run -it quay.io/biocontainers/annotsv:3.3.4--py311hdfd78af_1 sh
sh-5.0# ls /usr/local/etc/AnnotSV/
configfile
sh-5.0# exit
exit
$ docker run -it quay.io/biocontainers/annotsv:3.3.6--py311hdfd78af_0
sh-5.0# ls /usr/local/etc/AnnotSV/
configfile
This is likely related to #184
It looks like it has to do with the conda version of AnnotSV and not singularity or docker.
- Created a conda env from this conda env file:
name: annotsv_bioconda
channels:
- conda-forge
- bioconda
dependencies:
- annotsv
- Activated conda env
- Ran AnnotSV without -hpo, and it completed successfully. If you are interested in the stderr/stdout, please let me know.
$ AnnotSV -SVinputFile test.bed -outputFile ./test.annotated.tsv -svtBEDcol 4 -annotationsDir $ANNOTATIONS_DIR
- Ran again with -hpo
$ AnnotSV -SVinputFile test.bed -outputFile ./test.annotated.tsv -svtBEDcol 4 -annotationsDir $ANNOTATIONS_DIR -hpo "HP:0001156,HP:0001363,HP:0011304"
AnnotSV 3.3.6
Copyright (C) 2017-2023 GEOFFROY Veronique
Please feel free to contact me for any suggestions or bug reports
email: [email protected]
Tcl/Tk version: 8.6
Application name used:
/dirpath/.conda/envs/annotsv_bioconda
...downloading the configuration data (September 06 2023 - 14:43)
...configuration data by default
...configuration data from /dirpath/.conda/envs/annotsv_bioconda/etc/AnnotSV/configfile
...configuration data given in arguments
...checking all these configuration data
...checking the annotation data sources (September 06 2023 - 14:43)
INFO: AnnotSV takes use of Exomiser (Smedley et al., 2015) for the phenotype-driven analysis.
INFO: AnnotSV is using the Human Phenotype Ontology (version 2202). Find out more at http://www.human-phenotype-ontology.org
WARNING: No GeneHancer annotations available.
(Please, see in the README file how to add these annotations. Users need to contact the GeneCards team.)
...listing arguments
******************************************
AnnotSV has been run with these arguments:
******************************************
-REreport 0
-REselect1 1
-REselect2 1
-SVinputFile test.bed
-SVinputInfo 1
-SVminSize 50
-annotationMode both
-annotationsDir /path/to/AnnotSV/v3.3.6/share/AnnotSV
-bcftools bcftools
-bedtools bedtools
-benignAF 0.01
-candidateGenesFiltering 0
-cytoband 1
-genomeBuild GRCh38
-hpo HP:0001156,HP:0001363,HP:0011304
-includeCI 1
-metrics us
-miRNAann 1
-minTotalNumber 500
-organism Human
-outputDir .
-outputFile test.annotated.tsv
-overlap 100
-overwrite 1
-promoterSize 500
-rankFiltering 1 2 3 4 5 NA
-reciprocal 0
-samplesidBEDcol 7
-snvIndelPASS 0
-svtBEDcol 4
-tx RefSeq
-variantconvertDir /dirpath/.conda/envs/annotsv_bioconda/share/python3/variantconvert/
-vcf 0
******************************************
...searching for SV overlaps with a gene or a regulatory elements
...461 genes overlapped with an SV
...3773 genes regulated by a regulatory element which is overlapped with an SV
...running Exomiser on 3780 gene names (September 06 2023 - 14:43)
...on port 10000
couldn't open "/dirpath/.conda/envs/annotsv_bioconda/etc/AnnotSV/application.properties": no such file or directory
while executing
"open $File r"
(procedure "ContentFromFile" line 3)
invoked from within
"ContentFromFile $g_AnnotSV(etcDir)/application.properties"
(procedure "runExomiser" line 21)
invoked from within
"runExomiser "$L_allGenes" "$g_AnnotSV(hpo)" "
(procedure "regulatoryElementsAnnotation" line 90)
invoked from within
"regulatoryElementsAnnotation $L_allGenesOverlapped"
(procedure "genesAnnotation" line 394)
invoked from within
"genesAnnotation"
(file "/dirpath/.conda/envs/annotsv_bioconda/bin/AnnotSV" line 274)
- Checked the contents of /dirpath/.conda/envs/annotsv_bioconda/etc/AnnotSV. The file application.properties was missing.
$ ls /dirpath/.conda/envs/annotsv_bioconda/etc/AnnotSV/
configfile
- Next, I manually added the missing file application.properties to see whether that solves the issue.
$ curl https://raw.githubusercontent.com/lgmgeo/AnnotSV/v3.3.6/etc/AnnotSV/application.properties > /dirpath/.conda/envs/annotsv_bioconda/etc/AnnotSV/application.properties
- Ran the same AnnotSV command again with -hpo
$ AnnotSV -SVinputFile test.bed -outputFile ./test.annotated.tsv -svtBEDcol 4 -annotationsDir $ANNOTATIONS_DIR -hpo "HP:0001156,HP:0001363,HP:0011304"
- AnnotSV completed successfully and created the output file test.annotated.tsv.
Conclusion
The file application.properties is missing from the conda version of AnnotSV for some reason, and this is the cause of the error seen here when AnnotSV is run with -hpo. Shipping this file with the conda package will most likely resolve the bug.
PS - Thanks for your work making AnnotSV available via bioconda (#166)!
The Nextflow implementation of AnnotSV doesn't appear to use the -hpo arg, which explains why this error has not been seen there so far.
https://github.com/lgmgeo/AnnotSV/issues/184#issuecomment-1603817934
I'm not really familiar with Makefiles, but it appears to my naive eyes that the copying of application.properties happens during the step make install-human-annotation. This step, though, appears to run only as part of make install-human-annotation, which is never run in the bioconda version.
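One way to double-check this reading of the Makefile is to list every line in it that mentions the file. The sketch below assumes you are in the root of an AnnotSV checkout; makefile_mentions is a hypothetical helper name.

```shell
# Hypothetical helper: list the lines (with line numbers) of a Makefile
# that mention a given file name.
# $1: path to the Makefile, $2: file name to search for
makefile_mentions() {
    # -n prints line numbers; -F matches the name literally, so the dot
    # in "application.properties" is not treated as a regex wildcard
    grep -nF -- "$2" "$1"
}
```

For example, makefile_mentions Makefile application.properties shows which recipe lines touch the file, and the target names above those lines tell you which make step does the copy.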
Hi @ManavalanG,
Thank you very much for all the time spent on debugging! I really appreciate your contribution. I will contact @nvnieuwk to see what is the best debugging to do. I will get back to you asap.
Best, Véronique
Hi @ManavalanG and @lgmgeo, the reason make install-human-annotation couldn't be run in the recipe is that it would make the recipe very large (which is bad practice in bioconda). I would advise you to run make install-human-annotation once first and use the annotations created by that command as your annotations input. I'm not really sure what the problem could be otherwise.
I think the problem comes from the $ANNOTSV/bin/INSTALL_annotations.sh file.
It induces this installation error.
@nvnieuwk
The best option could be to replace the annotation install commands (in $ANNOTSV/bin/INSTALL_annotations.sh) with something like this:
cd /path/to/install/annotsv/annotations
git clone https://github.com/lgmgeo/AnnotSV.git
cd AnnotSV
make PREFIX=. install
make PREFIX=. install-human-annotation
mv share/AnnotSV/Annotations_Exomiser ..
mv share/AnnotSV/Annotations_Human ..
cd /path/to/install/annotsv/annotations
rm -r AnnotSV
How do you feel about this?
Looks good to me! I'm all for it as long as this will still work to create a separate folder of the annotations
@ManavalanG, can you give us your thoughts on this?
Modification on the patch_AnnotSV branch: https://github.com/lgmgeo/AnnotSV/blob/patch_AnnotSV/bin/INSTALL_annotations.sh
@lgmgeo Thanks for the quick response and working on this right away :)
I am not too familiar with AnnotSV source code and so please take my thoughts/observations with a huge grain of salt.
- I searched to see how bin/INSTALL_annotations.sh gets used during installation, but, strangely, the only place I see it used is during uninstallation, for file removal.
- Downloading large annotation files is discouraged in bioconda builds, and this resulted in the removal of make PREFIX="${PREFIX}" install-human-annotation from recipes/annotsv/build.sh (Source).
- However, bin/INSTALL_annotations.sh would have downloaded large datasets even prior to yesterday's edits.
If you could point me to the role of bin/INSTALL_annotations.sh during installation, I will attempt to provide proper feedback. Again, my apologies for any poor understanding!
Today, I updated the INSTALL_annotations.sh file only on the patch_AnnotSV development branch (not the master operating branch).
Here is the result:
Actually, this file is never used in the AnnotSV code (i.e. in $ANNOTSV/share/tcl/AnnotSV/*). The purpose of this file is to help users create a specific directory with the AnnotSV annotations, without anything else (no code, no documentation...).
This means that it is provided for documentation only, for bioconda/singularity/docker users.
Oops, I totally missed that part. My apologies! I must have missed it in the documentation. In my setup, I installed AnnotSV directly on the HPC, including the human annotations, and then passed those annotations to -annotationsDir. Let me take a look at your edits again and get back to you :)
I installed the annotations using the script INSTALL_annotations.sh (from the patch_AnnotSV branch) and then ran the conda-installed AnnotSV with -annotationsDir pointing to them. Unfortunately, it behaved the same way as I reported yesterday: it was successful without -hpo but ran into the same error (couldn't open "/dirpath/.conda/envs/annotsv_bioconda/etc/AnnotSV/application.properties": no such file or directory) when run with -hpo.
Ok, I see. I will add a patch asap
1 - I just pushed a patch to fix this issue (only on the patch_AnnotSV development branch).
When using this patch, you also need to copy the following file:
$ANNOTSV/etc/AnnotSV/application.properties
to:
<-annotationsDir>/share/AnnotSV/Annotations_Exomiser/2202/application.properties
(until I update and publish the AnnotSV annotations)
2 - @nvnieuwk
File application.properties is missing in the conda version of AnnotSV for some reason, and this is the cause of the issue seen here when AnnotSV is run with -hpo. Fixing this issue will most likely solve this bug.
Would it be possible to add this file in the conda version of AnnotSV?
It's weird that this file isn't in the recipe because it uses an exact copy of the repository. I can have a look when you release the new version :)
Hi! Just checking in to see if there are any updates or fixes. Thanks :)
Bioconda, docker and singularity are distributed by @nvnieuwk (Thanks!).
I can have a look when you release the new version :)
The new version is coming very soon.
Hi, I'm actually not distributing the containers. They are part of the Biocontainers community. I only maintain the bioconda recipe from which the container is built, so I don't have full control over the container.
Thanks for the clarification.
I plan to add a checkpoint that checks for the file application.properties and adds it if not present, prior to running AnnotSV (in either the conda or the Singularity env). I will post here on how it goes :)
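Such a checkpoint could look roughly like the sketch below. It is an illustration under assumptions, not the script actually used: ensure_application_properties is a hypothetical helper, the etc directory must be writable (which may not hold inside a read-only container image), and the source can be either a local copy of the file or a URL such as the raw GitHub link used earlier in this issue.

```shell
# Hypothetical helper: make sure application.properties exists in the
# AnnotSV etc directory before AnnotSV is run with -hpo.
# $1: AnnotSV etc directory (assumed writable), e.g. "$CONDA_PREFIX/etc/AnnotSV"
# $2: source of the file: a local path or an http(s) URL
ensure_application_properties() {
    etc_dir="$1"
    src="$2"
    target="$etc_dir/application.properties"

    # Nothing to do if the file is already in place.
    [ -f "$target" ] && return 0

    case "$src" in
        http://*|https://*)
            # Download to a temporary name first so a failed download
            # never leaves a partial application.properties behind.
            if curl -fsSL "$src" -o "$target.tmp"; then
                mv "$target.tmp" "$target"
            else
                rm -f "$target.tmp"
                return 1
            fi
            ;;
        *)
            cp "$src" "$target"
            ;;
    esac
}
```

In an activated conda env this could be called as ensure_application_properties "$CONDA_PREFIX/etc/AnnotSV" "https://raw.githubusercontent.com/lgmgeo/AnnotSV/v3.3.6/etc/AnnotSV/application.properties" right before the AnnotSV command. The tag in the URL should match the installed AnnotSV version.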
AnnotSV 3.4 is posted.
@lgmgeo I was able to get v3.4 working in Singularity. I copied application.properties (as shown in the doc) after installing the human annotation data, and this resolved the issue. Thanks for providing a fix and for your awesome support during debugging :)
PS - It would be great to see conda/singularity based installation/usage mentioned in the documentation.
Currently, it is mentioned on the web site: https://lbgi.fr/AnnotSV/downloads
I will add it in the README later. Before that, I would like to be sure that the conda version of AnnotSV works for all users but I can't find the time to test. I rely on issues with the flag "Docker/Singularity/Bioconda" (https://github.com/lgmgeo/AnnotSV/issues/184, https://github.com/lgmgeo/AnnotSV/issues/195).
While waiting for the README to be updated, I have integrated the installation/usage documentation into the download page.
@nvnieuwk, can you check and tell me if everything looks OK?
Looks good!! Thank you!