ufcg
ITS genes not being found
Hello, I have run "ufcg download -t full", but when I run the profile analysis with --set NUC, the tool does not find any ITS gene in any of the genomes. The other --set options work.
command:
ufcg profile --input ./genomes/ --output results_phylo_2_nuc --set NUC --thread 28 -k -w tmp_results_phylo_2_nuc
How can I solve this? Thank you in advance.
Hello,
I found that the download package actually lacked the ITS database, which is required for the extraction.
I uploaded the updated package that will allow your command to run properly.
Could you please run ufcg download -t core once and then run your command again?
Sorry for the inconvenience!
Hi, I'm not sure if I have the same problem, but running
ufcg profile --input genomes --output output --set NUC --force 1 --thread 10 --metadata metadata.tsv
gives me "FAILED : ITS sequence not found." for every genome.
I installed via conda and tried both ufcg download -t core and ufcg download -t full.
Cheers
Edit: it is now running after I changed NUC to PRO. I couldn't find anything about this setting in the manual other than one sentence in the tutorial: "We want to extract protein markers from the sequences. Type 'PRO' to continue."
Hi, thanks for developing ufcg; it's very useful! It seems the problem still persists for me. I ran a command with --set NUC and got the "ITS sequence not found" message for all genomes. I tried downloading the database as suggested above and reran the command, without success with either NUC or PRO. I am not sure if this is relevant, but I found only two folders (busco, pro) in ../steineggerlab/ufcg/1.0.5/confid/model. Also, there is no HMM profile for ITS in the pro folder. Do you have any suggestions?
I followed the installation instructions from GitHub. Thanks a lot in advance!
Hello @ignadb, it seems the change I made to the MMseqs2 parameters in the recent update broke its nucleotide search capability 😞 This can be fixed quickly, but it will take some time for the fix to be reflected on the conda mirror. Please wait for the new release, or install the program manually from a recent clone.
Hello @endixk, has this issue been resolved? I reinstalled ufcg yesterday (I tried both conda and git) and the pipeline still doesn't find ITS sequences in the genomes.
Edit: never mind, it worked with the git clone install!
Hi @endixk, thanks so much for your work on this tool, it's really impressive. I was wondering whether the recent commits, including df9d3e6 referenced above, could be included in a new tag of the ufcg Docker container? I'd love to be able to extract NUC/BUSCO sequences using the container for a Nextflow pipeline I'm working on.
Hello @jackscanlan, sorry for the late reply.
I also think this is a good time to release a new minor version including these updates.
I will work on it soonish and leave a note here when it's done :)
Hi, I recently pushed a new version of the Docker container. Could you please check it out?
Hi @endixk, thanks for making a new version of the Docker container. I'm running the command below and getting the output below. (Note that this is a Nextflow process with only a single input genome, which is why UFCG isn't finding most of the samples in the metadata file; that is expected behaviour for me.)
Command:
ufcg profile \
--input $3 \
--metadata $META_PATH \
--output . \
-t $6 \
--set NUC \
-f \
-w /tmp/${2} \
--nocolor \
-v
Output:
 __ __ _____ _____ _____
 / / / // ___// ___// ___/
 / / / // /_ / / / / __
 / /_/ // __/ / /___/ /_/ /
 \____//_/ \____/\____/ v1.0.6
[JUL 15 00:28:13] UFCG |: Verbose option check.
[JUL 15 00:28:13] UFCG |: Timestamp printing option check.
[JUL 15 00:28:13] UFCG |: Input file check : GCA_023212845.1_ASM2321284v1_genomic.fna
[JUL 15 00:28:13] UFCG |: Symbolic link detected : GCA_023212845.1_ASM2321284v1_genomic.fna -> /group/pathogens/IAWS/Personal/JackS/dev/fungal-phylo/work/5b/08202e572e198319df0319156c19e8/GCA_023212845.1_ASM2321284v1_genomic.fna
[JUL 15 00:28:13] UFCG |: Input file check : /group/pathogens/IAWS/Personal/JackS/dev/fungal-phylo/work/5b/08202e572e198319df0319156c19e8/GCA_023212845.1_ASM2321284v1_genomic.fna
[JUL 15 00:28:13] UFCG |: Input argument : ASCII text
[JUL 15 00:28:13] UFCG |: Output directory check : .
[JUL 15 00:28:13] UFCG |: Temporary directory check : /tmp/GCA_023212845.1
[JUL 15 00:28:13] UFCG |: Custom CPU thread count check : 8
[JUL 15 00:28:13] UFCG |: Metadata file check : /group/pathogens/IAWS/Personal/JackS/dev/fungal-phylo/work/58/6bb959fc58e39920facc71ee504686/repository_metadata.tsv
[JUL 15 00:28:13] UFCG |: SUCCESS : Option parsing
[JUL 15 00:28:13] UFCG |: Solving dependencies...
[JUL 15 00:28:14] UFCG |: SUCCESS : Dependency solving
[JUL 15 00:28:14] UFCG |: Launching UFCG profile module...
[JUL 15 00:28:14] UFCG |: Importing given metadata file : /group/pathogens/IAWS/Personal/JackS/dev/fungal-phylo/work/58/6bb959fc58e39920facc71ee504686/repository_metadata.tsv
[JUL 15 00:28:14] UFCG |: Metadata file with 10 entities successfully imported.
[JUL 15 00:28:14] UFCG |: Reading input data...
[JUL 15 00:28:14] WARN |: Metadata entity EPFG6_scaffolds.fasta is not in the input files.
[JUL 15 00:28:14] WARN |: Metadata entity GCA_020975405.1_ASM2097540v1_genomic.fna is not in the input files.
[JUL 15 00:28:14] WARN |: Metadata entity GCA_000426965.1_ASM42696v1_genomic.fna is not in the input files.
[JUL 15 00:28:14] WARN |: Metadata entity GCA_000426985.1_ASM42698v1_genomic.fna is not in the input files.
[JUL 15 00:28:14] WARN |: Metadata entity GCA_000739145.1_Metarhizium_anisopliae_genomic.fna is not in the input files.
[JUL 15 00:28:14] WARN |: Metadata entity GCA_000814975.1_MAN_1.0_genomic.fna is not in the input files.
[JUL 15 00:28:14] WARN |: Metadata entity GCA_013305495.1_ASM1330549v1_genomic.fna is not in the input files.
[JUL 15 00:28:14] WARN |: Metadata entity GCA_013839505.1_ASM1383950v1_genomic.fna is not in the input files.
[JUL 15 00:28:14] WARN |: Metadata entity GCA_039654215.1_AGRO-Manis_genomic.fna is not in the input files.
[JUL 15 00:28:14] UFCG |: Queries prepared. 1 genome sequences identified.
[JUL 15 00:28:14] UFCG |: Temporary directory check : /tmp/GCA_023212845.1/GCA_023212845.1
[JUL 15 00:28:14] UFCG |: QUERY 1/1 : GCA_023212845.1 (Metarhizium anisopliae)
[JUL 15 00:28:14] UFCG |: Extracting nucleotide markers...
[JUL 15 00:28:22] WARN |: Result file not created : /tmp/GCA_023212845.1/GCA_023212845.1/UFCG_4297ba9db2cf4b1a_GCA_023212845.1_ASM2321284v1_genomic.fna.m8
[JUL 15 00:28:22] UFCG |: FAILED : ITS sequence not found.
[JUL 15 00:28:22] UFCG |: Writing results on : ./GCA_023212845.1.ucg
[JUL 15 00:28:22] UFCG |: Cleaning temporary files up...
[JUL 15 00:28:22] UFCG |: Job finished. Terminating process.
So, unfortunately, it seems the new version hasn't fixed the original issue for me?
Just to add to that, when I use --set BUSCO, I get the following, seemingly unrelated, error:
[JUL 15 04:16:55] UFCG |: Timestamp printing option check.
[JUL 15 04:16:55] UFCG |: Input file check : EPFG6_scaffolds.fasta
[JUL 15 04:16:55] UFCG |: Symbolic link detected : EPFG6_scaffolds.fasta -> /group/pathogens/IAWS/Personal/JackS/dev/fungal-phylo/input/EPFG6_scaffolds.fasta
[JUL 15 04:16:55] UFCG |: Input file check : /group/pathogens/IAWS/Personal/JackS/dev/fungal-phylo/input/EPFG6_scaffolds.fasta
[JUL 15 04:16:55] UFCG |: Input argument : ASCII text
[JUL 15 04:16:55] UFCG |: Output directory check : .
[JUL 15 04:16:55] UFCG |: Number of BUSCOs to extract : 0
[JUL 15 04:16:55] UFCG |: Temporary directory check : /tmp/EPFG6
[JUL 15 04:16:55] UFCG |: Custom CPU thread count check : 8
[JUL 15 04:16:55] UFCG |: Metadata file check : /group/pathogens/IAWS/Personal/JackS/dev/fungal-phylo/work/50/3daab5f3d8738f4fef451c36d149cf/repository_metadata.tsv
[JUL 15 04:16:55] UFCG |: SUCCESS : Option parsing
[JUL 15 04:16:55] UFCG |: Solving dependencies...
[JUL 15 04:16:55] UFCG |: SUCCESS : Dependency solving
[JUL 15 04:16:55] UFCG |: Launching UFCG profile module...
[JUL 15 04:16:55] UFCG |: Importing given metadata file : /group/pathogens/IAWS/Personal/JackS/dev/fungal-phylo/work/50/3daab5f3d8738f4fef451c36d149cf/repository_metadata.tsv
[JUL 15 04:16:55] UFCG |: Metadata file with 10 entities successfully imported.
[JUL 15 04:16:55] UFCG |: Reading input data...
[JUL 15 04:16:55] WARN |: Metadata entity GCA_020975405.1_ASM2097540v1_genomic.fna is not in the input files.
[JUL 15 04:16:55] WARN |: Metadata entity GCA_000426965.1_ASM42696v1_genomic.fna is not in the input files.
[JUL 15 04:16:55] WARN |: Metadata entity GCA_000426985.1_ASM42698v1_genomic.fna is not in the input files.
[JUL 15 04:16:55] WARN |: Metadata entity GCA_000739145.1_Metarhizium_anisopliae_genomic.fna is not in the input files.
[JUL 15 04:16:55] WARN |: Metadata entity GCA_000814975.1_MAN_1.0_genomic.fna is not in the input files.
[JUL 15 04:16:55] WARN |: Metadata entity GCA_013305495.1_ASM1330549v1_genomic.fna is not in the input files.
[JUL 15 04:16:55] WARN |: Metadata entity GCA_013839505.1_ASM1383950v1_genomic.fna is not in the input files.
[JUL 15 04:16:55] WARN |: Metadata entity GCA_023212845.1_ASM2321284v1_genomic.fna is not in the input files.
[JUL 15 04:16:55] WARN |: Metadata entity GCA_039654215.1_AGRO-Manis_genomic.fna is not in the input files.
[JUL 15 04:16:55] UFCG |: Queries prepared. 1 genome sequences identified.
[JUL 15 04:16:55] UFCG |: Temporary directory check : /tmp/EPFG6/ABMA9
[JUL 15 04:16:55] UFCG |: QUERY 1/1 : ABMA9 (unknown)
[JUL 15 04:16:55] UFCG |: Extracting BUSCOs...
[JUL 15 04:16:55] UFCG |: RESULT : [Single: 0 ; Duplicated: 0 ; Missing: 0]
[JUL 15 04:16:55] UFCG |: Writing results on : ./ABMA9.ucg
[JUL 15 04:16:55] UFCG |: ERROR! java.lang.StringIndexOutOfBoundsException: Range [0, -1) out of bounds for length 0
[JUL 15 04:16:55] UFCG |: at java.base/jdk.internal.util.Preconditions$1.apply(Preconditions.java:55)
[JUL 15 04:16:55] UFCG |: at java.base/jdk.internal.util.Preconditions$1.apply(Preconditions.java:52)
[JUL 15 04:16:55] UFCG |: at java.base/jdk.internal.util.Preconditions$4.apply(Preconditions.java:213)
[JUL 15 04:16:55] UFCG |: at java.base/jdk.internal.util.Preconditions$4.apply(Preconditions.java:210)
[JUL 15 04:16:55] UFCG |: at java.base/jdk.internal.util.Preconditions.outOfBounds(Preconditions.java:98)
[JUL 15 04:16:55] UFCG |: at java.base/jdk.internal.util.Preconditions.outOfBoundsCheckFromToIndex(Preconditions.java:112)
[JUL 15 04:16:55] UFCG |: at java.base/jdk.internal.util.Preconditions.checkFromToIndex(Preconditions.java:349)
[JUL 15 04:16:55] UFCG |: at java.base/java.lang.AbstractStringBuilder.substring(AbstractStringBuilder.java:1093)
[JUL 15 04:16:55] UFCG |: at java.base/java.lang.StringBuilder.substring(StringBuilder.java:91)
[JUL 15 04:16:55] UFCG |: at entity.JsonProfileEntity.setRunData(JsonProfileEntity.java:76)
[JUL 15 04:16:55] UFCG |: at entity.JsonProfileEntity.<init>(JsonProfileEntity.java:25)
[JUL 15 04:16:55] UFCG |: at process.JsonBuildProcess.build(JsonBuildProcess.java:67)
[JUL 15 04:16:55] UFCG |: at module.ProfileModule.run(ProfileModule.java:933)
[JUL 15 04:16:55] UFCG |: at pipeline.ModuleHandler.handle_profile(ModuleHandler.java:45)
[JUL 15 04:16:55] UFCG |: at pipeline.ModuleHandler.handle(ModuleHandler.java:83)
[JUL 15 04:16:55] UFCG |: at pipeline.UFCGMainPipeline.main(UFCGMainPipeline.java:301)
Hey @jackscanlan,
Thanks for sharing the results. For the second issue, the BUSCO config files are missing. Running ufcg download -t busco will solve it. I should have provided a proper error message for this particular issue 😅
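For context on that BUSCO error: the trace ends in a StringBuilder.substring call inside JsonProfileEntity.setRunData, and "Range [0, -1) out of bounds for length 0" is what recent JDKs report for substring(0, length() - 1) on an empty builder. That fits a run that found 0 BUSCOs (the log shows "Number of BUSCOs to extract : 0") and then tried to strip a trailing delimiter from an empty list. The sketch below is a hypothetical reconstruction of that pattern, not the actual UFCG source; the class and method names are invented for illustration.

```java
import java.util.List;

// Hypothetical illustration only; this is NOT the UFCG implementation.
public class TrailingDelimiterDemo {

    // Appends "name," for each entry, then drops the last character.
    // With an empty list the builder stays empty, so length() - 1 == -1
    // and substring(0, -1) throws StringIndexOutOfBoundsException.
    static String joinUnsafe(List<String> names) {
        StringBuilder sb = new StringBuilder();
        for (String name : names) {
            sb.append(name).append(',');
        }
        return sb.substring(0, sb.length() - 1);
    }

    // Safe alternative: String.join returns "" for an empty list.
    static String joinSafe(List<String> names) {
        return String.join(",", names);
    }

    public static void main(String[] args) {
        System.out.println(joinSafe(List.of("a", "b")));
        System.out.println("[" + joinSafe(List.of()) + "]");
        try {
            joinUnsafe(List.of());
        } catch (StringIndexOutOfBoundsException e) {
            // The exact message text depends on the JDK version.
            System.out.println("Caught: " + e.getMessage());
        }
    }
}
```

Guarding the empty case (or using String.join) avoids the crash, but the underlying fix in this thread is still to download the missing BUSCO config files.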
For the first one, though, I am not 100% clear about the root cause. I tried to reproduce the issue in my environment with symlinked inputs, but it worked fine. It would help if you could run it with the -dev option to find out exactly which sub-command was problematic. It would also be helpful to know whether the program works (or fails) on the default core gene (-s PRO) task.