bgcflow Suggestions for documentation

I am creating an issue to list any feedback we get on improving documentation here.

[x] Provide a warning and a guide for installing gcc. Many machines don't have gcc installed by default, so it would be nice to note this requirement in the installation instructions, e.g. by suggesting that users run gcc --version before installing BGCFlow.
[x] We also need unzip for the arts post-deployment script. Some systems don't have unzip installed by default.
[x] Add information about the funding agency and the affiliation with CFB to the documentation. A good example is https://github.com/antismash/antismash
[ ] Add information on how to check the logs. It is important to tell readers that the global logs are in the .snakemake/logs/ folder, whereas the more useful logs with error messages are stored in workflow/report/logs/rule_name.
[ ] A dedicated note on GTDB and GTDB-Tk project management would be helpful. Typically not all genomes have GTDB information, so users almost always need to create an intermediate PEP to run GTDB-Tk. Mention that GTDB-Tk requires significant memory and time.
[ ] Mention that users can update the GTDB-Tk release version from the config file. This information has been completely removed from the newest example config templates. Using release 214 is recommended going forward.
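
The gcc and unzip checks from the first two items could be scripted as a small pre-install check. This is only a sketch: the function name check_tool is made up for illustration and is not part of BGCFlow or bgcflow_wrapper.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of BGCFlow): report whether a tool is on PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# gcc and unzip are the two requirements named in the checklist above.
for tool in gcc unzip; do
    check_tool "$tool" || true
done
```

A missing tool is only reported, not installed, so the check is safe to run on any machine before starting the installation.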

Jun 09 '23 12:06 OmkarSaMo

Following the Quick Start for installing bgcflow_wrapper, the command 'conda activate bgcflow_wrapper' returns EnvironmentNameNotFound.

Jun 14 '23 13:06 ljdnielsen

Thanks @ljdnielsen for pointing it out :)

Jun 15 '23 12:06 matinnuhamunada

Following the quick and easy way for installing bgcflow_wrapper, step 2 sets the channel priorities to flexible. While running bgcflow with an example dataset, the output recommends exactly the opposite. Output (partial):

Step 5. Preparing list of final outputs...
 - Getting outputs for project: Lactobacillus_delbrueckii
 - WARNING: ignoring errors in rule_dictionary
 - Ready to generate all outputs.

GTDB API | Grabbing metadata using GTDB release version: r214
Building DAG of jobs...
Your conda installation is not configured to use strict channel priorities. This is however crucial for having robust and correct environments (for details, see https://conda-forge.org/docs/user/tipsandtricks.html). Please consider to configure strict priorities by executing 'conda config --set channel_priority strict'.

Oct 11 '23 09:10 anpanche

While running bgcflow with an example dataset, creating the conda environments takes an extra long time. The following errors were found after the run:

Error in rule ncbi_genome_download:
    jobid: 12
    output: data/interim/fasta/GCA_000191165.1.fna, data/interim/assembly_report/GCA_000191165.1.txt, data/interim/assembly_report/GCA_000191165.1.json
    log: logs/ncbi/ncbi_genome_download/ncbi_genome_download_GCA_000191165.1.log (check log file(s) for error details)
    conda-env: /home/azureuser/datadrive/bgc_try1/.snakemake/conda/da7e441115fe0f43f2ddb22b63f792d2_
    shell:
        
            if [[ GCA_000191165.1 == GCF* ]]
            then
                source="refseq"
            elif [[ GCA_000191165.1 == GCA* ]]
            then
                source="genbank"
            else
                echo "accession must start with GCA or GCF" >> logs/ncbi/ncbi_genome_download/ncbi_genome_download_GCA_000191165.1.log
            fi
            ncbi-genome-download -s $source -F fasta,assembly-report -A GCA_000191165.1 -o data/raw/ncbi/download -P -N --verbose bacteria 2>> logs/ncbi/ncbi_genome_download/ncbi_genome_download_GCA_000191165.1.log
            gunzip -c data/raw/ncbi/download/$source/bacteria/GCA_000191165.1/*.fna.gz > data/interim/fasta/GCA_000191165.1.fna
            cp data/raw/ncbi/download/$source/bacteria/GCA_000191165.1/*report.txt data/interim/assembly_report/GCA_000191165.1.txt
            rm -rf data/raw/ncbi/download/$source/bacteria/GCA_000191165.1
            python workflow/bgcflow/bgcflow/data/get_assembly_information.py data/interim/assembly_report/GCA_000191165.1.txt data/interim/assembly_report/GCA_000191165.1.json GCA_000191165.1 2>> logs/ncbi/ncbi_genome_download/ncbi_genome_download_GCA_000191165.1.log
            
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

/usr/bin/bash: line 10: 740849 Killed                  ncbi-genome-download -s $source -F fasta,assembly-report -A GCA_000182835.1 -o data/raw/ncbi/download -P -N --verbose bacteria 2>> logs/ncbi/ncbi_genome_download/ncbi_genome_download_GCA_000182835.1.log
[Wed Oct 11 09:48:11 2023]

Error in rule ncbi_genome_download:
    jobid: 10
    output: data/interim/fasta/GCA_000182835.1.fna, data/interim/assembly_report/GCA_000182835.1.txt, data/interim/assembly_report/GCA_000182835.1.json
    log: logs/ncbi/ncbi_genome_download/ncbi_genome_download_GCA_000182835.1.log (check log file(s) for error details)
    conda-env: /home/azureuser/datadrive/bgc_try1/.snakemake/conda/da7e441115fe0f43f2ddb22b63f792d2_
    shell:
        
            if [[ GCA_000182835.1 == GCF* ]]
            then
                source="refseq"
            elif [[ GCA_000182835.1 == GCA* ]]
            then
                source="genbank"
            else
                echo "accession must start with GCA or GCF" >> logs/ncbi/ncbi_genome_download/ncbi_genome_download_GCA_000182835.1.log
            fi
            ncbi-genome-download -s $source -F fasta,assembly-report -A GCA_000182835.1 -o data/raw/ncbi/download -P -N --verbose bacteria 2>> logs/ncbi/ncbi_genome_download/ncbi_genome_download_GCA_000182835.1.log
            gunzip -c data/raw/ncbi/download/$source/bacteria/GCA_000182835.1/*.fna.gz > data/interim/fasta/GCA_000182835.1.fna
            cp data/raw/ncbi/download/$source/bacteria/GCA_000182835.1/*report.txt data/interim/assembly_report/GCA_000182835.1.txt
            rm -rf data/raw/ncbi/download/$source/bacteria/GCA_000182835.1
            python workflow/bgcflow/bgcflow/data/get_assembly_information.py data/interim/assembly_report/GCA_000182835.1.txt data/interim/assembly_report/GCA_000182835.1.json GCA_000182835.1 2>> logs/ncbi/ncbi_genome_download/ncbi_genome_download_GCA_000182835.1.log
            
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

/usr/bin/bash: line 10: 741138 Killed                  ncbi-genome-download -s $source -F fasta,assembly-report -A GCA_000056065.1 -o data/raw/ncbi/download -P -N --verbose bacteria 2>> logs/ncbi/ncbi_genome_download/ncbi_genome_download_GCA_000056065.1.log

Error in rule prokka:
    jobid: 36
    input: data/interim/fasta/GCA_000014405.1.fna, data/interim/prokka/GCA_000014405.1/organism_info.txt
    output: data/interim/prokka/GCA_000014405.1/GCA_000014405.1.gff, data/interim/prokka/GCA_000014405.1/GCA_000014405.1.faa, data/interim/prokka/GCA_000014405.1/GCA_000014405.1.gbk, data/interim/prokka/GCA_000014405.1/GCA_000014405.1.txt, data/interim/prokka/GCA_000014405.1/GCA_000014405.1.tsv, data/interim/prokka/GCA_000014405.1/GCA_000014405.1.fna, data/interim/prokka/GCA_000014405.1/GCA_000014405.1.sqn, data/interim/prokka/GCA_000014405.1/GCA_000014405.1.fsa, data/interim/prokka/GCA_000014405.1/GCA_000014405.1.tbl
    log: logs/prokka/prokka/prokka-GCA_000014405.1.log (check log file(s) for error details)
    conda-env: /home/azureuser/datadrive/bgc_try1/.snakemake/conda/6d99c3821030d48ec757a268edfc89df_
    shell:
        
        prokka --outdir data/interim/prokka/GCA_000014405.1 --force              --prefix GCA_000014405.1 --genus "`cut -d "," -f 1 data/interim/prokka/GCA_000014405.1/organism_info.txt`"             --species "`cut -d "," -f 2 data/interim/prokka/GCA_000014405.1/organism_info.txt`" --strain "`cut -d "," -f 3 data/interim/prokka/GCA_000014405.1/organism_info.txt`"             --cdsrnaolap --cpus 4  --increment 10              --evalue 1e-05 data/interim/fasta/GCA_000014405.1.fna &> logs/prokka/prokka/prokka-GCA_000014405.1.log
        
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Oct 11 '23 09:10 anpanche

Hi, it is normal to get the strict channel priority warning. The reason we don't use strict channel priority right now is that some of the environments are still experimental.
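
To see which channel priority is currently active, the conda configuration can be queried directly. A minimal sketch, assuming conda is on PATH; the wrapper function name show_channel_priority is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the active conda channel priority, if conda is installed.
show_channel_priority() {
    if command -v conda >/dev/null 2>&1; then
        # Prints a line such as "channel_priority: flexible".
        conda config --show channel_priority
    else
        echo "conda not found on PATH"
    fi
}

show_channel_priority
```

The function only reads the setting. It does not change it, so the flexible priority set during installation stays in place.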

Oct 11 '23 10:10 matinnuhamunada

Yes, creating the conda environments can take a while: some of the environments are big, so they take some time to download. However, the environment setup only happens once, or again when the environment configuration is updated. Are you using conda or mamba on your machine? Switching to mamba is preferred, and it is usually much faster.

We are planning to containerize the environments in the future, but at the moment we don't have enough time and resources to set it up. Once the Docker containers are in place, it should be much faster to deploy BGCFlow.

Oct 11 '23 10:10 matinnuhamunada

For the errors, can you also check what is written to the log files? For example: logs/prokka/prokka/prokka-GCA_000014405.1.log.

The error message says the job was killed. This sounds more like a resource problem: an unstable internet connection, using more threads than the computer has, or running out of memory. Can you tell me more about the system you are running BGCFlow on?
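
For reference, the per-rule log named in a Snakemake error block can be inspected with standard tools. A sketch, assuming the command is run from the BGCFlow working directory; the helper name show_log_tail is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the last N lines (default 20) of a log file,
# or a short note if the file does not exist.
show_log_tail() {
    local log_file="$1"
    local lines="${2:-20}"
    if [ -f "$log_file" ]; then
        tail -n "$lines" "$log_file"
    else
        echo "log not found: $log_file"
        return 1
    fi
}

# Path copied from the "log:" field of the prokka error in this thread.
show_log_tail logs/prokka/prokka/prokka-GCA_000014405.1.log || true
```

The last lines of a rule log usually contain the actual error message from the tool, which the Snakemake summary does not repeat.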

Oct 11 '23 10:10 matinnuhamunada

I am using a VM with 2 CPUs and 8 GB of memory.

Oct 11 '23 13:10 anpanche

You might need to increase the memory, as some of the tools are demanding. Also, limit the number of CPU cores accordingly:

bgcflow run -c 2
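
The core count could also be derived from the machine rather than typed by hand. A sketch, assuming a Linux system where nproc (GNU coreutils) is available; pick_cores is a hypothetical helper, and only the bgcflow run -c form shown above is taken from this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: use the number of available cores,
# capped at an optional maximum given as the first argument.
pick_cores() {
    local max="${1:-0}"
    local available
    available=$(nproc)
    if [ "$max" -gt 0 ] && [ "$available" -gt "$max" ]; then
        echo "$max"
    else
        echo "$available"
    fi
}

# Print the resulting command instead of running it, so the sketch is safe to try.
echo "bgcflow run -c $(pick_cores 2)"
```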

Oct 11 '23 14:10 matinnuhamunada

Ok, I will check it. Thanks.

Oct 12 '23 07:10 anpanche

I used a system with 8 CPUs and 64GB memory. Still, the problem persists. I am attaching the log file here. log_file.log

Oct 12 '23 08:10 anpanche

Can you attach this log file? logs/prokka/prokka/prokka-GCA_000056065.1.log

Oct 12 '23 12:10 matinnuhamunada

Please find the prokka log file here: prokka_log.log

Oct 12 '23 12:10 anpanche

Hmm, it seems like some Perl dependencies are missing from your Linux system? I didn't have this issue on a freshly installed VM, but I do update the Linux VM before setting up gcc. So maybe do an update as mentioned in the installation guide (https://github.com/NBChub/bgcflow/wiki/00-Installation-Guide#gcc-compiler):

sudo apt update
sudo apt-get install build-essential

Oct 12 '23 13:10 matinnuhamunada

Thanks, I will update as mentioned, then rerun the workflow and check again.

Oct 12 '23 13:10 anpanche

I ran the workflow again as per your suggestions, but the problem persists. I have attached the log file for your reference: 2023-10-16T073106.185559.snakemake.log .txt

Oct 16 '23 08:10 anpanche

Hi, can you try to fetch the latest main branch? I simplified the prokka env and wonder whether that solves the problem.

git fetch
git status
git pull

I'm meeting Binhuan today at 4 for troubleshooting. If you are at Biosustain, you're welcome to join. You can also send me a message on Teams

Oct 16 '23 09:10 matinnuhamunada

I will check and come back to you. If it's possible to join online, I would be happy to join and discuss.

Oct 16 '23 09:10 anpanche
