docker-builds
adding masurca version 4.1.1
There's a new version of MaSuRCA! (More info here: https://github.com/alekseyzimin/masurca/releases/tag/v4.1.1)
I copied the files from 4.1.0 and made the following changes:
- updated to ubuntu:jammy
- updated the software version ARG
- added a hybrid assembly example to the README
- bwa is now installed via apt-get
Pull Request (PR) checklist:
- [X] Include a description of what is in this pull request in this message.
- [X] The dockerfile successfully builds to a test target for the user creating the PR. (i.e. docker build --tag samtools:1.15test --target test docker-builds/samtools/1.15)
- [X] Directory structure as name of the tool in lower case with special characters removed with a subdirectory of the version number (i.e. spades/3.12.0/Dockerfile)
- [X] (optional) All test files are located in same directory as the Dockerfile (i.e. shigatyper/2.0.1/test.sh)
- [X] Create a simple container-specific README.md in the same directory as the Dockerfile (i.e. spades/3.12.0/README.md)
- [X] If this README is longer than 30 lines, there is an explanation as to why more detail was needed
- [X] Dockerfile includes the recommended LABELS
- [X] Main README.md has been updated to include the tool and/or version of the dockerfile(s) in this PR
- [X] Program_Licenses.md contains the tool(s) used in this PR and has been updated for any missing
I see some errors in the test command at the end. I'm surprised it exited 0 and the image built successfully. It looks like the test requires the file command to be installed:
#13 [test 2/2] RUN wget -q https://github.com/rrwick/Unicycler/raw/69e712eb95c4b9f8a46aade467260260a9ce7a91/sample_data/short_reads_1.fastq.gz && wget -q https://github.com/rrwick/Unicycler/raw/69e712eb95c4b9f8a46aade467260260a9ce7a91/sample_data/short_reads_2.fastq.gz && wget -q https://github.com/rrwick/Unicycler/raw/69e712eb95c4b9f8a46aade467260260a9ce7a91/sample_data/long_reads_low_depth.fastq.gz && masurca -t 2 -i short_reads_1.fastq.gz,short_reads_2.fastq.gz -r long_reads_low_depth.fastq.gz
#13 2.313 Verifying PATHS...
#13 2.316 jellyfish OK
#13 2.361 runCA OK
#13 2.373 createSuperReadsForDirectory.perl OK
#13 2.373 creating script file for the actions...done.
#13 2.373 execute assemble.sh to run assembly
#13 2.382 [Thu Mar 14 19:06:59 UTC 2024] Processing pe library reads
#13 2.385 /MaSuRCA-4.1.1/bin/expand_fastq: 12: file: not found
#13 2.385 WARNING!!! Unknown file type for input file 'short_reads_1.fastq.gz', assuming type text/
#13 2.386 /MaSuRCA-4.1.1/bin/expand_fastq: 12: file: not found
#13 2.386 WARNING!!! Unknown file type for input file 'short_reads_2.fastq.gz', assuming type text/
#13 2.387 File 'short_reads_1.fastq.gz' is not a fastq file
#13 2.387 File 'short_reads_2.fastq.gz' is not a fastq file
#13 2.392 [Thu Mar 14 19:06:59 UTC 2024] Average PE read length -nan
#13 2.395 Illegal division by zero at -e line 1.
#13 2.397 [Thu Mar 14 19:06:59 UTC 2024] Using kmer size of for the graph
#13 2.401 [Thu Mar 14 19:06:59 UTC 2024] MIN_Q_CHAR: 64
#13 2.406 [Thu Mar 14 19:06:59 UTC 2024] Creating mer database for Quorum
#13 3.652 [Thu Mar 14 19:07:00 UTC 2024] Error correct PE
#13 4.437 [Thu Mar 14 19:07:01 UTC 2024] Error correction of PE reads failed. Check pe.cor.log.
Maybe try adding file to the list of packages installed via apt-get to see if that resolves it?
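One way to catch a missing dependency like this before the assembly starts is a small preflight check in the test stage. This is only a sketch: `require_tools` is a hypothetical helper, not part of MaSuRCA or this repository, and the tool list in the usage comment is an assumption.

```shell
#!/bin/sh
# Hypothetical preflight helper: fail fast when a required command is missing.
# Prints each missing command to stderr and returns 1 if any are absent.
require_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing required command: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example (assumed tool list):
#   require_tools file wget masurca || exit 1
```

Run ahead of the masurca command in the same RUN step, a check like this would stop the build with a non-zero exit instead of letting the assembly continue on unreadable input.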
OK, the tests look happier now that file is installed, but now there's an error in the mega-reads step. Again, it is not caught as an error: the image builds successfully despite the failure (the exit code is 0 when it should not be):
#13 [test 2/2] RUN wget -q https://github.com/rrwick/Unicycler/raw/69e712eb95c4b9f8a46aade467260260a9ce7a91/sample_data/short_reads_1.fastq.gz && wget -q https://github.com/rrwick/Unicycler/raw/69e712eb95c4b9f8a46aade467260260a9ce7a91/sample_data/short_reads_2.fastq.gz && wget -q https://github.com/rrwick/Unicycler/raw/69e712eb95c4b9f8a46aade467260260a9ce7a91/sample_data/long_reads_low_depth.fastq.gz && masurca -t 2 -i short_reads_1.fastq.gz,short_reads_2.fastq.gz -r long_reads_low_depth.fastq.gz
#13 1.693 Verifying PATHS...
#13 1.696 jellyfish OK
#13 1.742 runCA OK
#13 1.755 createSuperReadsForDirectory.perl OK
#13 1.755 creating script file for the actions...done.
#13 1.755 execute assemble.sh to run assembly
#13 1.764 [Fri May 3 17:55:35 UTC 2024] Processing pe library reads
#13 1.959 [Fri May 3 17:55:35 UTC 2024] Average PE read length 125
#13 2.066 [Fri May 3 17:55:35 UTC 2024] Using kmer size of 83 for the graph
#13 2.225 [Fri May 3 17:55:35 UTC 2024] MIN_Q_CHAR: 33
#13 2.230 [Fri May 3 17:55:35 UTC 2024] Creating mer database for Quorum
#13 4.109 [Fri May 3 17:55:37 UTC 2024] Error correct PE
#13 10.04 [Fri May 3 17:55:43 UTC 2024] Estimating genome size
#13 11.63 [Fri May 3 17:55:45 UTC 2024] Estimated genome size: 187640
#13 11.63 [Fri May 3 17:55:45 UTC 2024] Creating k-unitigs with k=83
#13 13.49 [Fri May 3 17:55:47 UTC 2024] Computing super reads from PE
#13 14.02 [Fri May 3 17:55:47 UTC 2024] Using CABOG from /MaSuRCA-4.1.1/bin/../CA8/Linux-amd64/bin
#13 14.02 [Fri May 3 17:55:47 UTC 2024] Running mega-reads correction/assembly
#13 14.02 [Fri May 3 17:55:47 UTC 2024] Using mer size 17 for mapping, B=15, d=0.02
#13 14.02 [Fri May 3 17:55:47 UTC 2024] Estimated Genome Size 187640
#13 14.02 [Fri May 3 17:55:47 UTC 2024] Estimated Ploidy 1
#13 14.03 [Fri May 3 17:55:47 UTC 2024] Using 2 threads
#13 14.03 [Fri May 3 17:55:47 UTC 2024] Output prefix mr.83.17.15.0.02
#13 14.04 [Fri May 3 17:55:47 UTC 2024] Creating k-unitigs for k=19
#13 14.85 [Fri May 3 17:55:48 UTC 2024] Pre-correcting long reads
#13 15.09 [Fri May 3 17:55:48 UTC 2024] Pre-corrected reads are in longest_reads.25x.fa
#13 15.10 [Fri May 3 17:55:48 UTC 2024] Computing mega-reads
#13 15.10 [Fri May 3 17:55:48 UTC 2024] Running locally in 1 batch
#13 15.10 [Fri May 3 17:55:48 UTC 2024] mega-reads pass 1 failed
#13 15.10 [Fri May 3 17:55:48 UTC 2024] mega-reads exited before assembly
The error file says it was not able to set the mempolicy and the interleave mask:
$ cat create_mega-reads.err
set_mempolicy: Operation not permitted
setting interleave mask: Operation not permitted
I'm not sure what to do here....
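Since masurca returns 0 even when a step fails, the test stage could scan the captured output for the failure messages seen in this thread and fail the build itself. A minimal sketch, assuming the output is redirected to a log file; `log_has_no_failures` and the `masurca.log` file name are hypothetical, and the pattern list only covers the messages pasted above, not every error MaSuRCA can print.

```shell
#!/bin/sh
# Hypothetical helper: return non-zero when a captured MaSuRCA log contains
# any of the failure messages seen in this thread. The pattern list is an
# assumption drawn from those logs, not an exhaustive list of MaSuRCA errors.
log_has_no_failures() {
    log_file=$1
    if grep -E -q 'failed|exited before assembly|not found|Illegal division by zero' "$log_file"; then
        echo "failure message found in $log_file" >&2
        return 1
    fi
    return 0
}

# Example use in a test stage (masurca.log is an assumed file name):
#   masurca -t 2 -i short_reads_1.fastq.gz,short_reads_2.fastq.gz \
#       -r long_reads_low_depth.fastq.gz > masurca.log 2>&1
#   log_has_no_failures masurca.log || exit 1
```

Checking that the expected assembly file exists would be a stronger test, but the output file name is not shown in this thread, so it is left out here.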
From what I gather, it looks like Docker prevented numactl from setting mempolicy.
https://forums.docker.com/t/cannot-run-numactl-interleave-all-in-docker/40631/5
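A quick way to confirm that diagnosis inside a container is to try the same kind of numactl call directly. This is a sketch only: `numa_interleave_status` is a hypothetical helper. As far as I understand, Docker's default seccomp profile only allows set_mempolicy when the container has CAP_SYS_NICE, so `docker run --cap-add SYS_NICE ...` may let the call through. That is an assumption to verify, and it would not help a RUN step during docker build.

```shell
#!/bin/sh
# Hypothetical probe: report whether numactl may set an interleave policy here.
# Prints one of: allowed, blocked, numactl-missing.
numa_interleave_status() {
    if ! command -v numactl >/dev/null 2>&1; then
        echo "numactl-missing"
    elif numactl --interleave=all true >/dev/null 2>&1; then
        echo "allowed"
    else
        echo "blocked"
    fi
}

# Example: call numa_interleave_status inside the container and compare the
# result with and without the extra capability.
```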
I am encountering problems with the biocontainer image as well (quay.io/biocontainers/masurca:4.1.1--pl5321hb5bd705_0):
# masurca -t 2 -i short_reads_1.fastq.gz,short_reads_2.fastq.gz -r long_reads_low_depth.fastq.gz
Verifying PATHS...
jellyfish OK
runCA OK
createSuperReadsForDirectory.perl OK
creating script file for the actions...done.
execute assemble.sh to run assembly
[Fri May 3 19:16:20 UTC 2024] Processing pe library reads
/usr/local/bin/expand_fastq: line 12: file: command not found
/usr/local/bin/expand_fastq: line 12: file: command not found
WARNING!!! Unknown file type for input file 'short_reads_2.fastq.gz', assuming type text/
WARNING!!! Unknown file type for input file 'short_reads_1.fastq.gz', assuming type text/
File 'short_reads_2.fastq.gz' is not a fastq file
File 'short_reads_1.fastq.gz' is not a fastq file
awk: cmd. line:1: Division by zero
[Fri May 3 19:16:20 UTC 2024] Average PE read length
Illegal division by zero at -e line 1.
[Fri May 3 19:16:20 UTC 2024] Using kmer size of for the graph
[Fri May 3 19:16:20 UTC 2024] MIN_Q_CHAR: 64
[Fri May 3 19:16:20 UTC 2024] Creating mer database for Quorum
[Fri May 3 19:16:24 UTC 2024] Error correct PE
[Fri May 3 19:16:26 UTC 2024] Error correction of PE reads failed. Check pe.cor.log.
So... hmm...
I'm a little torn about what to do with this one.
- I only use this image for POLCA
- But I've been moving to pypolca
- But what about the people that actually want to use this for hybrid assembly?
- POLCA doesn't even have any changes in this version
I need to read up on the new GRID_ENGINE=MANUAL to see if that can fix things. I'll move this to a draft for now.
I wouldn't burn too much time/effort on this if you are utilizing POLCA in other ways.
If someone really wants to use masurca for hybrid assembly via a docker image, then we can ask them to help with resolving these issues. I never work with this tool, so it's difficult for me to troubleshoot.