nanopolish
Update nanopolish v0.13.3 in anaconda
Hi there, would it be possible to update the version of nanopolish available in conda, please? Best, Elodie
Hi,
I'm going to make the next release after I merge in the methylation_bam
branch, which I expect to do in the next week or two.
Jared
Hi Jared,
We use the Nextflow-ARTIC pipeline to obtain consensus fasta sequences for SARS-CoV-2 samples. We've updated the software on the Nanopore sequencing computer, and have since noticed that the vcf files from nanopolish contain no variants, so the subsequent consensus fasta sequence just matches the Wuhan reference (MN908947.3).
The version of nanopolish in bioconda is 0.13.2, so I was wondering if the latest version 0.13.3 of nanopolish might fix this, and if it might be possible to update bioconda with this latest 0.13.3 version of nanopolish.
The nanopolish command used with the Nextflow-ARTIC pipeline is:
nanopolish variants --verbose --min-flanking-sequence 10 -x 1000000 --progress -t 1 --reads barcode01.fastq -o barcode01.nCoV-2019_1.vcf -b barcode01.trimmed.rg.sorted.bam -g primer-schemes/nCoV-2019/V3/nCoV-2019.reference.fasta -w "MN908947.3:1-29904" --ploidy 1 -m 0.15 --read-group nCoV-2019_1
Thank you, Stephen.
Hi @sbridgett,
There is no difference between 0.13.2 and 0.13.3 with respect to variant calling, so I suspect it isn't the cause of this issue. My first guess is that the FAST5 files are VBZ-compressed, but you don't have the VBZ decompression plugin loaded. Is this possible? You can read more about VBZ compression here: https://github.com/jts/nanopolish/issues/932#issuecomment-914303734 and here: https://github.com/nanoporetech/vbz_compression#vbz-compression
Jared
Thank you for replying so quickly. It might be that the updated Nanopore software now writes VBZ-compressed fast5 files. I'll look into the VBZ decompression plugin, although the nanopolish command used in the step of the Nextflow-ARTIC pipeline that writes the vcf file only reads from .fastq, .bam and .fasta files, not a .fast5 file, so I'm not sure why it would need VBZ decompression at this step:
nanopolish variants --verbose --min-flanking-sequence 10 -x 1000000 --progress -t 1 --reads barcode01.fastq -o barcode01.nCoV-2019_1.vcf -b barcode01.trimmed.rg.sorted.bam -g primer-schemes/nCoV-2019/V3/nCoV-2019.reference.fasta -w "MN908947.3:1-29904" --ploidy 1 -m 0.15 --read-group nCoV-2019_1
The input "barcode01.fastq" file contains 324,882 reads.
And the "barcode01.trimmed.rg.sorted.bam" file has 29215 of the 29903 reference bases covered (97.7% coverage), at a mean depth of about 1023 reads:
$ samtools coverage barcode01.trimmed.rg.sorted.bam
#rname      startpos  endpos  numreads  covbases  coverage  meandepth  meanbaseq  meanmapq
MN908947.3  1         29903   91751     29215     97.6992   1023.17    19.2       60
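As a reading aid for the table above: in `samtools coverage` output, the `coverage` column is the percentage of reference bases covered and `meandepth` is the mean read depth. The following is a minimal shell sketch that pulls both values out by column name; the function name `summarise_coverage` is made up for illustration and is not part of samtools or nanopolish.

```shell
# Summarise `samtools coverage` output read from stdin.
# Columns are looked up by header name rather than by fixed position.
# NOTE: summarise_coverage is a hypothetical helper, not a samtools command.
summarise_coverage() {
    awk '
        /^#rname/ {
            sub(/^#/, "", $1)                      # "#rname" -> "rname"
            for (i = 1; i <= NF; i++) col[$i] = i  # header name -> column index
            next
        }
        ("coverage" in col) && NF > 0 {
            len = $(col["endpos"]) - $(col["startpos"]) + 1
            printf "%s: %s of %d bases covered (%.1f%%), mean depth %.0f\n",
                   $(col["rname"]), $(col["covbases"]), len,
                   $(col["coverage"]), $(col["meandepth"])
        }
    '
}
```

It could be used as `samtools coverage barcode01.trimmed.rg.sorted.bam | summarise_coverage`.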
Nanopolish always reads the fast5 files; in this case it uses the index files for the fastq to determine which ones to load.
Jared
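To illustrate the point about the index files: `nanopolish index` writes several files next to the fastq, including one ending in `.index.readdb`. The sketch below assumes that each line of that file is a read ID, a tab, and the path of the fast5 file holding the read; that layout is an assumption here, so check it against your own file first. The helper name `missing_fast5` is hypothetical.

```shell
# List fast5 paths named in a readdb file that do not exist on disk.
# ASSUMPTION: each readdb line is "<read_id><TAB><fast5_path>".
# NOTE: missing_fast5 is a hypothetical helper, not a nanopolish command.
missing_fast5() {
    readdb="$1"
    # Take the path column, de-duplicate it, and report paths that are absent.
    cut -f2 "$readdb" | sort -u | while IFS= read -r path; do
        [ -e "$path" ] || printf '%s\n' "$path"
    done
}
```

For the run discussed here it might be called as `missing_fast5 barcode01.fastq.index.readdb`. Empty output would mean every referenced fast5 file is present, although it says nothing about whether those files can be decompressed.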
Sorry, I hadn't realised that Nanopolish also reads the fast5 files when they weren't given in the command-line parameters.
You're right that VBZ compression of the fast5 files is the cause.
I checked with the lab: the problem started after the MinION software was updated on the nanopore computer, and according to the MinION release notes, VBZ compression is now enabled by default.
I installed the latest 0.13.3 version of Nanopolish, and it exits with an error message explaining about the missing plugin:
"The fast5 file is compressed with VBZ but the required plugin is not loaded. Please read the instructions here: https://github.com/nanoporetech/vbz_compression/issues/5"
However, the 0.13.2 version of Nanopolish in conda, running with the same command on the same files, doesn't exit with this error, but continues running, and finishes with the message:
"[post-run summary] total reads 89710, unparseable: 0, qc fail: 0, could not calibrate: 0, no alignment: 0, bad fast5: 44855"
but the resulting vcf file has no SNPs.
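Since 0.13.2 finishes without an error in this situation, the two symptoms described above (a non-zero "bad fast5" count and a vcf with no records) can be checked after a run. The following is a rough sketch only; `check_summary` and `count_variants` are hypothetical helper names, and the summary format is taken from the single line quoted above.

```shell
# Report how many reads were skipped as "bad fast5" in a post-run summary line.
# NOTE: check_summary and count_variants are hypothetical helpers.
check_summary() {
    line="$1"
    total=$(printf '%s\n' "$line" | sed -n 's/.*total reads \([0-9][0-9]*\).*/\1/p')
    bad=$(printf '%s\n' "$line" | sed -n 's/.*bad fast5: \([0-9][0-9]*\).*/\1/p')
    if [ -z "$total" ] || [ -z "$bad" ]; then
        echo "no post-run summary found"
        return 2
    fi
    if [ "$bad" -gt 0 ]; then
        # Integer percentage of reads whose fast5 data could not be read.
        echo "WARNING: $bad of $total reads ($((bad * 100 / total))%) had unreadable fast5 data"
        return 1
    fi
    echo "OK: all $total reads had readable fast5 data"
}

# Count variant records (lines not starting with '#') in a VCF file.
count_variants() {
    grep -vc '^#' "$1" || true   # grep exits 1 when the count is 0; still print it
}
```

A zero from `count_variants barcode01.nCoV-2019_1.vcf` combined with a warning from `check_summary` would point towards unreadable fast5 data rather than a sample that truly matches the reference.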
I've installed the hdf5plugin as per the instructions in that comment.
With the plugin library filename at the end of the path, i.e.:
export HDF5_PLUGIN_PATH=/home/myusername/ont-vbz-hdf-plugin-1.0.1-Linux/usr/local/hdf5/lib/plugin/libvbz_hdf_plugin.so
Nanopolish 0.13.3 still exits with the same error message as above.
When I removed the library filename from the path:
export HDF5_PLUGIN_PATH=/home/myusername/ont-vbz-hdf-plugin-1.0.1-Linux/usr/local/hdf5/lib/plugin
then both Nanopolish 0.13.3 and 0.13.2 ran okay and produced a vcf file containing the expected SNPs.
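The difference between the two exports above is that HDF5_PLUGIN_PATH has to name the directory holding the plugin, not the library file itself. A small sanity check along these lines could catch that mistake before a run. This is a sketch, not part of nanopolish: `check_vbz_plugin_path` is a hypothetical name, and for simplicity it assumes a single directory rather than a colon-separated list.

```shell
# Check that a path (default: $HDF5_PLUGIN_PATH) is a directory
# containing the VBZ plugin library.
# NOTE: check_vbz_plugin_path is a hypothetical helper.
# SIMPLIFICATION: only a single directory is handled, not a ':'-separated list.
check_vbz_plugin_path() {
    dir="${1:-${HDF5_PLUGIN_PATH:-}}"
    lib="libvbz_hdf_plugin.so"
    if [ -z "$dir" ]; then
        echo "HDF5_PLUGIN_PATH is not set"; return 1
    elif [ -f "$dir" ]; then
        echo "HDF5_PLUGIN_PATH points at a file; use its directory: $(dirname "$dir")"; return 1
    elif [ ! -d "$dir" ]; then
        echo "not a directory: $dir"; return 1
    elif [ ! -f "$dir/$lib" ]; then
        echo "$lib not found in $dir"; return 1
    fi
    echo "OK: $dir/$lib"
}
```

Called with no argument after the `export`, it checks the current value of HDF5_PLUGIN_PATH.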
Thank you for your help with this. Much appreciated.
Perhaps in the Nanopolish README.md, in the "Installing the latest code from github (recommended)" section, it might be worth adding a note that fast5 files are VBZ-compressed by default since the recent MinION software update, so users need to install the hdf5plugin and set 'HDF5_PLUGIN_PATH' to enable Nanopolish to read these files.
Thank you for the detailed report, and for pointing out that the path in the comment I linked to is incorrect; I have fixed that. One of the differences between 0.13.2 and 0.13.3 is that 0.13.3 will warn when the plugin is missing, whereas 0.13.2 will silently skip the data.
I'll make a note along these lines when I release 0.14 (likely in January). This is a common issue so I'll try to devise a way to automatically install the plugin, if possible.
Yes, if nanopolish release 0.14 could automatically install the hdf5plugin with nanopolish that would be good.
In conda, currently when I install nanopolish, in a new environment, using:
conda install -c bioconda -c conda-forge nanopolish
it installs nanopolish version 0.13.2, but doesn't install the hdf5plugin.
If I then run:
conda install -c bioconda -c conda-forge hdf5plugin
it downgrades nanopolish to version 0.12.5 to install hdf5plugin-2.1.2.
Instead, installing hdf5plugin using the instructions you gave, and setting the HDF5_PLUGIN_PATH, does work okay.
Hi Jared,
since you have merged the methylation_bam branch, do you plan to update the anaconda nanopolish version to 0.14?
Thank you,
Mattia
Yes, I'll try to do this. Thanks for the reminder.