Error methyltrain
I downloaded data from https://www.ebi.ac.uk/ena/data/view/PRJEB13021 and want to run the pipeline. Since the whole dataset is too large, I extracted 20-30 files from the ecoli R7 data for training (ecoli_er2925.MSssI.timp.100215.fast5, ecoli_er2925.native.timp.110915.fast5, ecoli_er2925.pcr_MSssI.timp.021216.fast5, ecoli_er2925.pcr.timp.021216.fast5).
Then I ran the pipeline and the following error occurred. I found that the raw R7 fast5 files have no Signal object. Can this pipeline be run on these data without the raw signal?
poretools fasta --type 2D /Users/quanc/Documents/Data/Nanopore/data/ecoli_er2925.MSssI.timp.100215.fast5/pass > ecoli_er2925.MSssI.timp.100215.pass.fasta
poretools fasta --type 2D /Users/quanc/Documents/Data/Nanopore/data/ecoli_er2925.MSssI.timp.100215.fast5/fail > ecoli_er2925.MSssI.timp.100215.fail.fasta
cat ecoli_er2925.MSssI.timp.100215.pass.fasta ecoli_er2925.MSssI.timp.100215.fail.fasta > ecoli_er2925.MSssI.timp.100215.fasta
nanopolish index -d ~/Documents/Data/Nanopore/data/ ecoli_er2925.MSssI.timp.100215.fasta
[readdb] num reads: 17, num reads with path to fast5: 17
bwa mem -t 4 -x ont2d ecoli_k12.fasta ecoli_er2925.MSssI.timp.100215.fasta |\
samtools view -q 20 -Sb - |\
samtools sort -o ecoli_er2925.MSssI.timp.100215.sorted.bam -T %.tmp
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 17 sequences (125274 bp)...
[M::mem_process_seqs] Processed 17 reads in 1.828 CPU sec, 0.558 real sec
[main] Version: 0.7.17-r1188
[main] CMD: bwa mem -t 4 -x ont2d ecoli_k12.fasta ecoli_er2925.MSssI.timp.100215.fasta
[main] Real time: 0.582 sec; CPU: 1.845 sec
samtools index ecoli_er2925.MSssI.timp.100215.sorted.bam
/Users/quanc/Documents/Workspace/Github/methylation-analysis/initialize_model.sh /Users/quanc/Documents/Workspace/Github/methylation-analysis/models/r7.3_e6_70bps_6mer_template_median68pA.model template t.006 SQK006 > t.006.ont.model
ln -s t.006.ont.model t.006.ont.alphabet_nucleotide.model
/Users/quanc/Documents/Workspace/Github/methylation-analysis/initialize_model.sh /Users/quanc/Documents/Workspace/Github/methylation-analysis/models/r7.3_e6_70bps_6mer_complement_median68pA_pop1.model complement.pop1 c.p1.006 SQK006 > c.p1.006.ont.model
ln -s c.p1.006.ont.model c.p1.006.ont.alphabet_nucleotide.model
/Users/quanc/Documents/Workspace/Github/methylation-analysis/initialize_model.sh /Users/quanc/Documents/Workspace/Github/methylation-analysis/models/r7.3_e6_70bps_6mer_complement_median68pA_pop2.model complement.pop2 c.p2.006 SQK006 > c.p2.006.ont.model
ln -s c.p2.006.ont.model c.p2.006.ont.alphabet_nucleotide.model
echo t.006.ont.alphabet_nucleotide.model c.p1.006.ont.alphabet_nucleotide.model c.p2.006.ont.alphabet_nucleotide.model | tr " " "\n" > ont.alphabet_nucleotide.R7.fofn
nanopolish methyltrain -t 4 --train-kmers all --out-fofn ecoli_er2925.MSssI.timp.100215.alphabet_nucleotide.fofn --out-suffix .ecoli_er2925.MSssI.timp.100215.alphabet_nucleotide.model -m ont.alphabet_nucleotide.R7.fofn -b ecoli_er2925.MSssI.timp.100215.sorted.bam -r ecoli_er2925.MSssI.timp.100215.fasta -g ecoli_k12.fasta.alphabet_nucleotide --filter-policy R7
Training SQK006 for alphabet nucleotide for 6-mers
Starting round 0
HDF5-DIAG: Error detected in HDF5 (1.8.17) thread 123145559535616:
#000: H5L.c line 1117 in H5Lget_name_by_idx(): name doesn't exist
major: Symbol table
minor: Object already exists
#001: H5Gtraverse.c line 861 in H5G_traverse(): internal path traversal failed
major: Symbol table
minor: Object not found
#002: H5Gtraverse.c line 755 in H5G_traverse_real(): component not found
major: Symbol table
minor: Object not found
HDF5-DIAG: Error detected in HDF5 (1.8.17) thread 123145560072192:
#000: H5L.c line 1117 in H5Lget_name_by_idx(): name doesn't exist
major: Symbol table
minor: Object already exists
#001: H5Gtraverse.c line 861 in H5G_traverse(): internal path traversal failed
major: Symbol table
minor: Object not found
#002: H5Gtraverse.c line 755 in H5G_traverse_real(): component not found
major: Symbol table
minor: Object not found
HDF5-DIAG: Error detected in HDF5 (1.8.17) thread 123145560072192:
#000: H5D.c line 358 in H5Dopen2(): not found
major: Dataset
minor: Object not found
#001: H5Gloc.c line 430 in H5G_loc_find(): can't find object
major: Symbol table
minor: Object not found
#002: H5Gtraverse.c line 861 in H5G_traverse(): internal path traversal failed
major: Symbol table
minor: Object not found
#003: H5Gtraverse.c line 641 in H5G_traverse_real(): traversal operator failed
major: Symbol table
minor: Callback failed
#004: H5Gloc.c line 385 in H5G_loc_find_cb(): object 'Signal' doesn't exist
major: Symbol table
minor: Object not found
Assertion failed: (rt.n > 0), function load_from_raw, file src/nanopolish_squiggle_read.cpp, line 321.
HDF5-DIAG: Error detected in HDF5 (1.8.17) thread 123145559535616:
#000: H5D.c line 358 in H5Dopen2(): not found
major: Dataset
minor: Object not found
#001: H5Gloc.c line 430 in H5G_loc_find(): can't find object
major: Symbol table
minor: Object not found
#002: H5Gtraverse.c line 861 in H5G_traverse(): internal path traversal failed
major: Symbol table
minor: Object not found
make: *** [ecoli_er2925.MSssI.timp.100215.alphabet_nucleotide.fofn] Abort trap: 6
Hi,
The fast5 file structure has changed a lot since 2015 and R7 data is no longer well supported. If you want to exactly replicate the analysis for our paper you'll have to use the specific version of nanopolish that we have in pipeline.make.
Jared
@jts Thanks a lot! This error confused me for a few days. Based on your suggestion, can I conclude that the current version of nanopolish only works with fast5 files that contain /Raw/Signal? And is that true for methyltrain, methyltest, and call-methylation alike?
Yes, all modern ONT data will contain /Raw/Signal. We've tried to maintain support for older data in nanopolish, but since no one really uses R7 data anymore, some features may be neglected.
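For anyone hitting the same assertion, a quick way to check whether a fast5 file carries raw signal before running methyltrain might look like the sketch below. It assumes the h5ls tool from the HDF5 command-line utilities is installed; has_raw_signal is a hypothetical helper name, not part of nanopolish. It looks for any dataset whose path ends in /Signal rather than one fixed location, since the exact group layout can differ between fast5 versions.

```shell
#!/usr/bin/env bash
# Sketch: report whether a fast5 file contains any raw "Signal" dataset.
# Assumption: `h5ls` (HDF5 command-line tools) is available on PATH.
# `has_raw_signal` is a hypothetical helper, not a nanopolish command.

has_raw_signal() {
    local fast5="$1"
    # `h5ls -r` lists every object recursively, one per line: "<path> <type> ...".
    # Succeed if any Dataset has a path ending in /Signal.
    h5ls -r "$fast5" 2>/dev/null | grep -Eq '/Signal[[:space:]]+Dataset'
}

# Check each file given on the command line.
for f in "$@"; do
    if has_raw_signal "$f"; then
        echo "$f: raw signal present"
    else
        echo "$f: no raw signal found (event-only file, or file not readable)"
    fi
done
```

Saved as, say, check_signal.sh (an illustrative name), it could be run over a handful of the extracted reads, e.g. `bash check_signal.sh /path/to/reads/*.fast5`. Files reported as lacking raw signal would be expected to trigger the load_from_raw assertion shown in the log above.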
Got it. Then I have to find some R9 data to train the model. Thanks again.