
call methylation_segmentation fault

Open liuyh99 opened this issue 2 years ago • 3 comments

Hi, I have a problem when running nanopolish call-methylation on GM12878. Indexing completes normally, but call-methylation fails like this:

nanopolish-0.14.0/nanopolish call-methylation -b GM12878/chr1/GM12878.sort.bam -g human_ref/GRCh38.p13.genome.fa -r GM12878/fast5/chr1_basecall/basecall.fastq -q dam -t 100 > GM12878/chr1/GM12878.methdam_2.tsv

Segmentation fault

I tried both v0.13.2 and v0.14.0, and both versions show the issue above.

I followed the suggestion you provided and ran nanopolish fast5-check -r reads.fastq, which gave this:

The readdb file contains 143291 fast5 files
[fast5] ERROR: failed to open
[fast5] OK: opened /GM12878/fast5/chr1_basecall/workspace/Bham/FAB39043-3709921973/LomanLabz_PC_20160923_FNFAB39043_MN17250_mux_scan_Human_1D_ligation_R9_4_23348_ch101_read18_strand.fast5
        [read] OK: found 134405 raw samples for ses
terminate called after throwing an instance of 'std::out_of_range'
  what():  basic_string::substr: __pos (which is 5) > this->size() (which is 3)
Aborted

But I cannot work out which fast5 file leads to the error.
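One thing that may help narrow this down: the `[fast5] ERROR: failed to open` line above prints no path, which could mean an entry in the readdb index has an empty or stale path. Below is a minimal sketch for checking that. It assumes the `.index.readdb` file written by `nanopolish index` is tab-separated, with the read name in the first column and the fast5 path in the second; please check your own file before relying on this. `find_bad_readdb_entries` is a hypothetical helper name, not part of nanopolish.

```python
import os


def find_bad_readdb_entries(readdb_path):
    """Return (line_number, read_name, reason) for suspicious readdb rows.

    Assumes each row is "<read_name>\t<fast5_path>". That layout is an
    assumption and is not confirmed anywhere in this thread.
    """
    bad = []
    with open(readdb_path) as handle:
        for line_number, line in enumerate(handle, start=1):
            fields = line.rstrip("\n").split("\t")
            read_name = fields[0]
            fast5_path = fields[1] if len(fields) > 1 else ""
            if not fast5_path:
                # A row with no path would explain an error message with no filename.
                bad.append((line_number, read_name, "empty path"))
            elif not os.path.isfile(fast5_path):
                bad.append((line_number, read_name, "file not found"))
    return bad
```

Calling `find_bad_readdb_entries("reads.fastq.index.readdb")` returns an empty list when every row points at an existing file; any rows it does return are candidates for the read that triggers the failure.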

I found a difference between the GM12878 fast5 files and the K562 fast5 files; the K562 files are processed by nanopolish without error.

K562:

/                        Group
/Analyses                Group
/Analyses/Basecall_1D_000 Group
/Analyses/Basecall_1D_000/BaseCalled_template Group
/Analyses/Basecall_1D_000/BaseCalled_template/Fastq Dataset {SCALAR}
/Analyses/Basecall_1D_000/Summary Group
/Analyses/Basecall_1D_000/Summary/basecall_1d_template Group
/Analyses/Segmentation_000 Group
/Analyses/Segmentation_000/Summary Group
/Analyses/Segmentation_000/Summary/segmentation Group
/Raw                     Group
/Raw/Reads               Group
/Raw/Reads/Read_27792    Group
/Raw/Reads/Read_27792/Signal Dataset {59935/Inf}
/UniqueGlobalKey         Group
/UniqueGlobalKey/channel_id Group
/UniqueGlobalKey/context_tags Group
/UniqueGlobalKey/tracking_id Group

GM12878:

/                        Group
/Raw                     Group
/Raw/Reads               Group
/Raw/Reads/Read_146      Group
/Raw/Reads/Read_146/Signal Dataset {6969/Inf}
/UniqueGlobalKey         Group
/UniqueGlobalKey/channel_id Group
/UniqueGlobalKey/context_tags Group
/UniqueGlobalKey/tracking_id Group
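To make the difference between the two listings explicit, here is a small sketch that parses both and reports which top-level groups are missing. It assumes the listings are in the two-column "path, type" form shown above (they look like `h5ls -r` output), and the embedded listings are abbreviated copies of the ones in this post. The function names are hypothetical.

```python
def parse_h5ls_listing(text):
    """Map each HDF5 path in a "path type" listing to its object type."""
    objects = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0].startswith("/"):
            objects[parts[0]] = parts[1]
    return objects


def top_level_groups(objects):
    """Return the set of first-level names, e.g. {"Analyses", "Raw"}."""
    return {path.split("/")[1] for path in objects if path != "/"}


# Abbreviated copies of the two listings above.
K562 = """\
/                        Group
/Analyses                Group
/Analyses/Basecall_1D_000 Group
/Raw                     Group
/Raw/Reads/Read_27792/Signal Dataset {59935/Inf}
/UniqueGlobalKey         Group
"""

GM12878 = """\
/                        Group
/Raw                     Group
/Raw/Reads/Read_146/Signal Dataset {6969/Inf}
/UniqueGlobalKey         Group
"""

missing = top_level_groups(parse_h5ls_listing(K562)) - top_level_groups(
    parse_h5ls_listing(GM12878)
)
print(sorted(missing))  # prints ['Analyses']
```

So the only structural difference at the top level is that the GM12878 file has no `/Analyses` group, i.e. it carries raw signal only.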

Any suggestions would be appreciated.

liuyh99 avatar Jun 15 '22 08:06 liuyh99

It appears the GM12878 files are one-read-per-fast5 (the old format), rather than multi-fast5. As a test could you try to re-pack the fast5s using single_to_multi to see if it resolves the problem?

jts avatar Jun 15 '22 13:06 jts

Alternatively, since Nanopolish 0.14.0 supports slow5, you could try converting those fast5 files to slow5 first: slow5tools f2s <single_fast5_dir> -d blow5_dir -a -p <num_processes>. slow5tools f2s may also be able to catch the problematic fast5 file.
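If a whole-directory run only reports that something failed, running the same tool on one file at a time can point at the culprit. The sketch below is generic: `find_failing_files` and `build_command` are hypothetical names, and the command is supplied by the caller, so no tool-specific flags are assumed here. The demo uses the Python interpreter itself as a stand-in checker.

```python
import subprocess
import sys


def find_failing_files(paths, build_command):
    """Run one command per file and return the files whose command failed.

    build_command(path) must return an argument list for subprocess.run.
    It is provided by the caller; nothing about a specific tool is assumed.
    """
    failures = []
    for path in paths:
        result = subprocess.run(build_command(path), capture_output=True, text=True)
        if result.returncode != 0:
            # On POSIX a crash from a signal (e.g. a segfault) gives a negative code.
            failures.append((path, result.returncode, result.stderr.strip()))
    return failures


def demo_command(path):
    """Stand-in checker: exits with 1 when the filename contains 'bad'."""
    script = "import sys; sys.exit(1 if 'bad' in sys.argv[1] else 0)"
    return [sys.executable, "-c", script, path]


if __name__ == "__main__":
    for path, code, stderr in find_failing_files(["a.fast5", "bad.fast5"], demo_command):
        print(path, code)
```

In practice `build_command` would return the conversion or check command for a single fast5 file; this is slower than a batch run, but each failure is tied to one filename.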

hasindu2008 avatar Jun 15 '22 13:06 hasindu2008

Hi, sorry for the late reply. I tried the [single_to_multi] method and used the new multi_fast5 files for basecalling and mapping. Then I ran into a problem again when running nanopolish:

HDF5-DIAG: Error detected in HDF5 (1.8.14) thread 140546462873344:
  #000: H5A.c line 642 in H5Aread(): unable to read attribute
    major: Attribute
    minor: Read failed
  #001: H5Aint.c line 661 in H5A_read(): datatype conversion failed
    major: Attribute
    minor: Unable to encode value
  #002: H5T.c line 4816 in H5T_convert(): data type conversion failed
    major: Attribute
    minor: Unable to encode value
  #003: H5Tconv.c line 3274 in H5T__conv_vlen(): can't read VL data
    major: Datatype
    minor: Read failed
  #004: H5Tvlen.c line 891 in H5T_vlen_disk_read(): Unable to read VL information
    major: Datatype
    minor: Read failed
  #005: H5HG.c line 622 in H5HG_read(): unable to protect global heap
    major: Heap
    minor: Unable to protect metadata
  #006: H5HG.c line 262 in H5HG_protect(): unable to protect global heap
    major: Heap
    minor: Unable to protect metadata
  #007: H5AC.c line 1320 in H5AC_protect(): H5C_protect() failed.
    major: Object cache
    minor: Unable to protect metadata
  #008: H5C.c line 3574 in H5C_protect(): can't load entry
    major: Object cache
    minor: Unable to load metadata into cache
  #009: H5C.c line 7954 in H5C_load_entry(): unable to load entry
    major: Object cache
    minor: Unable to load metadata into cache
  #010: H5HGcache.c line 141 in H5HG_load(): bad global heap collection signature
    major: Heap
    minor: Unable to load metadata into cache
error reading attribute channel_number
nanopolish: src/nanopolish_squiggle_read.cpp:305: void SquiggleRead::load_from_raw(const Fast5Data&, uint32_t): Assertion `this->base_model[strand_idx] != NULL' failed.

I will try the slow5 method soon to see if it resolves the problem.

liuyh99 avatar Jun 20 '22 04:06 liuyh99