
KeyError: 'stratify_type is not a file in the archive'

Status: Open. jon-xu opened this issue 2 years ago · 6 comments

Hi Marcus,

I was following the "Modified Base in Known Context" pipeline. I finished calibration and generated "megalodon_mod_calibration.npz".

Next, I wanted to re-run megalodon with the newly taiyaki-trained model and the calibration file, but I hit an error:

...
    self._load_calibration()
  File "/clusterdata/uqjxu8/scratch/anaconda3/envs/vac/lib/python3.9/site-packages/megalodon/calibration.py", line 562, in _load_calibration
    self.stratify_type = str(calib_data[MOD_STRAT_TYPE_TXT])
  File "/clusterdata/uqjxu8/scratch/anaconda3/envs/vac/lib/python3.9/site-packages/numpy/lib/npyio.py", line 260, in __getitem__
    raise KeyError("%s is not a file in the archive" % key)
KeyError: 'stratify_type is not a file in the archive'
srun: error: gpunode-0-7: task 0: Exited with exit code 1
...
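The KeyError itself is raised by NumPy rather than megalodon: indexing the archive object returned by np.load for a .npz file with a name that was never saved raises KeyError. A minimal sketch reproducing it with a throwaway in-memory archive; the key names mirror the ones reported later in this thread, the values are dummy placeholders, and the helper name make_stats_like_archive is hypothetical.

```python
import io

import numpy as np


def make_stats_like_archive() -> io.BytesIO:
    """Write dummy arrays under the three key names reported in this thread.

    Hypothetical helper: the values are placeholders and only the key names
    matter. Note that no "stratify_type" entry is written.
    """
    buf = io.BytesIO()
    np.savez(
        buf,
        all_mod_bases=np.array(["u"]),
        u_mod_llrs=np.zeros(3),
        u_can_llrs=np.zeros(3),
    )
    buf.seek(0)
    return buf


with np.load(make_stats_like_archive()) as calib_data:
    print(sorted(calib_data.files))
    try:
        # Same lookup as line 562 of megalodon/calibration.py in the traceback.
        stratify_type = str(calib_data["stratify_type"])
    except KeyError as err:
        print("KeyError:", err)
```

Running this should print the three key names followed by the same kind of KeyError seen in the traceback.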

Compared with the last successful run, the only change I made was replacing --disable-mod-calibration with --mod-calibration-filename $WDR/megalodon_calibrated/mod_calibration_statistics.npz:

megalodon $WDR/data/drna/20220321_VAC_mU_fast5 \
    --outputs basecalls mappings mod_mappings mods per_read_mods \
    --output-directory $WDR/megalodon_validate/20220321_VAC_mU_fast5 \
    --reference $WDR/reference_sequence/2021-8_pBASE1_eGFP_sequence_2.fasta \
    --guppy-server-path $WDR/ont-guppy/bin/guppy_basecall_server \
    --rna \
    --guppy-config rna_r9.4.1_70bps_hac_mU.cfg \
    --mod-calibration-filename $WDR/megalodon_calibrated/mod_calibration_statistics.npz \
    --devices 0,1 --processes 40 --overwrite

Do you have any advice on this, please?

Thanks,
Jon

jon-xu, Jun 16 '22 05:06

Could you confirm the contents of the calibration file? The following short Python snippet should print the names of the arrays stored in it:

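import os
import numpy as np
# "$WDR" is a shell variable; Python does not expand it inside a string literal.
calib_path = os.path.expandvars("$WDR/megalodon_calibrated/mod_calibration_statistics.npz")
with np.load(calib_path) as calib_fp:
    print(calib_fp.files)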

marcus1487, Jul 06 '22 22:07

Hi Marcus, The contents are:

['all_mod_bases', 'u_mod_llrs', 'u_can_llrs']
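Since the traceback shows megalodon reading a stratify_type entry from the file given to --mod-calibration-filename, and the listing above has no such entry, a quick pre-flight check can catch a file like this before a long GPU job starts. A minimal sketch; the function name has_stratify_type and the script name check_calib.py are hypothetical, and the check covers only the one key visible in the traceback, so passing it does not guarantee megalodon will accept the file.

```python
import os
import sys

import numpy as np


def has_stratify_type(path: str) -> bool:
    """Return True if the .npz archive at `path` contains a "stratify_type" entry.

    Hypothetical helper: it checks only the key named in the KeyError above,
    not every field megalodon's _load_calibration may read.
    """
    # Expand shell-style variables such as $WDR before opening the file.
    with np.load(os.path.expandvars(path)) as calib_fp:
        return "stratify_type" in calib_fp.files


if __name__ == "__main__":
    # Usage: python check_calib.py "$WDR/megalodon_calibrated/mod_calibration_statistics.npz"
    for arg in sys.argv[1:]:
        status = "ok" if has_stratify_type(arg) else "missing stratify_type"
        print(f"{arg}: {status}")
```

For the file listed above, this check would report "missing stratify_type".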

jon-xu, Jul 11 '22 00:07

What was the output of the calibrate command? In the code, the call that saves the calibration information should always include the stratification type (see here). I'm not sure how that step could have been skipped.
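For comparison, when the stratification type is written alongside the arrays, the lookup from the traceback succeeds. This is a hypothetical sketch of the save/load pattern described above, not megalodon's actual saving code: the key name stratify_type comes from the traceback, while the function names, the value "none", and the dummy_llrs array are placeholders.

```python
import io

import numpy as np


def save_calibration(buf, stratify_type: str, **arrays) -> None:
    """Hypothetical saver: always writes the stratification type next to the arrays."""
    np.savez(buf, stratify_type=np.array(stratify_type), **arrays)


def load_stratify_type(buf) -> str:
    """Mirror of the lookup in the traceback: str(calib_data["stratify_type"])."""
    with np.load(buf) as calib_data:
        return str(calib_data["stratify_type"])


buf = io.BytesIO()
save_calibration(buf, "none", dummy_llrs=np.zeros(3))
buf.seek(0)
print(load_stratify_type(buf))  # prints: none
```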

marcus1487, Jul 18 '22 21:07

The error log of the batch job looks like this:

(base) [uqjxu8@wiener vac]$ cat logs/megalodon_calibrate_error.txt
[20:33:16] Parsing log-likelihood ratios
[20:33:21] Computing u modified base calibration.
[20:33:21] Computing reference emperical density.
100%|██████████| 396714939/396714939 [3:07:58<00:00, 35175.07it/s]
[23:41:20] Computing alternative emperical density.
100%|██████████| 105673463/105673463 [49:50<00:00, 35339.26it/s]
[00:31:10] Setting new input llr range for more robust calibration (-23, 3)
[00:31:10] Computing new reference emperical density.
100%|██████████| 396714939/396714939 [3:00:59<00:00, 36530.77it/s]
[03:32:10] Computing new alternative emperical density.
100%|██████████| 105673463/105673463 [46:06<00:00, 38191.69it/s]
[04:18:17] Saving calibrations to file.

The output log is empty.

The output of "calibrate generate_modified_base_stats" is "mod_calibration_statistics.npz", and the output of "calibrate modified_bases" is "calibrated.pdf".

Thanks!

jon-xu, Jul 18 '22 23:07

I've just attached the two result files here:

https://cloudstor.aarnet.edu.au/plus/s/KHg7s6wsU0aGjU0
calibrated.pdf

jon-xu, Jul 18 '22 23:07

Hi Marcus, have you had a chance to look at the attached result files?

jon-xu, Oct 04 '22 05:10