megalodon
KeyError: 'stratify_type is not a file in the archive'
Hi Marcus,
I was following the "Modified Base in Known Context" pipeline. I have finished calibration and generated "megalodon_mod_calibration.npz".
Next, I would like to re-run megalodon with the newly taiyaki-trained model and the calibration file, but I ran into an error:
...
    self._load_calibration()
  File "/clusterdata/uqjxu8/scratch/anaconda3/envs/vac/lib/python3.9/site-packages/megalodon/calibration.py", line 562, in _load_calibration
    self.stratify_type = str(calib_data[MOD_STRAT_TYPE_TXT])
  File "/clusterdata/uqjxu8/scratch/anaconda3/envs/vac/lib/python3.9/site-packages/numpy/lib/npyio.py", line 260, in __getitem__
    raise KeyError("%s is not a file in the archive" % key)
KeyError: 'stratify_type is not a file in the archive'
srun: error: gpunode-0-7: task 0: Exited with exit code 1
...
Compared to the last successful run, the only change was replacing --disable-mod-calibration with --mod-calibration-filename $WDR/megalodon_calibrated/mod_calibration_statistics.npz:
megalodon $WDR/data/drna/20220321_VAC_mU_fast5 \
    --outputs basecalls mappings mod_mappings mods per_read_mods \
    --output-directory $WDR/megalodon_validate/20220321_VAC_mU_fast5 \
    --reference $WDR/reference_sequence/2021-8_pBASE1_eGFP_sequence_2.fasta \
    --guppy-server-path $WDR/ont-guppy/bin/guppy_basecall_server \
    --rna --guppy-config rna_r9.4.1_70bps_hac_mU.cfg \
    --mod-calibration-filename $WDR/megalodon_calibrated/mod_calibration_statistics.npz \
    --devices 0,1 --processes 40 --overwrite
Do you have any advice on this? Thanks, Jon
Could you confirm the contents of the calibration file? The following short Python snippet should list the arrays it contains:
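import os
import numpy as np
# Python does not expand shell variables such as $WDR, so expand it explicitly
calib_path = os.path.expandvars("$WDR/megalodon_calibrated/mod_calibration_statistics.npz")
with np.load(calib_path) as calib_fp:
    print(calib_fp.files)  # names of the arrays stored in the .npz archive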
Hi Marcus, the contents are:
['all_mod_bases', 'u_mod_llrs', 'u_can_llrs']
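The listing above has no `stratify_type` entry, which is the key that `_load_calibration` reads at line 562 in the traceback. A minimal pre-flight sketch, assuming you want to catch this before launching a long megalodon job: the helper name `missing_calibration_keys` is hypothetical, and only `stratify_type` is checked because that is the one key the traceback confirms; any other keys a full calibration file needs are not assumed.

```python
import os
import tempfile

import numpy as np

# Key read by megalodon's _load_calibration (taken from the traceback above);
# other keys a complete calibration file may need are deliberately not assumed.
REQUIRED_KEYS = ("stratify_type",)


def missing_calibration_keys(path, required=REQUIRED_KEYS):
    """Return the required keys that are absent from the .npz archive at path."""
    with np.load(path) as calib_fp:
        present = set(calib_fp.files)
    return [key for key in required if key not in present]


if __name__ == "__main__":
    # Hypothetical stand-in archive with the same array names as the file above;
    # the array contents are placeholders, not real megalodon statistics.
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "stats_like.npz")
        np.savez(path, all_mod_bases=np.array(["u"]),
                 u_mod_llrs=np.zeros(3), u_can_llrs=np.zeros(3))
        print(missing_calibration_keys(path))  # ['stratify_type']
```

Pointing the helper at the file passed to --mod-calibration-filename and getting a non-empty list back means megalodon would fail with the same KeyError.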
What was the output of the calibrate command? In the code, the step that saves the calibration information should also write the stratification type (see here), so I'm not sure how that command could have produced a file without it.
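To illustrate what "saving the stratification type" means at the NumPy level, here is a minimal sketch of storing a string in an .npz archive and reading it back with the same `str(calib_data[...])` pattern shown in the traceback. The key name `stratify_type` comes from the error message; the helper names, the value `"none"`, and the `example_llrs` array are hypothetical placeholders, not megalodon's actual save code or contents.

```python
import os
import tempfile

import numpy as np


def save_with_stratify_type(path, stratify_type, **arrays):
    """Write arrays to an .npz archive together with a string stratify_type entry."""
    # A Python str is stored as a 0-d unicode array, so no pickling is needed
    np.savez(path, stratify_type=stratify_type, **arrays)


def load_stratify_type(path):
    """Read the stratify_type entry back the way the traceback shows."""
    with np.load(path) as calib_data:
        # str() on a 0-d unicode array returns the original string
        return str(calib_data["stratify_type"])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "calib_sketch.npz")
        # Hypothetical placeholder value and array
        save_with_stratify_type(path, "none", example_llrs=np.zeros(3))
        print(load_stratify_type(path))  # none
```

If the entry was never written, `calib_data["stratify_type"]` raises the same "is not a file in the archive" KeyError instead of returning a string.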
The error log of the batch job looks like this:
(base) [uqjxu8@wiener vac]$ cat logs/megalodon_calibrate_error.txt
[20:33:16] Parsing log-likelihood ratios
[20:33:21] Computing u modified base calibration.
[20:33:21] Computing reference emperical density.
100%|██████████| 396714939/396714939 [3:07:58<00:00, 35175.07it/s]
[23:41:20] Computing alternative emperical density.
100%|██████████| 105673463/105673463 [49:50<00:00, 35339.26it/s]
[00:31:10] Setting new input llr range for more robust calibration (-23, 3)
[00:31:10] Computing new reference emperical density.
100%|██████████| 396714939/396714939 [3:00:59<00:00, 36530.77it/s]
[03:32:10] Computing new alternative emperical density.
100%|██████████| 105673463/105673463 [46:06<00:00, 38191.69it/s]
[04:18:17] Saving calibrations to file.
The output log is empty.
The output of "calibrate generate_modified_base_stats" is "mod_calibration_statistics.npz", and the output of "calibrate modified_bases" is "calibrated.pdf".
Thanks!
I have attached the two result files here:
https://cloudstor.aarnet.edu.au/plus/s/KHg7s6wsU0aGjU0
calibrated.pdf
Hi Marcus, have you had a chance to look at the attached result files?