pycoQC: dorado summary as input to pycoQC

Status: Open. sahuno opened this issue 1 month ago (0 comments).

Is your feature request related to a problem? Please describe.
I want to be able to use dorado summary TSV files as input to pycoQC.

Describe the solution you'd like
The column names do not match what pycoQC expects. Please see the error thrown by pycoQC:

(pycoQC) bash:iscb011:/data1/greenbab/users/ahunos/apps/sandbox 1033 $ pycoQC -f /data1/greenbab/projects/methyl_benchmark_spectrum/data/preprocessed/009N_rerun/results/pod5stats/009N_2/009N_2_pod5_stats.tsv \
> -a /data1/greenbab/projects/methyl_benchmark_spectrum/data/preprocessed/009N_rerun/modbasecalls/mergebam_modkit/results/mark_duplicates/009N/009N_modBaseCalls_sorted_dup.bam \
> -o 009N_merged_pycoQC_output.html
Checking arguments values
Check input data files
Parse data files
Traceback (most recent call last):
  File "/home/ahunos/miniforge3/envs/pycoQC/bin/pycoQC", line 8, in <module>
    sys.exit(main_pycoQC())
  File "/home/ahunos/miniforge3/envs/pycoQC/lib/python3.6/site-packages/pycoQC/__main__.py", line 132, in main_pycoQC
    quiet = args.quiet)
  File "/home/ahunos/miniforge3/envs/pycoQC/lib/python3.6/site-packages/pycoQC/pycoQC.py", line 129, in pycoQC
    quiet=quiet)
  File "/home/ahunos/miniforge3/envs/pycoQC/lib/python3.6/site-packages/pycoQC/pycoQC_parse.py", line 96, in __init__
    summary_reads_df = self._parse_summary()
  File "/home/ahunos/miniforge3/envs/pycoQC/lib/python3.6/site-packages/pycoQC/pycoQC_parse.py", line 139, in _parse_summary
    optional_colnames = ["calibration", "barcode"])
  File "/home/ahunos/miniforge3/envs/pycoQC/lib/python3.6/site-packages/pycoQC/pycoQC_parse.py", line 397, in _select_df_columns
    raise pycoQCError("Column {} not found in the provided sequence_summary file".format(col))
pycoQC.common.pycoQCError: Column read_len not found in the provided sequence_summary file
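The traceback ends in `_select_df_columns`, which raises `pycoQCError` when a required column such as `read_len` is absent from the summary file. Below is a minimal, standard-library-only sketch of that kind of check. The required column set and the alias map (for example, `sequence_length_template` being renamed to `read_len`) are assumptions for illustration and are not copied from pycoQC; `select_columns` and `MissingColumnError` are hypothetical names.

```python
import csv
import io

# Assumed alias map (illustrative, NOT copied from pycoQC): alternative header
# names that get renamed to the canonical names used in the error message.
ALIASES = {
    "sequence_length_template": "read_len",
    "mean_qscore_template": "mean_qscore",
}

# Assumed required columns; only "read_len" is confirmed by the traceback.
REQUIRED = ["read_id", "channel", "start_time", "read_len"]


class MissingColumnError(Exception):
    """Hypothetical stand-in for pycoQCError: a required column is absent."""


def select_columns(tsv_text, required=REQUIRED, aliases=ALIASES):
    """Rename aliased columns, check that the required ones exist, keep only those.

    Returns a list of dicts (one per read) restricted to the required columns.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    # Map each original header name to its canonical name (unchanged if no alias).
    rename = {name: aliases.get(name, name) for name in reader.fieldnames}
    available = set(rename.values())

    # Fail on the first missing required column, as the traceback does.
    for col in required:
        if col not in available:
            raise MissingColumnError(
                "Column {} not found in the provided sequence_summary file".format(col)
            )

    rows = []
    for row in reader:
        # Skip overflow fields (DictReader stores them under the key None).
        renamed = {rename[k]: v for k, v in row.items() if k in rename}
        rows.append({col: renamed[col] for col in required})
    return rows


if __name__ == "__main__":
    # A basecalled-style summary whose length column uses an (assumed) alias.
    ok = "read_id\tchannel\tstart_time\tsequence_length_template\nr1\t12\t3.5\t1500\n"
    print(select_columns(ok))
    # -> [{'read_id': 'r1', 'channel': '12', 'start_time': '3.5', 'read_len': '1500'}]

    # A summary with no length column at all.
    bad = "read_id\tchannel\tstart_time\tduration\nr1\t12\t3.5\t0.9\n"
    try:
        select_columns(bad)
    except MissingColumnError as err:
        print(err)  # -> Column read_len not found in the provided sequence_summary file
```

Renaming before checking means a file passes as long as some accepted spelling of each required column is present; a file with no length column under any name fails with the same message as in the issue.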

Please see the first lines of the summary file from dorado:

(pycoQC) bash:iscb011:/data1/greenbab/users/ahunos/apps/sandbox 1035 $ head -n 2 /data1/greenbab/projects/methyl_benchmark_spectrum/data/preprocessed/009N_rerun/results/pod5stats/009N_2/009N_2_pod5_stats.tsv
read_id filename        read_number     channel mux     end_reason      start_time      start_sample    duration        num_samples     minknow_events  sample_rate  median_before   predicted_scaling_scale predicted_scaling_shift tracked_scaling_scale   tracked_scaling_shift   num_reads_since_mux_change      time_since_mux_change        run_id  sample_id       experiment_id   flow_cell_id    pore_type
0022b052-9be7-465d-a521-a23e13ab0309    009N_2.pod5     1911    1347    3       signal_positive 18064.48000000  72257920        0.88500000      3540    0   4000     200.35083008    NaN     NaN     NaN     NaN     0       0.00000000      372e3d9e-cb76-4a59-b378-74df73a6bd3a    7N-2    not_set PAM57680        not_set

Jun 17 '24 15:06 sahuno
