
dorado summary as input to pycoQC

Open sahuno opened this issue 1 month ago • 0 comments

Is your feature request related to a problem? Please describe.
I want to be able to use dorado summary TSV files as input to pycoQC.

Describe the solution you'd like
The column names in the dorado summary file do not match the ones pycoQC expects. Please see the error thrown by pycoQC:

(pycoQC) bash:iscb011:/data1/greenbab/users/ahunos/apps/sandbox 1033 $ pycoQC -f /data1/greenbab/projects/methyl_benchmark_spectrum/data/preprocessed/009N_rerun/results/pod5stats/009N_2/009N_2_pod5_stats.tsv \
> -a /data1/greenbab/projects/methyl_benchmark_spectrum/data/preprocessed/009N_rerun/modbasecalls/mergebam_modkit/results/mark_duplicates/009N/009N_modBaseCalls_sorted_dup.bam \
> -o 009N_merged_pycoQC_output.html
Checking arguments values
Check input data files
Parse data files
Traceback (most recent call last):
  File "/home/ahunos/miniforge3/envs/pycoQC/bin/pycoQC", line 8, in <module>
    sys.exit(main_pycoQC())
  File "/home/ahunos/miniforge3/envs/pycoQC/lib/python3.6/site-packages/pycoQC/__main__.py", line 132, in main_pycoQC
    quiet = args.quiet)
  File "/home/ahunos/miniforge3/envs/pycoQC/lib/python3.6/site-packages/pycoQC/pycoQC.py", line 129, in pycoQC
    quiet=quiet)
  File "/home/ahunos/miniforge3/envs/pycoQC/lib/python3.6/site-packages/pycoQC/pycoQC_parse.py", line 96, in __init__
    summary_reads_df = self._parse_summary()
  File "/home/ahunos/miniforge3/envs/pycoQC/lib/python3.6/site-packages/pycoQC/pycoQC_parse.py", line 139, in _parse_summary
    optional_colnames = ["calibration", "barcode"])
  File "/home/ahunos/miniforge3/envs/pycoQC/lib/python3.6/site-packages/pycoQC/pycoQC_parse.py", line 397, in _select_df_columns
    raise pycoQCError("Column {} not found in the provided sequence_summary file".format(col))
pycoQC.common.pycoQCError: Column read_len not found in the provided sequence_summary file

Please see the head of the summary file from dorado:

(pycoQC) bash:iscb011:/data1/greenbab/users/ahunos/apps/sandbox 1035 $ head -n 2 /data1/greenbab/projects/methyl_benchmark_spectrum/data/preprocessed/009N_rerun/results/pod5stats/009N_2/009N_2_pod5_stats.tsv
read_id filename        read_number     channel mux     end_reason      start_time      start_sample    duration        num_samples     minknow_events  sample_rate  median_before   predicted_scaling_scale predicted_scaling_shift tracked_scaling_scale   tracked_scaling_shift   num_reads_since_mux_change      time_since_mux_change        run_id  sample_id       experiment_id   flow_cell_id    pore_type
0022b052-9be7-465d-a521-a23e13ab0309    009N_2.pod5     1911    1347    3       signal_positive 18064.48000000  72257920        0.88500000      3540    0   4000     200.35083008    NaN     NaN     NaN     NaN     0       0.00000000      372e3d9e-cb76-4a59-b378-74df73a6bd3a    7N-2    not_set PAM57680        not_set
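
To make the mismatch concrete, here is a minimal sketch that checks a TSV header against a list of expected column names before the file is handed to pycoQC. The helper `missing_columns` is hypothetical and not part of pycoQC. Only `read_len` is taken from the error message above; the rest of the `required` list is an assumption for illustration, and the header is abbreviated from the file shown above.

```python
import csv
import io


def missing_columns(tsv_text, required):
    """Return the names in `required` that are absent from the TSV header row."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader, [])
    present = {name.strip() for name in header}
    return [name for name in required if name not in present]


# Header abbreviated from the summary file shown above (tab-separated).
header = "read_id\tfilename\tread_number\tchannel\tmux\tend_reason\tstart_time\trun_id\n"

# "read_len" is the column named in the pycoQC error; the other entries are
# assumed for illustration and are not a confirmed list of pycoQC requirements.
required = ["read_id", "run_id", "channel", "start_time", "read_len"]

print(missing_columns(header, required))
```

With this header the check reports `read_len` as missing, which matches the traceback: the file carries no read-length column under that name.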


sahuno, Jun 17 '24 15:06