
dorado summary as input to pycoQC

Open: sahuno opened this issue 1 year ago • 4 comments

Is your feature request related to a problem? Please describe. I want to be able to use dorado summary TSV files as input to pycoQC.

Describe the solution you'd like. The column names do not match. Please see the error thrown by pycoQC:

(pycoQC) bash:iscb011:/data1/greenbab/users/ahunos/apps/sandbox 1033 $ pycoQC -f /data1/greenbab/projects/methyl_benchmark_spectrum/data/preprocessed/009N_rerun/results/pod5stats/009N_2/009N_2_pod5_stats.tsv \
> -a /data1/greenbab/projects/methyl_benchmark_spectrum/data/preprocessed/009N_rerun/modbasecalls/mergebam_modkit/results/mark_duplicates/009N/009N_modBaseCalls_sorted_dup.bam \
> -o 009N_merged_pycoQC_output.html
Checking arguments values
Check input data files
Parse data files
Traceback (most recent call last):
  File "/home/ahunos/miniforge3/envs/pycoQC/bin/pycoQC", line 8, in <module>
    sys.exit(main_pycoQC())
  File "/home/ahunos/miniforge3/envs/pycoQC/lib/python3.6/site-packages/pycoQC/__main__.py", line 132, in main_pycoQC
    quiet = args.quiet)
  File "/home/ahunos/miniforge3/envs/pycoQC/lib/python3.6/site-packages/pycoQC/pycoQC.py", line 129, in pycoQC
    quiet=quiet)
  File "/home/ahunos/miniforge3/envs/pycoQC/lib/python3.6/site-packages/pycoQC/pycoQC_parse.py", line 96, in __init__
    summary_reads_df = self._parse_summary()
  File "/home/ahunos/miniforge3/envs/pycoQC/lib/python3.6/site-packages/pycoQC/pycoQC_parse.py", line 139, in _parse_summary
    optional_colnames = ["calibration", "barcode"])
  File "/home/ahunos/miniforge3/envs/pycoQC/lib/python3.6/site-packages/pycoQC/pycoQC_parse.py", line 397, in _select_df_columns
    raise pycoQCError("Column {} not found in the provided sequence_summary file".format(col))
pycoQC.common.pycoQCError: Column read_len not found in the provided sequence_summary file

Please see the attached head of the summary file from dorado:

(pycoQC) bash:iscb011:/data1/greenbab/users/ahunos/apps/sandbox 1035 $ head -n 2 /data1/greenbab/projects/methyl_benchmark_spectrum/data/preprocessed/009N_rerun/results/pod5stats/009N_2/009N_2_pod5_stats.tsv
read_id filename        read_number     channel mux     end_reason      start_time      start_sample    duration        num_samples     minknow_events  sample_rate  median_before   predicted_scaling_scale predicted_scaling_shift tracked_scaling_scale   tracked_scaling_shift   num_reads_since_mux_change      time_since_mux_change        run_id  sample_id       experiment_id   flow_cell_id    pore_type
0022b052-9be7-465d-a521-a23e13ab0309    009N_2.pod5     1911    1347    3       signal_positive 18064.48000000  72257920        0.88500000      3540    0   4000     200.35083008    NaN     NaN     NaN     NaN     0       0.00000000      372e3d9e-cb76-4a59-b378-74df73a6bd3a    7N-2    not_set PAM57680        not_set
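To check a candidate file before running pycoQC, a minimal sketch like the following reads only the header line and reports which logical fields have no matching column. Only the fact that `read_len` is required (with `calibration` and `barcode` optional) is confirmed by the traceback above; the other fields and every source column name (`sequence_length_template`, `mean_qscore_template`, and so on) are assumptions about what a dorado sequencing summary contains. `EXPECTED_FIELDS`, `missing_fields` and `missing_fields_in_file` are hypothetical names, not part of pycoQC.

```python
from typing import Dict, List

# ASSUMED mapping: logical field -> candidate column names in a summary TSV.
# Only "read_len" being required is confirmed by the pycoQC error above; the
# other fields and all source column names here are assumptions.
EXPECTED_FIELDS: Dict[str, List[str]] = {
    "read_id": ["read_id"],
    "run_id": ["run_id"],
    "channel": ["channel"],
    "start_time": ["start_time"],
    "read_len": ["sequence_length_template", "sequence_length"],
    "mean_qscore": ["mean_qscore_template", "mean_qscore"],
}


def missing_fields(header_line: str) -> List[str]:
    """Return the logical fields with no matching column in a tab-separated header."""
    columns = set(header_line.rstrip("\r\n").split("\t"))
    return [
        field
        for field, candidates in EXPECTED_FIELDS.items()
        if columns.isdisjoint(candidates)  # none of the candidate names present
    ]


def missing_fields_in_file(path: str) -> List[str]:
    """Check only the first line of a TSV file (hypothetical helper)."""
    with open(path) as handle:
        return missing_fields(handle.readline())
```

Applied to the tab-separated header of 009N_2_pod5_stats.tsv shown above, this flags `read_len` and `mean_qscore`, which is consistent with pycoQC stopping at `read_len`: the file appears to carry per-read signal statistics but no sequence length or quality columns.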


sahuno, Jun 17 '24

I do.

Which versions of

  • dorado
  • pycoqc
  • numpy/pandas

are you running?

Where does "009N_2_pod5_stats.tsv" come from? It does not look like a dorado sequencing_summary file.

sklages, Jul 18 '24

Hi @sklages,

  • dorado version: 0.6.2+14a7067
  • pycoQC version: v2.5.2
  • numpy/pandas:

# Name                    Version                   Build  Channel
numpy                     1.19.5                   pypi_0    pypi
pandas                    1.1.5                    pypi_0    pypi

The stats files were generated with the following command: dorado summary modBaseCalled.bam --verbose > <output>

sahuno, Jul 18 '24

Still, the summary file looks weird. You should at least update to the current dorado release (0.7.2) and check whether the resulting summary file is the same. Was 009N_2_pod5_stats.tsv created from modBaseCalled.bam? Is this an aligned or unaligned BAM?

Apart from this issue, Python 3.6 reached end of life a few years ago; you should update your Python environment in general.

sklages, Jul 19 '24

Hi, I just tried with a summary generated by dorado 0.7.2 summary (TSV format) from a merged BAM (multiple basecalled batches). I get this error:

/opt/conda/lib/python3.10/site-packages/pycoQC/pycoQC_plot.py:1471: RuntimeWarning: overflow encountered in scalar add

Traceback (most recent call last):
  File "/opt/conda/bin/pycoQC", line 12, in <module>
    sys.exit(main_pycoQC())
  File "/opt/conda/lib/python3.10/site-packages/pycoQC/__main__.py", line 109, in main_pycoQC
    pycoQC (
  File "/opt/conda/lib/python3.10/site-packages/pycoQC/pycoQC.py", line 148, in pycoQC
    reporter.html_report(
  File "/opt/conda/lib/python3.10/site-packages/pycoQC/pycoQC_report.py", line 84, in html_report
    fig = method(**method_args)
  File "/opt/conda/lib/python3.10/site-packages/pycoQC/pycoQC_plot.py", line 188, in summary
    lab1, dd1 = self.__summary_data (df_level="all", groupby=groupby)
  File "/opt/conda/lib/python3.10/site-packages/pycoQC/pycoQC_plot.py", line 257, in __summary_data
    cells.append (self.__df_to_cell(df))
  File "/opt/conda/lib/python3.10/site-packages/pycoQC/pycoQC_plot.py", line 277, in __df_to_cell
    l.append ("{:,.2f}".format(self._compute_N50(df["read_len"])))
TypeError: unsupported format string passed to NoneType.__format__
Apptainer> python --version
Python 3.10.14

>>> pycoQC.__version__
'2.5.0.3'
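One plausible reading of the warning together with the TypeError, not confirmed against pycoQC's source, is that the N50 routine's running sum overflowed a fixed-width integer ("overflow encountered in scalar add"), so its threshold check never fired and it returned None, which then fails in "{:,.2f}".format(...). The sketch below reproduces that failure mode in plain Python, using `ctypes.c_int32` as a stand-in for a 32-bit numpy scalar, and shows an N50 that accumulates in arbitrary-precision Python ints. Both function names are hypothetical, not pycoQC's API.

```python
import ctypes
from typing import Iterable, Optional


def n50_wrapping_int32(lengths: Iterable[int]) -> Optional[int]:
    """Hypothetical reproduction: N50 with a 32-bit accumulator that wraps.

    Once the running sum overflows it turns negative, the threshold test
    never succeeds, and the function falls off the end returning None.
    """
    values = sorted(lengths, reverse=True)
    half = sum(values) / 2             # total computed in a wide type
    acc = ctypes.c_int32(0)            # ctypes performs no overflow checking
    for v in values:
        acc = ctypes.c_int32(acc.value + v)
        if acc.value >= half:
            return v
    return None


def n50(lengths: Iterable[int]) -> int:
    """N50: the length L such that reads of length >= L hold at least half the bases."""
    values = sorted(lengths, reverse=True)
    if not values:
        raise ValueError("cannot compute N50 of an empty set of reads")
    total = sum(values)                # Python ints never overflow
    running = 0
    for v in values:
        running += v
        if 2 * running >= total:       # integer comparison avoids float rounding
            return v
    raise RuntimeError("unreachable for non-negative read lengths")
```

For four reads of 2**30 bases each, `n50_wrapping_int32` returns None (and formatting that result with "{:,.2f}" raises the same TypeError as above), while `n50` returns 2**30. If this is the cause, accumulating in a 64-bit or Python integer type would avoid both the warning and the None.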

EDIT:

If I use the Python interface with pycoQC v2.5.2, I get the same warning:

overflow encountered in scalar add

but the HTML report is generated anyway.

piplus2, Aug 10 '24