pycoQC
dorado summary as input to pycoQC
Is your feature request related to a problem? Please describe. I want to be able to use dorado summary TSV files as input to pycoQC.
Describe the solution you'd like The column names in the dorado summary file do not match what pycoQC expects. Please see the error thrown by pycoQC:
(pycoQC) bash:iscb011:/data1/greenbab/users/ahunos/apps/sandbox 1033 $ pycoQC -f /data1/greenbab/projects/methyl_benchmark_spectrum/data/preprocessed/009N_rerun/results/pod5stats/009N_2/009N_2_pod5_stats.tsv \
> -a /data1/greenbab/projects/methyl_benchmark_spectrum/data/preprocessed/009N_rerun/modbasecalls/mergebam_modkit/results/mark_duplicates/009N/009N_modBaseCalls_sorted_dup.bam \
> -o 009N_merged_pycoQC_output.html
Checking arguments values
Check input data files
Parse data files
Traceback (most recent call last):
File "/home/ahunos/miniforge3/envs/pycoQC/bin/pycoQC", line 8, in <module>
sys.exit(main_pycoQC())
File "/home/ahunos/miniforge3/envs/pycoQC/lib/python3.6/site-packages/pycoQC/__main__.py", line 132, in main_pycoQC
quiet = args.quiet)
File "/home/ahunos/miniforge3/envs/pycoQC/lib/python3.6/site-packages/pycoQC/pycoQC.py", line 129, in pycoQC
quiet=quiet)
File "/home/ahunos/miniforge3/envs/pycoQC/lib/python3.6/site-packages/pycoQC/pycoQC_parse.py", line 96, in __init__
summary_reads_df = self._parse_summary()
File "/home/ahunos/miniforge3/envs/pycoQC/lib/python3.6/site-packages/pycoQC/pycoQC_parse.py", line 139, in _parse_summary
optional_colnames = ["calibration", "barcode"])
File "/home/ahunos/miniforge3/envs/pycoQC/lib/python3.6/site-packages/pycoQC/pycoQC_parse.py", line 397, in _select_df_columns
raise pycoQCError("Column {} not found in the provided sequence_summary file".format(col))
pycoQC.common.pycoQCError: Column read_len not found in the provided sequence_summary file
Please see the attached head of the summary file from dorado:
(pycoQC) bash:iscb011:/data1/greenbab/users/ahunos/apps/sandbox 1035 $ head -n 2 /data1/greenbab/projects/methyl_benchmark_spectrum/data/preprocessed/009N_rerun/results/pod5stats/009N_2/009N_2_pod5_stats.tsv
read_id filename read_number channel mux end_reason start_time start_sample duration num_samples minknow_events sample_rate median_before predicted_scaling_scale predicted_scaling_shift tracked_scaling_scale tracked_scaling_shift num_reads_since_mux_change time_since_mux_change run_id sample_id experiment_id flow_cell_id pore_type
0022b052-9be7-465d-a521-a23e13ab0309 009N_2.pod5 1911 1347 3 signal_positive 18064.48000000 72257920 0.88500000 3540 0 4000 200.35083008 NaN NaN NaN NaN 0 0.00000000 372e3d9e-cb76-4a59-b378-74df73a6bd3a 7N-2 not_set PAM57680 not_set
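A quick way to see which expected columns are absent is to compare the file's header against a list of names. The sketch below is only illustrative: `missing_columns` is a hypothetical helper, and the `required` list is an assumption. Only `read_len` is confirmed by the error message above, and pycoQC may accept other spellings of the same field in the input file.

```python
def missing_columns(header_line, required):
    """Return the names in `required` that do not appear in a header line."""
    present = set(header_line.split())  # split on any whitespace (tabs or spaces)
    return [col for col in required if col not in present]


# Header copied from the `head -n 2` output above.
header = (
    "read_id filename read_number channel mux end_reason start_time "
    "start_sample duration num_samples minknow_events sample_rate "
    "median_before predicted_scaling_scale predicted_scaling_shift "
    "tracked_scaling_scale tracked_scaling_shift num_reads_since_mux_change "
    "time_since_mux_change run_id sample_id experiment_id flow_cell_id pore_type"
)

# Assumed list for illustration; only `read_len` comes from the traceback.
required = ["read_id", "run_id", "channel", "start_time", "read_len"]

missing = missing_columns(header, required)
print("Missing columns:", missing)
```

With this header, the only name from the assumed list that is missing is `read_len`, which matches the error pycoQC reports.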
I do.
Which versions of
- dorado
- pycoqc
- numpy/pandas
are you running?
Where does "009N_2_pod5_stats.tsv" come from? It does not look like a dorado sequencing summary file.
Hi @sklages
dorado version = 0.6.2+14a7067
pycoqc version = pycoQC v2.5.2
numpy/pandas
# Name Version Build Channel
numpy 1.19.5 pypi_0 pypi
pandas 1.1.5 pypi_0 pypi
The stats files were generated with the following command:
dorado summary modBaseCalled.bam --verbose > <output>
The summary file still looks weird. You should at least update to the current dorado (0.7.2) and check whether the resulting summary file is the same.
Was 009N_2_pod5_stats.tsv created from modBaseCalled.bam? Is this an aligned or unaligned BAM?
Apart from this issue, Python 3.6 reached EOL a few years ago, so you should update your Python environment in general.
Hi, I just tried with a summary (TSV format) generated by dorado summary 0.7.2 from a merged BAM (multiple batches basecalled). I get this error:
/opt/conda/lib/python3.10/site-packages/pycoQC/pycoQC_plot.py:1471: RuntimeWarning:
overflow encountered in scalar add
Traceback (most recent call last):
File "/opt/conda/bin/pycoQC", line 12, in <module>
sys.exit(main_pycoQC())
File "/opt/conda/lib/python3.10/site-packages/pycoQC/__main__.py", line 109, in main_pycoQC
pycoQC (
File "/opt/conda/lib/python3.10/site-packages/pycoQC/pycoQC.py", line 148, in pycoQC
reporter.html_report(
File "/opt/conda/lib/python3.10/site-packages/pycoQC/pycoQC_report.py", line 84, in html_report
fig = method(**method_args)
File "/opt/conda/lib/python3.10/site-packages/pycoQC/pycoQC_plot.py", line 188, in summary
lab1, dd1 = self.__summary_data (df_level="all", groupby=groupby)
File "/opt/conda/lib/python3.10/site-packages/pycoQC/pycoQC_plot.py", line 257, in __summary_data
cells.append (self.__df_to_cell(df))
File "/opt/conda/lib/python3.10/site-packages/pycoQC/pycoQC_plot.py", line 277, in __df_to_cell
l.append ("{:,.2f}".format(self._compute_N50(df["read_len"])))
TypeError: unsupported format string passed to NoneType.__format__
Apptainer> python --version
Python 3.10.14
>>> pycoQC.__version__
'2.5.0.3'
EDIT:
If I use the Python interface with pycoQC v2.5.2, I get the same warning:
overflow encountered in scalar add
but the HTML report is generated anyway.
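For reference, the TypeError in the traceback above is what Python raises whenever `None` is passed to a float format spec such as `"{:,.2f}"`. Why the N50 comes back as `None` is not established in this thread. One possible cause, which is an assumption, is that the integer overflow reported in the warning prevents a running total from ever reaching half of the total bases. The sketch below is not pycoQC's implementation: `n50` and `format_n50` are hypothetical helpers showing an N50 computed with plain Python integers, which cannot overflow, and a formatter that tolerates a missing value.

```python
def n50(read_lengths):
    """Return the N50 of the given read lengths, or None for empty input."""
    # Convert to plain Python ints, which have arbitrary precision.
    lengths = sorted((int(x) for x in read_lengths), reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return None  # only reached when there are no reads


def format_n50(value):
    """Format an N50 for display, falling back to 'NA' when it is None."""
    return "NA" if value is None else "{:,.2f}".format(value)


print(format_n50(n50([1500, 500])))
print(format_n50(n50([])))
```

The guard in `format_n50` only hides the symptom. If the value is `None` for a non-empty table, the accumulation itself is the part to look at.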