ChronQC
could not convert string to float: 'data'
I'm also getting the following error when I try to create a plot:
$ chronqc plot -o chromqc/ chronqc_db/chronqc.stats.sqlite AshTrio chronqc_db/chronqc.default.json -f
Started ChronQC
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/pandas/core/window.py", line 211, in _prep_values
values = ensure_float64(values)
File "pandas/_libs/algos_common_helper.pxi", line 311, in pandas._libs.algos.ensure_float64
ValueError: could not convert string to float: 'data'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/bin/chronqc", line 11, in <module>
sys.exit(main())
File "/usr/local/lib/python3.6/dist-packages/chronqc/chronqc.py", line 168, in main
args.func(args)
File "/usr/local/lib/python3.6/dist-packages/chronqc/chronqc.py", line 21, in run_plot
chronqc_plot.main(args)
File "/usr/local/lib/python3.6/dist-packages/chronqc/chronqc_plot.py", line 676, in main
df_chart = mean_and_stdev(df, column_name, win=win, per_sample=per_sample)
File "/usr/local/lib/python3.6/dist-packages/chronqc/chronqc_plot.py", line 218, in mean_and_stdev
df_dup_all = rolling_mean(df_dup_all, Duplicates, win)
File "/usr/local/lib/python3.6/dist-packages/chronqc/chronqc_plot.py", line 186, in rolling_mean
df_dup_all['mean'] = df_dup_all.rolling(win).mean().round(2)[Duplicates]
File "/usr/local/lib/python3.6/dist-packages/pandas/core/window.py", line 1728, in mean
return super(Rolling, self).mean(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/pandas/core/window.py", line 1072, in mean
return self._apply('roll_mean', 'mean', **kwargs)
File "/usr/local/lib/python3.6/dist-packages/pandas/core/window.py", line 841, in _apply
values = self._prep_values(b.values)
File "/usr/local/lib/python3.6/dist-packages/pandas/core/window.py", line 214, in _prep_values
"".format(values.dtype))
TypeError: cannot handle this type -> object
My input files are:
$ cat chronqc_db/chronqc.default.json
[
{
"table_name": "chronqc_stats_data",
"chart_type": "time_series_with_mean_and_stdev",
"chart_properties": {
"y_value": "FastQC_mqc-generalstats-fastqc-avg_sequence_length"
}
},
{
"table_name": "chronqc_stats_data",
"chart_type": "time_series_with_mean_and_stdev",
"chart_properties": {
"y_value": "FastQC_mqc-generalstats-fastqc-percent_duplicates"
}
},
{
"table_name": "chronqc_stats_data",
"chart_type": "time_series_with_mean_and_stdev",
"chart_properties": {
"y_value": "FastQC_mqc-generalstats-fastqc-percent_fails"
}
},
{
"table_name": "chronqc_stats_data",
"chart_type": "time_series_with_mean_and_stdev",
"chart_properties": {
"y_value": "FastQC_mqc-generalstats-fastqc-percent_gc"
}
},
{
"table_name": "chronqc_stats_data",
"chart_type": "time_series_with_mean_and_stdev",
"chart_properties": {
"y_value": "FastQC_mqc-generalstats-fastqc-total_sequences"
}
}
]
$ sqlite3 -cmd 'SELECT * from chronqc_stats_data;' chronqc_db/chronqc.stats.sqlite
FastQC|all_sections|NIST7086_CGTACTAG_L002_R2_001|/mnt/data/TD01-GV1001_L2_R2/fastqc_data.txt|/mnt/data/TD01-GV1001_L2_R2/fastqc_data.txt|2019-01-22 00:00:00|data|95.229524961251|28.1154060602541|25.0|49.0|21523781.0|AshTrio
FastQC|all_sections|TD01-GV1001_L2.R1|/mnt/data/TD01-GV1001_L1_R1/fastqc_data.txt|/mnt/data/TD01-GV1001_L1_R1/fastqc_data.txt|2019-01-22 00:00:00|data|97.0165501126405|29.1116572397277|25.0|49.0|21523781.0|AshTrio
FastQC|all_sections|TD01-GV1001_L3.R2|/mnt/data/TD01-GV1001_L3_R2/fastqc_data.txt|/mnt/data/TD01-GV1001_L3_R2/fastqc_data.txt|2019-01-22 00:00:00|data|94.9493237371004|27.7428646507917|25.0|49.0|19865573.0|AshTrio
FastQC|all_sections|TD01-GV1001_L1.R2|/mnt/data/TD01-GV1001_L1_R2/fastqc_data.txt|/mnt/data/TD01-GV1001_L1_R2/fastqc_data.txt|2019-01-22 00:00:00|data|95.2768624146094|27.794053686974|25.0|49.0|21168890.0|AshTrio
FastQC|all_sections|TD01-GV1001_L3.R1|/mnt/data/TD01-GV1001_L3_R1/fastqc_data.txt|/mnt/data/TD01-GV1001_L3_R1/fastqc_data.txt|2019-01-22 00:00:00|data|96.7453750264339|28.8361458354103|25.0|49.0|19865573.0|AshTrio
FastQC|all_sections|TD06-GV1010_R2|/mnt/data/TD06-GV1010_R2_fastqc/fastqc_data.txt|/mnt/data/TD06-GV1010_R2_fastqc/fastqc_data.txt|2019-01-25 00:00:00|data|126.0|22.5016825110702|16.6666666666667|48.0|73074989.0|AshTrio
FastQC|all_sections|TD06-GV1009_R2|/mnt/data/TD06-GV1009_R2_fastqc/fastqc_data.txt|/mnt/data/TD06-GV1009_R2_fastqc/fastqc_data.txt|2019-01-25 00:00:00|data|126.0|21.3620029114608|16.6666666666667|48.0|64304319.0|AshTrio
FastQC|all_sections|TD06-GV1008_R2|/mnt/data/TD06-GV1008_R2_fastqc/fastqc_data.txt|/mnt/data/TD06-GV1008_R2_fastqc/fastqc_data.txt|2019-01-25 00:00:00|data|126.0|23.5528852817332|16.6666666666667|48.0|75193388.0|AshTrio
FastQC|all_sections|TD06-GV1009_R1|/mnt/data/TD06-GV1009_R1_fastqc/fastqc_data.txt|/mnt/data/TD06-GV1009_R1_fastqc/fastqc_data.txt|2019-01-25 00:00:00|data|126.0|21.1391622609802|16.6666666666667|48.0|64304319.0|AshTrio
FastQC|all_sections|TD06-GV1008_R1|/mnt/data/TD06-GV1008_R1_fastqc/fastqc_data.txt|/mnt/data/TD06-GV1008_R1_fastqc/fastqc_data.txt|2019-01-25 00:00:00|data|126.0|23.5838138955758|16.6666666666667|48.0|75193388.0|AshTrio
FastQC|all_sections|TD06-GV1010_R1|/mnt/data/TD06-GV1010_R1_fastqc/fastqc_data.txt|/mnt/data/TD06-GV1010_R1_fastqc/fastqc_data.txt|2019-01-25 00:00:00|data|126.0|22.3713007497403|16.6666666666667|48.0|73074989.0|AshTrio
It seems to relate to df_dup_all.rolling(win).mean() and how that perhaps expects all the columns to be floats? It's quite strange.
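To illustrate the suspicion above, here is a minimal sketch of a rolling mean that selects the numeric column before rolling, so string columns such as Run never reach the window code. The function name rolling_mean_single_column and the toy data are my own assumptions; this is not ChronQC's actual code or its fix.

```python
import pandas as pd


def rolling_mean_single_column(df, column, win):
    """Add a 'mean' column holding the rolling mean of one numeric column.

    Hypothetical helper: it rolls over df[column] only, instead of rolling
    over the whole frame and indexing the result afterwards.
    """
    out = df.copy()
    out["mean"] = out[column].rolling(win).mean().round(2)
    return out


# Toy frame with one string column and one numeric column.
df = pd.DataFrame({
    "Run": ["Run-1", "Run-1", "Run-1", "Run-1"],
    "percent_gc": [48.0, 49.0, 50.0, 51.0],
})

result = rolling_mean_single_column(df, "percent_gc", win=2)
print(result)
```

The first row of the mean column is NaN because a window of 2 is not yet full there.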
@TMiguelT can you print the headers of chronqc_stats_data as well?
It seems you are trying to plot a column whose values are all 'data' in your example, hence the error.
Do you mean the SQLite3 database headers?
@TMiguelT yes
Okay let me run through the steps I performed (after running multiQC):
Successfully created the database:
$ chronqc database --create --run-date-info run_date_info.csv -o . multiqc/multiqc_data/multiqc_general_stats.txt AshTrio
Running ChronQC |##################################################| 100.0%
Created ChronQC db: /mnt/chronqc_db/chronqc.stats.sqlite with 6 records
Created ChronQC default JSON file: /mnt/chronqc_db/chronqc.default.json. Customize the JSON as needed before generating ChronQC plots.
Then immediately after, I tried to plot (I didn't modify the JSON or the database in any way):
$ chronqc plot -o . chronqc_db/chronqc.stats.sqlite AshTrio chronqc_db/chronqc.default.json -f
Started ChronQC
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/pandas/core/window.py", line 211, in _prep_values
values = ensure_float64(values)
File "pandas/_libs/algos_common_helper.pxi", line 311, in pandas._libs.algos.ensure_float64
ValueError: could not convert string to float: 'Run-1'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/bin/chronqc", line 11, in <module>
sys.exit(main())
File "/usr/local/lib/python3.6/dist-packages/chronqc/chronqc.py", line 168, in main
args.func(args)
File "/usr/local/lib/python3.6/dist-packages/chronqc/chronqc.py", line 21, in run_plot
chronqc_plot.main(args)
File "/usr/local/lib/python3.6/dist-packages/chronqc/chronqc_plot.py", line 676, in main
df_chart = mean_and_stdev(df, column_name, win=win, per_sample=per_sample)
File "/usr/local/lib/python3.6/dist-packages/chronqc/chronqc_plot.py", line 218, in mean_and_stdev
df_dup_all = rolling_mean(df_dup_all, Duplicates, win)
File "/usr/local/lib/python3.6/dist-packages/chronqc/chronqc_plot.py", line 186, in rolling_mean
df_dup_all['mean'] = df_dup_all.rolling(win).mean().round(2)[Duplicates]
File "/usr/local/lib/python3.6/dist-packages/pandas/core/window.py", line 1728, in mean
return super(Rolling, self).mean(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/pandas/core/window.py", line 1072, in mean
return self._apply('roll_mean', 'mean', **kwargs)
File "/usr/local/lib/python3.6/dist-packages/pandas/core/window.py", line 841, in _apply
values = self._prep_values(b.values)
File "/usr/local/lib/python3.6/dist-packages/pandas/core/window.py", line 214, in _prep_values
"".format(values.dtype))
TypeError: cannot handle this type -> object
Here's the SQLite table with headers:
$ sqlite3 chronqc_db/chronqc.stats.sqlite -header 'select * from chronqc_stats_data'
Sample|Run|Date|FastQC_mqc-generalstats-fastqc-avg_sequence_length|FastQC_mqc-generalstats-fastqc-percent_duplicates|FastQC_mqc-generalstats-fastqc-percent_fails|FastQC_mqc-generalstats-fastqc-percent_gc|FastQC_mqc-generalstats-fastqc-total_sequences|Panel
TD06-GV1008_R1|Run-1|2018-01-01 00:00:00|126.0|23.5838138955758|16.6666666666667|48.0|75193388.0|AshTrio
TD06-GV1008_R2|Run-1|2018-01-01 00:00:00|126.0|23.5528852817332|16.6666666666667|48.0|75193388.0|AshTrio
TD06-GV1009_R1|Run-1|2018-02-01 00:00:00|126.0|21.1391622609802|16.6666666666667|48.0|64304319.0|AshTrio
TD06-GV1009_R2|Run-1|2018-02-01 00:00:00|126.0|21.3620029114608|16.6666666666667|48.0|64304319.0|AshTrio
TD06-GV1010_R1|Run-1|2018-03-01 00:00:00|126.0|22.3713007497403|16.6666666666667|48.0|73074989.0|AshTrio
TD06-GV1010_R2|Run-1|2018-03-01 00:00:00|126.0|22.5016825110702|16.6666666666667|48.0|73074989.0|AshTrio
And here's the database as a more readable table:
Sample | Run | Date | FastQC_mqc-generalstats-fastqc-avg_sequence_length | FastQC_mqc-generalstats-fastqc-percent_duplicates | FastQC_mqc-generalstats-fastqc-percent_fails | FastQC_mqc-generalstats-fastqc-percent_gc | FastQC_mqc-generalstats-fastqc-total_sequences | Panel |
---|---|---|---|---|---|---|---|---|
TD06-GV1008_R1 | Run-1 | 2018-01-01 00:00:00 | 126.0 | 23.5838138955758 | 16.6666666666667 | 48.0 | 75193388.0 | AshTrio |
TD06-GV1008_R2 | Run-1 | 2018-01-01 00:00:00 | 126.0 | 23.5528852817332 | 16.6666666666667 | 48.0 | 75193388.0 | AshTrio |
TD06-GV1009_R1 | Run-1 | 2018-02-01 00:00:00 | 126.0 | 21.1391622609802 | 16.6666666666667 | 48.0 | 64304319.0 | AshTrio |
TD06-GV1009_R2 | Run-1 | 2018-02-01 00:00:00 | 126.0 | 21.3620029114608 | 16.6666666666667 | 48.0 | 64304319.0 | AshTrio |
TD06-GV1010_R1 | Run-1 | 2018-03-01 00:00:00 | 126.0 | 22.3713007497403 | 16.6666666666667 | 48.0 | 73074989.0 | AshTrio |
TD06-GV1010_R2 | Run-1 | 2018-03-01 00:00:00 | 126.0 | 22.5016825110702 | 16.6666666666667 | 48.0 | 73074989.0 | AshTrio |
You need to customize the JSON. In your first example you were trying to plot a column containing the string 'data', and in the example above you are plotting the "Run" column, whose values are all "Run-1". Both strings, 'data' and 'Run-1', are causing the errors. To customize the JSON, check what each "y_value" plots and make sure it is a numerical column. Hope this helps.
But my JSON is fine. All of the y_value fields are numerical. I don't mention the Run field anywhere:
$ cat chronqc_db/chronqc.default.json
[
{
"table_name": "chronqc_stats_data",
"chart_type": "time_series_with_mean_and_stdev",
"chart_properties": {
"y_value": "FastQC_mqc-generalstats-fastqc-avg_sequence_length"
}
},
{
"table_name": "chronqc_stats_data",
"chart_type": "time_series_with_mean_and_stdev",
"chart_properties": {
"y_value": "FastQC_mqc-generalstats-fastqc-percent_duplicates"
}
},
{
"table_name": "chronqc_stats_data",
"chart_type": "time_series_with_mean_and_stdev",
"chart_properties": {
"y_value": "FastQC_mqc-generalstats-fastqc-percent_fails"
}
},
{
"table_name": "chronqc_stats_data",
"chart_type": "time_series_with_mean_and_stdev",
"chart_properties": {
"y_value": "FastQC_mqc-generalstats-fastqc-percent_gc"
}
},
{
"table_name": "chronqc_stats_data",
"chart_type": "time_series_with_mean_and_stdev",
"chart_properties": {
"y_value": "FastQC_mqc-generalstats-fastqc-total_sequences"
}
}
]
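One way to double-check that claim is to load the table with pandas and test the dtype of every y_value column named in the JSON. The sketch below assumes the JSON layout shown above; the helper name non_numeric_y_values and the in-memory toy database are hypothetical stand-ins, not part of ChronQC.

```python
import sqlite3

import pandas as pd
from pandas.api.types import is_numeric_dtype


def non_numeric_y_values(conn, config):
    """Return the y_value columns from the config that are not numeric."""
    problems = []
    for chart in config:
        table = chart["table_name"]
        column = chart["chart_properties"]["y_value"]
        df = pd.read_sql_query('SELECT * FROM "{}"'.format(table), conn)
        if not is_numeric_dtype(df[column]):
            problems.append(column)
    return problems


# Toy in-memory database standing in for chronqc.stats.sqlite.
conn = sqlite3.connect(":memory:")
pd.DataFrame({
    "Sample": ["TD06-GV1008_R1", "TD06-GV1008_R2"],
    "Run": ["Run-1", "Run-1"],
    "FastQC_mqc-generalstats-fastqc-percent_gc": [48.0, 48.0],
}).to_sql("chronqc_stats_data", conn, index=False)

# Same shape as one entry of chronqc.default.json.
config = [{
    "table_name": "chronqc_stats_data",
    "chart_type": "time_series_with_mean_and_stdev",
    "chart_properties": {
        "y_value": "FastQC_mqc-generalstats-fastqc-percent_gc"
    },
}]

# A deliberately wrong config that points y_value at a string column.
bad_config = [{
    "table_name": "chronqc_stats_data",
    "chart_type": "time_series_with_mean_and_stdev",
    "chart_properties": {"y_value": "Run"},
}]

print(non_numeric_y_values(conn, config))
print(non_numeric_y_values(conn, bad_config))
```

Against the real files, the connection would come from sqlite3.connect on chronqc.stats.sqlite and the config from json.load on chronqc.default.json. An empty result would support the point that the JSON itself is not the problem.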
I'm happy to give you the files if it would help debug the issue
I'm getting this same issue with the example files here: https://github.com/nilesh-tawari/ChronQC/tree/master/examples/multiqc_example_1.
From within examples/multiqc_example_1, I run:
$ chronqc database --create --run-date-info run_date_info.csv -o . multiqc_data/multiqc_general_stats.txt SOMATIC
Running ChronQC |##################################################| 100.0%
Created ChronQC db: /home/michael/Programming/ChronQC/examples/multiqc_example_1/chronqc_db/chronqc.stats.sqlite with 100 records
Created ChronQC default JSON file: /home/michael/Programming/ChronQC/examples/multiqc_example_1/chronqc_db/chronqc.default.json. Customize the JSON as needed before generating ChronQC plots.
$ chronqc plot -o . chronqc_db/chronqc.stats.sqlite SOMATIC chronqc_db/chronqc.default.json
Started ChronQC
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/pandas/core/window.py", line 222, in _prep_values
values = _ensure_float64(values)
File "pandas/_libs/algos_common_helper.pxi", line 3182, in pandas._libs.algos.ensure_float64
File "pandas/_libs/algos_common_helper.pxi", line 3187, in pandas._libs.algos.ensure_float64
ValueError: could not convert string to float: 'CHH13847 (15.7), CHH13848 (16.18), CHH13846 (15.21), CHH13849 (16.8)'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/bin/chronqc", line 11, in <module>
sys.exit(main())
File "/usr/local/lib/python3.6/dist-packages/chronqc/chronqc.py", line 168, in main
args.func(args)
File "/usr/local/lib/python3.6/dist-packages/chronqc/chronqc.py", line 21, in run_plot
chronqc_plot.main(args)
File "/usr/local/lib/python3.6/dist-packages/chronqc/chronqc_plot.py", line 676, in main
df_chart = mean_and_stdev(df, column_name, win=win, per_sample=per_sample)
File "/usr/local/lib/python3.6/dist-packages/chronqc/chronqc_plot.py", line 218, in mean_and_stdev
df_dup_all = rolling_mean(df_dup_all, Duplicates, win)
File "/usr/local/lib/python3.6/dist-packages/chronqc/chronqc_plot.py", line 186, in rolling_mean
df_dup_all['mean'] = df_dup_all.rolling(win).mean().round(2)[Duplicates]
File "/usr/local/lib/python3.6/dist-packages/pandas/core/window.py", line 1605, in mean
return super(Rolling, self).mean(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/pandas/core/window.py", line 1058, in mean
return self._apply('roll_mean', 'mean', **kwargs)
File "/usr/local/lib/python3.6/dist-packages/pandas/core/window.py", line 844, in _apply
values = self._prep_values(b.values)
File "/usr/local/lib/python3.6/dist-packages/pandas/core/window.py", line 225, in _prep_values
"".format(values.dtype))
TypeError: cannot handle this type -> object
I've investigated this some more, and the last version of pandas this works with is 0.22.0. If you use 0.23.0 or higher, this error will occur.
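As a rough sketch of how that boundary could be checked before plotting, the helper below compares a pandas version string against 0.23. The name needs_pandas_pin is hypothetical, and the 0.23 threshold is taken only from my observation above, not from the pandas changelog.

```python
def needs_pandas_pin(version):
    """Return True if the given pandas version is 0.23 or newer.

    Hypothetical helper: only the major and minor parts are compared.
    """
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= (0, 23)


print(needs_pandas_pin("0.22.0"))
print(needs_pandas_pin("0.23.0"))
```

Passing pandas.__version__ would run the same check against the installed version.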
Thank you @TMiguelT for investigating. I will look at it and fix it to work with updated pandas.
Excellent, that would be a much better solution than my version pinning