ChronQC

could not convert string to float: 'data'

Open multimeric opened this issue 5 years ago • 14 comments

I'm also getting the following error when I try to create a plot:

$ chronqc plot -o chromqc/ chronqc_db/chronqc.stats.sqlite AshTrio chronqc_db/chronqc.default.json -f
Started ChronQC
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/window.py", line 211, in _prep_values
    values = ensure_float64(values)
  File "pandas/_libs/algos_common_helper.pxi", line 311, in pandas._libs.algos.ensure_float64
ValueError: could not convert string to float: 'data'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/chronqc", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.6/dist-packages/chronqc/chronqc.py", line 168, in main
    args.func(args)
  File "/usr/local/lib/python3.6/dist-packages/chronqc/chronqc.py", line 21, in run_plot
    chronqc_plot.main(args)
  File "/usr/local/lib/python3.6/dist-packages/chronqc/chronqc_plot.py", line 676, in main
    df_chart = mean_and_stdev(df, column_name, win=win, per_sample=per_sample)
  File "/usr/local/lib/python3.6/dist-packages/chronqc/chronqc_plot.py", line 218, in mean_and_stdev
    df_dup_all = rolling_mean(df_dup_all, Duplicates, win)
  File "/usr/local/lib/python3.6/dist-packages/chronqc/chronqc_plot.py", line 186, in rolling_mean
    df_dup_all['mean'] = df_dup_all.rolling(win).mean().round(2)[Duplicates]
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/window.py", line 1728, in mean
    return super(Rolling, self).mean(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/window.py", line 1072, in mean
    return self._apply('roll_mean', 'mean', **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/window.py", line 841, in _apply
    values = self._prep_values(b.values)
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/window.py", line 214, in _prep_values
    "".format(values.dtype))
TypeError: cannot handle this type -> object

My input files are:

$ cat chronqc_db/chronqc.default.json
[
    {
        "table_name": "chronqc_stats_data",
        "chart_type": "time_series_with_mean_and_stdev",
        "chart_properties": {
            "y_value": "FastQC_mqc-generalstats-fastqc-avg_sequence_length"
        }
    },
    {
        "table_name": "chronqc_stats_data",
        "chart_type": "time_series_with_mean_and_stdev",
        "chart_properties": {
            "y_value": "FastQC_mqc-generalstats-fastqc-percent_duplicates"
        }
    },
    {
        "table_name": "chronqc_stats_data",
        "chart_type": "time_series_with_mean_and_stdev",
        "chart_properties": {
            "y_value": "FastQC_mqc-generalstats-fastqc-percent_fails"
        }
    },
    {
        "table_name": "chronqc_stats_data",
        "chart_type": "time_series_with_mean_and_stdev",
        "chart_properties": {
            "y_value": "FastQC_mqc-generalstats-fastqc-percent_gc"
        }
    },
    {
        "table_name": "chronqc_stats_data",
        "chart_type": "time_series_with_mean_and_stdev",
        "chart_properties": {
            "y_value": "FastQC_mqc-generalstats-fastqc-total_sequences"
        }
    }
]
$ sqlite3 -cmd 'SELECT * from chronqc_stats_data;' chronqc_db/chronqc.stats.sqlite 
FastQC|all_sections|NIST7086_CGTACTAG_L002_R2_001|/mnt/data/TD01-GV1001_L2_R2/fastqc_data.txt|/mnt/data/TD01-GV1001_L2_R2/fastqc_data.txt|2019-01-22 00:00:00|data|95.229524961251|28.1154060602541|25.0|49.0|21523781.0|AshTrio
FastQC|all_sections|TD01-GV1001_L2.R1|/mnt/data/TD01-GV1001_L1_R1/fastqc_data.txt|/mnt/data/TD01-GV1001_L1_R1/fastqc_data.txt|2019-01-22 00:00:00|data|97.0165501126405|29.1116572397277|25.0|49.0|21523781.0|AshTrio
FastQC|all_sections|TD01-GV1001_L3.R2|/mnt/data/TD01-GV1001_L3_R2/fastqc_data.txt|/mnt/data/TD01-GV1001_L3_R2/fastqc_data.txt|2019-01-22 00:00:00|data|94.9493237371004|27.7428646507917|25.0|49.0|19865573.0|AshTrio
FastQC|all_sections|TD01-GV1001_L1.R2|/mnt/data/TD01-GV1001_L1_R2/fastqc_data.txt|/mnt/data/TD01-GV1001_L1_R2/fastqc_data.txt|2019-01-22 00:00:00|data|95.2768624146094|27.794053686974|25.0|49.0|21168890.0|AshTrio
FastQC|all_sections|TD01-GV1001_L3.R1|/mnt/data/TD01-GV1001_L3_R1/fastqc_data.txt|/mnt/data/TD01-GV1001_L3_R1/fastqc_data.txt|2019-01-22 00:00:00|data|96.7453750264339|28.8361458354103|25.0|49.0|19865573.0|AshTrio
FastQC|all_sections|TD06-GV1010_R2|/mnt/data/TD06-GV1010_R2_fastqc/fastqc_data.txt|/mnt/data/TD06-GV1010_R2_fastqc/fastqc_data.txt|2019-01-25 00:00:00|data|126.0|22.5016825110702|16.6666666666667|48.0|73074989.0|AshTrio
FastQC|all_sections|TD06-GV1009_R2|/mnt/data/TD06-GV1009_R2_fastqc/fastqc_data.txt|/mnt/data/TD06-GV1009_R2_fastqc/fastqc_data.txt|2019-01-25 00:00:00|data|126.0|21.3620029114608|16.6666666666667|48.0|64304319.0|AshTrio
FastQC|all_sections|TD06-GV1008_R2|/mnt/data/TD06-GV1008_R2_fastqc/fastqc_data.txt|/mnt/data/TD06-GV1008_R2_fastqc/fastqc_data.txt|2019-01-25 00:00:00|data|126.0|23.5528852817332|16.6666666666667|48.0|75193388.0|AshTrio
FastQC|all_sections|TD06-GV1009_R1|/mnt/data/TD06-GV1009_R1_fastqc/fastqc_data.txt|/mnt/data/TD06-GV1009_R1_fastqc/fastqc_data.txt|2019-01-25 00:00:00|data|126.0|21.1391622609802|16.6666666666667|48.0|64304319.0|AshTrio
FastQC|all_sections|TD06-GV1008_R1|/mnt/data/TD06-GV1008_R1_fastqc/fastqc_data.txt|/mnt/data/TD06-GV1008_R1_fastqc/fastqc_data.txt|2019-01-25 00:00:00|data|126.0|23.5838138955758|16.6666666666667|48.0|75193388.0|AshTrio
FastQC|all_sections|TD06-GV1010_R1|/mnt/data/TD06-GV1010_R1_fastqc/fastqc_data.txt|/mnt/data/TD06-GV1010_R1_fastqc/fastqc_data.txt|2019-01-25 00:00:00|data|126.0|22.3713007497403|16.6666666666667|48.0|73074989.0|AshTrio

multimeric, Apr 05 '19 01:04

It seems to be related to df_dup_all.rolling(win).mean(), which perhaps expects every column to be a float? It's quite strange.

multimeric, Apr 05 '19 07:04
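A minimal sketch of the column-first pattern, assuming a hypothetical `rolling_mean(df, column, win)` helper modeled on the line shown in the traceback (it is not ChronQC's actual code). Selecting the numeric column before calling `.rolling()` means text columns such as `Run` or `Sample` never reach pandas' float conversion:

```python
import pandas as pd


def rolling_mean(df: pd.DataFrame, column: str, win: int) -> pd.DataFrame:
    """Hypothetical helper: add a 'mean' column with the rolling mean of `column`.

    Only the selected Series is rolled, so non-numeric columns in `df`
    (sample names, run IDs, panel names) are never converted to float.
    astype(float) still fails loudly if `column` itself holds text.
    """
    out = df.copy()
    out["mean"] = out[column].astype(float).rolling(win).mean().round(2)
    return out


# Toy data shaped like the ChronQC table (values are made up for illustration)
df = pd.DataFrame({
    "Sample": ["S1", "S2", "S3", "S4"],
    "Run": ["Run-1"] * 4,
    "percent_duplicates": [20.0, 22.0, 24.0, 26.0],
})
result = rolling_mean(df, "percent_duplicates", win=2)
print(result["mean"].tolist())  # [nan, 21.0, 23.0, 25.0]
```

The first value is NaN because a window of 2 needs two rows before it can produce a mean.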

@TMiguelT, can you print the headers of chronqc_stats_data as well?

nilesh-tawari, Apr 06 '19 05:04

It seems you are trying to plot a column whose values are all 'data' in your example, hence the error.

nilesh-tawari, Apr 06 '19 05:04

Do you mean the SQLite3 database headers?

multimeric, Apr 06 '19 07:04

@TMiguelT yes

nilesh-tawari, Apr 07 '19 11:04

Okay, let me run through the steps I performed (after running MultiQC):

Successfully created the database:

$ chronqc database --create --run-date-info run_date_info.csv -o . multiqc/multiqc_data/multiqc_general_stats.txt AshTrio
Running ChronQC |##################################################| 100.0%
Created ChronQC db: /mnt/chronqc_db/chronqc.stats.sqlite with 6 records
Created ChronQC default JSON file: /mnt/chronqc_db/chronqc.default.json. Customize the JSON as needed before generating ChronQC plots.

Then immediately after, I tried to plot (I didn't modify the JSON or the database in any way):

$ chronqc plot -o . chronqc_db/chronqc.stats.sqlite AshTrio chronqc_db/chronqc.default.json -f
Started ChronQC
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/window.py", line 211, in _prep_values
    values = ensure_float64(values)
  File "pandas/_libs/algos_common_helper.pxi", line 311, in pandas._libs.algos.ensure_float64
ValueError: could not convert string to float: 'Run-1'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/chronqc", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.6/dist-packages/chronqc/chronqc.py", line 168, in main
    args.func(args)
  File "/usr/local/lib/python3.6/dist-packages/chronqc/chronqc.py", line 21, in run_plot
    chronqc_plot.main(args)
  File "/usr/local/lib/python3.6/dist-packages/chronqc/chronqc_plot.py", line 676, in main
    df_chart = mean_and_stdev(df, column_name, win=win, per_sample=per_sample)
  File "/usr/local/lib/python3.6/dist-packages/chronqc/chronqc_plot.py", line 218, in mean_and_stdev
    df_dup_all = rolling_mean(df_dup_all, Duplicates, win)
  File "/usr/local/lib/python3.6/dist-packages/chronqc/chronqc_plot.py", line 186, in rolling_mean
    df_dup_all['mean'] = df_dup_all.rolling(win).mean().round(2)[Duplicates]
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/window.py", line 1728, in mean
    return super(Rolling, self).mean(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/window.py", line 1072, in mean
    return self._apply('roll_mean', 'mean', **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/window.py", line 841, in _apply
    values = self._prep_values(b.values)
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/window.py", line 214, in _prep_values
    "".format(values.dtype))
TypeError: cannot handle this type -> object

Here's the SQLite table with headers:

$ sqlite3 chronqc_db/chronqc.stats.sqlite -header 'select * from chronqc_stats_data'
Sample|Run|Date|FastQC_mqc-generalstats-fastqc-avg_sequence_length|FastQC_mqc-generalstats-fastqc-percent_duplicates|FastQC_mqc-generalstats-fastqc-percent_fails|FastQC_mqc-generalstats-fastqc-percent_gc|FastQC_mqc-generalstats-fastqc-total_sequences|Panel
TD06-GV1008_R1|Run-1|2018-01-01 00:00:00|126.0|23.5838138955758|16.6666666666667|48.0|75193388.0|AshTrio
TD06-GV1008_R2|Run-1|2018-01-01 00:00:00|126.0|23.5528852817332|16.6666666666667|48.0|75193388.0|AshTrio
TD06-GV1009_R1|Run-1|2018-02-01 00:00:00|126.0|21.1391622609802|16.6666666666667|48.0|64304319.0|AshTrio
TD06-GV1009_R2|Run-1|2018-02-01 00:00:00|126.0|21.3620029114608|16.6666666666667|48.0|64304319.0|AshTrio
TD06-GV1010_R1|Run-1|2018-03-01 00:00:00|126.0|22.3713007497403|16.6666666666667|48.0|73074989.0|AshTrio
TD06-GV1010_R2|Run-1|2018-03-01 00:00:00|126.0|22.5016825110702|16.6666666666667|48.0|73074989.0|AshTrio

multimeric, Apr 08 '19 00:04

And here's the database as a more readable table:

| Sample | Run | Date | FastQC_mqc-generalstats-fastqc-avg_sequence_length | FastQC_mqc-generalstats-fastqc-percent_duplicates | FastQC_mqc-generalstats-fastqc-percent_fails | FastQC_mqc-generalstats-fastqc-percent_gc | FastQC_mqc-generalstats-fastqc-total_sequences | Panel |
|---|---|---|---|---|---|---|---|---|
| TD06-GV1008_R1 | Run-1 | 2018-01-01 00:00:00 | 126.0 | 23.5838138955758 | 16.6666666666667 | 48.0 | 75193388.0 | AshTrio |
| TD06-GV1008_R2 | Run-1 | 2018-01-01 00:00:00 | 126.0 | 23.5528852817332 | 16.6666666666667 | 48.0 | 75193388.0 | AshTrio |
| TD06-GV1009_R1 | Run-1 | 2018-02-01 00:00:00 | 126.0 | 21.1391622609802 | 16.6666666666667 | 48.0 | 64304319.0 | AshTrio |
| TD06-GV1009_R2 | Run-1 | 2018-02-01 00:00:00 | 126.0 | 21.3620029114608 | 16.6666666666667 | 48.0 | 64304319.0 | AshTrio |
| TD06-GV1010_R1 | Run-1 | 2018-03-01 00:00:00 | 126.0 | 22.3713007497403 | 16.6666666666667 | 48.0 | 73074989.0 | AshTrio |
| TD06-GV1010_R2 | Run-1 | 2018-03-01 00:00:00 | 126.0 | 22.5016825110702 | 16.6666666666667 | 48.0 | 73074989.0 | AshTrio |

multimeric, Apr 08 '19 00:04
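If the whole frame does need to be rolled, as the traceback line `df_dup_all.rolling(win).mean()` does, one alternative sketch is to drop non-numeric columns first with `DataFrame.select_dtypes`. This assumes the numeric columns were loaded with numeric dtypes. The rows are loosely based on the table above, with shortened column names and made-up `percent_fails` values. Whether this matches what pandas 0.22.0 did internally is an assumption, not something the thread confirms.

```python
import pandas as pd

# Toy rows loosely based on the table above (shortened names, illustrative values)
df = pd.DataFrame({
    "Sample": ["TD06-GV1008_R1", "TD06-GV1008_R2", "TD06-GV1009_R1"],
    "Run": ["Run-1", "Run-1", "Run-1"],
    "Date": pd.to_datetime(["2018-01-01", "2018-01-01", "2018-02-01"]),
    "percent_gc": [48.0, 48.0, 48.0],
    "percent_fails": [16.0, 17.0, 18.0],
})

# Keep only numeric columns; text (Sample, Run) and datetime (Date) columns are dropped
numeric = df.select_dtypes(include="number")
rolled = numeric.rolling(2).mean()
print(rolled.columns.tolist())  # ['percent_gc', 'percent_fails']
```

The trade-off is that any column silently dropped here is also silently missing from the result, so validating the chosen `y_value` column up front is still worthwhile.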

You need to customize the JSON. In your first example above, you were trying to plot a column containing the string 'data', and in your example above, you are plotting the "Run" column, whose values are all "Run-1". Both strings, 'data' and 'Run-1', are causing the errors. To customize the JSON, check which column each "y_value" refers to and make sure it contains numerical values. Hope this helps.

nilesh-tawari, Apr 08 '19 02:04
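Checking that each "y_value" is numeric can be automated. A minimal sketch, assuming the chart entries have the shape seen in chronqc.default.json (`table_name` plus `chart_properties.y_value`); the function name `non_numeric_y_values` is hypothetical and not part of ChronQC:

```python
import sqlite3

import pandas as pd


def non_numeric_y_values(conn: sqlite3.Connection, charts: list) -> list:
    """Return the y_value columns in a ChronQC-style chart list that hold non-numeric data.

    Assumes each entry looks like {"table_name": ..., "chart_properties": {"y_value": ...}}.
    """
    bad = []
    for chart in charts:
        table = chart["table_name"]
        column = chart["chart_properties"]["y_value"]
        # Double quotes let SQLite accept identifiers containing hyphens
        values = pd.read_sql_query(f'SELECT "{column}" FROM "{table}"', conn)[column]
        # Strings that cannot be parsed as numbers become NaN under errors="coerce"
        coerced = pd.to_numeric(values, errors="coerce")
        if coerced.isna().sum() > values.isna().sum():
            bad.append(column)
    return bad


# Demo against a throwaway in-memory table (made-up values)
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE chronqc_stats_data ("Run" TEXT, "percent_gc" REAL)')
conn.executemany("INSERT INTO chronqc_stats_data VALUES (?, ?)",
                 [("Run-1", 48.0), ("Run-1", 49.0)])
charts = [
    {"table_name": "chronqc_stats_data", "chart_properties": {"y_value": "percent_gc"}},
    {"table_name": "chronqc_stats_data", "chart_properties": {"y_value": "Run"}},
]
print(non_numeric_y_values(conn, charts))  # ['Run']
```

With real files, `conn` would be `sqlite3.connect("chronqc_db/chronqc.stats.sqlite")` and `charts` would be the result of `json.load()` on `chronqc_db/chronqc.default.json`. An empty result means the JSON itself is not the source of the error.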

But my JSON is fine. All of the y_value fields refer to numeric columns, and I don't mention the Run field anywhere:

$ cat chronqc_db/chronqc.default.json 
[
    {
        "table_name": "chronqc_stats_data",
        "chart_type": "time_series_with_mean_and_stdev",
        "chart_properties": {
            "y_value": "FastQC_mqc-generalstats-fastqc-avg_sequence_length"
        }
    },
    {
        "table_name": "chronqc_stats_data",
        "chart_type": "time_series_with_mean_and_stdev",
        "chart_properties": {
            "y_value": "FastQC_mqc-generalstats-fastqc-percent_duplicates"
        }
    },
    {
        "table_name": "chronqc_stats_data",
        "chart_type": "time_series_with_mean_and_stdev",
        "chart_properties": {
            "y_value": "FastQC_mqc-generalstats-fastqc-percent_fails"
        }
    },
    {
        "table_name": "chronqc_stats_data",
        "chart_type": "time_series_with_mean_and_stdev",
        "chart_properties": {
            "y_value": "FastQC_mqc-generalstats-fastqc-percent_gc"
        }
    },
    {
        "table_name": "chronqc_stats_data",
        "chart_type": "time_series_with_mean_and_stdev",
        "chart_properties": {
            "y_value": "FastQC_mqc-generalstats-fastqc-total_sequences"
        }
    }
]

multimeric, Apr 08 '19 03:04

I'm happy to give you the files if it would help debug the issue.

multimeric, Apr 10 '19 01:04

I'm getting this same issue with the example files here: https://github.com/nilesh-tawari/ChronQC/tree/master/examples/multiqc_example_1.

From within examples/multiqc_example_1, I run:

$ chronqc database --create --run-date-info run_date_info.csv -o . multiqc_data/multiqc_general_stats.txt SOMATIC
Running ChronQC |##################################################| 100.0%
Created ChronQC db: /home/michael/Programming/ChronQC/examples/multiqc_example_1/chronqc_db/chronqc.stats.sqlite with 100 records
Created ChronQC default JSON file: /home/michael/Programming/ChronQC/examples/multiqc_example_1/chronqc_db/chronqc.default.json. Customize the JSON as needed before generating ChronQC plots.

$ chronqc plot -o . chronqc_db/chronqc.stats.sqlite SOMATIC chronqc_db/chronqc.default.json
Started ChronQC
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/window.py", line 222, in _prep_values
    values = _ensure_float64(values)
  File "pandas/_libs/algos_common_helper.pxi", line 3182, in pandas._libs.algos.ensure_float64
  File "pandas/_libs/algos_common_helper.pxi", line 3187, in pandas._libs.algos.ensure_float64
ValueError: could not convert string to float: 'CHH13847 (15.7), CHH13848 (16.18), CHH13846 (15.21), CHH13849 (16.8)'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/chronqc", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.6/dist-packages/chronqc/chronqc.py", line 168, in main
    args.func(args)
  File "/usr/local/lib/python3.6/dist-packages/chronqc/chronqc.py", line 21, in run_plot
    chronqc_plot.main(args)
  File "/usr/local/lib/python3.6/dist-packages/chronqc/chronqc_plot.py", line 676, in main
    df_chart = mean_and_stdev(df, column_name, win=win, per_sample=per_sample)
  File "/usr/local/lib/python3.6/dist-packages/chronqc/chronqc_plot.py", line 218, in mean_and_stdev
    df_dup_all = rolling_mean(df_dup_all, Duplicates, win)
  File "/usr/local/lib/python3.6/dist-packages/chronqc/chronqc_plot.py", line 186, in rolling_mean
    df_dup_all['mean'] = df_dup_all.rolling(win).mean().round(2)[Duplicates]
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/window.py", line 1605, in mean
    return super(Rolling, self).mean(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/window.py", line 1058, in mean
    return self._apply('roll_mean', 'mean', **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/window.py", line 844, in _apply
    values = self._prep_values(b.values)
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/window.py", line 225, in _prep_values
    "".format(values.dtype))
TypeError: cannot handle this type -> object

multimeric, Jun 24 '19 11:06

I've investigated this some more, and the last version of pandas this works with is 0.22.0. If you use 0.23.0 or higher, this error will occur.

multimeric, Jun 25 '19 01:06
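The version boundary reported above (0.22.0 works, 0.23.0 and later fail) can be enforced with a pip requirement such as `pandas==0.22.0`. As an alternative to hard pinning, a minimal runtime guard might look like the sketch below. The function name is hypothetical and not part of ChronQC, and it compares only the major and minor version numbers:

```python
import pandas as pd


def pandas_is_at_most_0_22(version: str = pd.__version__) -> bool:
    """Return True if `version` is pandas 0.22.x or older.

    Only the major and minor numbers are parsed, e.g. "0.23.4" -> (0, 23).
    """
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) <= (0, 22)


if not pandas_is_at_most_0_22():
    print(f"pandas {pd.__version__} is newer than 0.22; "
          "a whole-frame rolling mean over text columns may fail")
```

A guard like this only warns; the lasting fix is to roll just the numeric column, as discussed earlier in the thread.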

Thank you @TMiguelT for investigating this. I will look into it and fix ChronQC to work with newer versions of pandas.

nilesh-tawari, Oct 22 '19 13:10

Excellent, that would be a much better solution than my version pinning.

multimeric, Oct 24 '19 03:10