DABEST-python
Add paired Cohen's d
Is there a way to calculate Cohen's d for paired data in DABEST? Currently DABEST appears to return only unpaired Cohen's d.
Hi @paul-hawkins,
Are you asking how to compute paired Cohen's d? Or are you saying that the paired Cohen's d returned by DABEST is not actually paired?
If your question is the first one, simply load the experiment as a paired experiment:
import pandas as pd
import dabest
# Load the iris dataset. Requires internet access.
iris = pd.read_csv("https://github.com/mwaskom/seaborn-data/raw/master/iris.csv")
iris.reset_index(inplace=True)
virginica = iris[iris.species=="virginica"].copy()
virginica_melted = pd.melt(virginica,
id_vars="index",
value_vars=["sepal_length", "petal_length"],
var_name="flower_part",
value_name="width")
virginica_paired = dabest.load(data=virginica_melted, x="flower_part", y="width",
paired=True, id_col="index",
idx=("sepal_length", "petal_length"))
then produce the Cohen's d:
virginica_paired.cohens_d
DABEST v0.3.0
=============
Good evening!
The current time is Thu Apr 23 18:44:08 2020.
The paired Cohen's d between sepal_length and petal_length is -1.74 [95%CI -2.1, -1.37].
The p-value of the two-sided permutation t-test is 0.0.
5000 bootstrap samples were taken; the confidence interval is bias-corrected and accelerated.
The p-value(s) reported are the likelihood(s) of observing the effect size(s),
if the null hypothesis of zero difference is true.
For each p-value, 5000 reshuffles of the control and test labels were performed.
To get the results of all valid statistical tests, use `.cohens_d.statistical_tests`
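As a rough sanity check, the point estimate above can be reproduced by hand. A minimal NumPy sketch, assuming the virginica_melted DataFrame from the snippet above and that the standardizer is the average of the two groups' standard deviations (which reproduces the -1.74 reported, up to rounding):
import numpy as np
# Cross-check of the paired Cohen's d point estimate reported above
# (assumption: the denominator is the average of the two group SDs).
sepal = virginica_melted.loc[virginica_melted.flower_part == "sepal_length", "width"].to_numpy()
petal = virginica_melted.loc[virginica_melted.flower_part == "petal_length", "width"].to_numpy()
mean_paired_diff = np.mean(petal - sepal)  # rows keep the same id order after melting
avg_sd = (np.std(sepal, ddof=1) + np.std(petal, ddof=1)) / 2
print(mean_paired_diff / avg_sd)           # approximately -1.74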
If you are saying the latter (ie the paired Cohen's d returned by DABEST is not actually paired), could you provide a dummy dataset with the expected accurate values, vis-a-vis what DABEST produces? Thanks!
Joses,
I think this problem is just my inexperience with DABEST. Running the Cohen's d function:
s_control.cohens_d
DABEST v0.3.0
Good morning! The current time is Thu Apr 16 11:45:23 2020.
The unpaired Cohen's d between OMEGA and MOE is 0.182 [95%CI -0.0108, 0.367]. The p-value of the two-sided permutation t-test is 0.0704.
The unpaired Cohen's d between OMEGA and Macromodel is 0.0295 [95%CI -0.168, 0.215]. The p-value of the two-sided permutation t-test is 0.766.
The unpaired Cohen's d between OMEGA and Desmond is 0.37 [95%CI 0.162, 0.572]. The p-value of the two-sided permutation t-test is 0.0002.
The unpaired Cohen's d between OMEGA and RDKit is 0.593 [95%CI 0.36, 0.805]. The p-value of the two-sided permutation t-test is 0.0.
The unpaired Cohen's d between OMEGA and Prime is -0.0323 [95%CI -0.224, 0.162]. The p-value of the two-sided permutation t-test is 0.747.
5000 bootstrap samples were taken; the confidence interval is bias-corrected and accelerated. The p-value(s) reported are the likelihood(s) of observing the effect size(s), if the null hypothesis of zero difference is true. For each p-value, 5000 reshuffles of the control and test labels were performed.
To get the results of all valid statistical tests, use .cohens_d.statistical_tests
DABEST says it is reporting unpaired Cohen's d. Looking at the results of
s_control.cohens_d.statistical_tests
I see there is a column 'is_paired' which is set to False, so I thought that being able to set it to True would solve my problem, but I could not find a way to do that.
However, the values returned by 'cohens_d' are more or less identical to the values I get for paired d, so this appears to be a problem that has solved itself.
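For illustration, a minimal sketch of inspecting that column, assuming the s_control object loaded above; is_paired simply records the paired= argument passed to dabest.load, so it is fixed at load time rather than toggled afterwards:
# statistical_tests is a pandas DataFrame; 'is_paired' reflects how the data
# were loaded (the paired= argument to dabest.load).
tests = s_control.cohens_d.statistical_tests
print(tests["is_paired"])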
However, if I run
s_control = db.load(new_df, idx=('OMEGA', 'MOE', 'Macromodel', 'Desmond', 'RDKit', 'Prime'), paired=True)
DABEST returns an error
ValueError: is_paired is True, but some idx in ('OMEGA', 'MOE', 'Macromodel', 'Desmond', 'RDKit', 'Prime') does not consist only of two groups.
It seems like DABEST only allows paired tests between two sets of data, while paired comparisons can be carried out on three or more sets of data.
As you can see, this is with DABEST 0.3.0.
Paul.
Hi @paul-hawkins,
"It seems like DABEST only allows paired tests between two sets of data, while paired comparisons can be carried out on three or more sets of data."
This is half-correct: paired comparisons can only be done on pairs of data. For instance, using the data you posted in #98,
new_df = pd.read_csv('all_ringrmsd_data_only.txt', sep='\t')
# Need to have an ID column so DABEST knows which observations go together.
new_df.rename(columns={"Unnamed: 0": "id"}, inplace=True)
multi_paired = dabest.load(
new_df,
# Here, we assume OMEGA and MOE are a set of repeated measures,
# while Macromodel and Desmond are a second, unrelated set of repeated measures.
idx=(('OMEGA','MOE'),
('Macromodel', 'Desmond')),
id_col="id", paired=True)
multi_paired.cohens_d.plot();
multi_paired.cohens_d
DABEST v0.3.0
=============
Good afternoon!
The current time is Fri Apr 24 16:12:45 2020.
The paired Cohen's d between OMEGA and MOE is 0.182 [95%CI 0.0781, 0.285].
The p-value of the two-sided permutation t-test is 0.0014.
The paired Cohen's d between Macromodel and Desmond is 0.314 [95%CI 0.179, 0.448].
The p-value of the two-sided permutation t-test is 0.0.
5000 bootstrap samples were taken; the confidence interval is bias-corrected and accelerated.
The p-value(s) reported are the likelihood(s) of observing the effect size(s),
if the null hypothesis of zero difference is true.
For each p-value, 5000 reshuffles of the control and test labels were performed.
To get the results of all valid statistical tests, use `.cohens_d.statistical_tests`
DABEST's paired analysis design insists that no group is repeated within or across the comparison tuples.
multi_paired_neg = dabest.load(new_df,
idx=(('OMEGA','MOE'),
('OMEGA', 'Desmond')),
id_col="id", paired=True)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-16-53d5efeaf2d7> in <module>()
5 ('OMEGA', 'Desmond')),
6
----> 7 id_col="id", paired=True)
~/anaconda3/envs/dabest-dev-py3.6/lib/python3.6/site-packages/dabest/_api.py in load(data, idx, x, y, paired, id_col, ci, resamples, random_seed)
63 from ._classes import Dabest
64
---> 65 return Dabest(data, idx, x, y, paired, id_col, ci, resamples, random_seed)
~/anaconda3/envs/dabest-dev-py3.6/lib/python3.6/site-packages/dabest/_classes.py in __init__(self, data, idx, x, y, paired, id_col, ci, resamples, random_seed)
60 err1 = ' or a tuple has repeated groups in it.'
61 err2 = ' Please remove any duplicates and try again.'
---> 62 raise ValueError(err0 + err1 + err2)
63
64 else: # mix of string and tuple?
ValueError: Groups are repeated across tuples, or a tuple has repeated groups in it. Please remove any duplicates and try again.
This is designed deliberately to reduce any confusion. Paired comparisons, by definition, should only have a before measure and an after measure. Setting up the comparison as in multi_paired_neg implies this is not a strict paired comparison.
If you are doing a successive repeated-measures experiment (i.e. OMEGA is t=0, MOE is t=1, and then Macromodel is t=2), the way to do this is:
first = dabest.load(new_df,
idx=('OMEGA','MOE'),
id_col="id", paired=True)
second = dabest.load(new_df,
idx=('OMEGA', 'Macromodel'),
id_col="id", paired=True)
first.cohens_d
DABEST v0.3.0
=============
Good afternoon!
The current time is Fri Apr 24 16:34:10 2020.
The paired Cohen's d between OMEGA and MOE is 0.182 [95%CI 0.0781, 0.285].
The p-value of the two-sided permutation t-test is 0.0014.
5000 bootstrap samples were taken; the confidence interval is bias-corrected and accelerated.
The p-value(s) reported are the likelihood(s) of observing the effect size(s),
if the null hypothesis of zero difference is true.
For each p-value, 5000 reshuffles of the control and test labels were performed.
To get the results of all valid statistical tests, use `.cohens_d.statistical_tests`
second.cohens_d
DABEST v0.3.0
=============
Good afternoon!
The current time is Fri Apr 24 16:34:12 2020.
The paired Cohen's d between OMEGA and Macromodel is 0.0295 [95%CI -0.0672, 0.122].
The p-value of the two-sided permutation t-test is 0.541.
5000 bootstrap samples were taken; the confidence interval is bias-corrected and accelerated.
The p-value(s) reported are the likelihood(s) of observing the effect size(s),
if the null hypothesis of zero difference is true.
For each p-value, 5000 reshuffles of the control and test labels were performed.
To get the results of all valid statistical tests, use `.cohens_d.statistical_tests`
To plot them alongside each other:
import matplotlib.pyplot as plt
import seaborn as sns
%config InlineBackend.figure_format = 'retina'
sns.set(context="talk")
f, axx = plt.subplots(ncols=2, figsize=(10, 7),
# Adjust the width-wise spacing
gridspec_kw={"wspace":0.5})
plot_kwargs = dict(float_contrast=False,
contrast_ylim=(0, 0.7))
first.cohens_d.plot(ax=axx[0], **plot_kwargs);
second.cohens_d.plot(ax=axx[1], **plot_kwargs);
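If you need the combined figure as a file, a standard matplotlib follow-up works (the filename here is just a placeholder):
# Save the two-panel figure assembled above.
f.savefig("paired_cohens_d_panels.png", dpi=300, bbox_inches="tight")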
You can read more in the DABEST documentation.
Hope this helps!
Hi @paul-hawkins,
I hope Joses sufficiently answered your question.
Just to let you know, we have just released a new version of DABEST, and you will have to use paired="baseline" or paired="sequential" for future paired comparisons. Please see the new documentation for details.
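For orientation, a hedged sketch of what the earlier repeated-measures example might look like under the newer API, assuming the new_df and id column set up above; check the current documentation for the exact signature in your installed version:
import dabest
# In newer DABEST releases, paired takes a string describing the design:
# "sequential" compares each group against the one preceding it,
# "baseline" compares every group against the first (control) group.
sequential = dabest.load(new_df,
                         idx=("OMEGA", "MOE", "Macromodel"),
                         paired="sequential",  # or paired="baseline"
                         id_col="id")
sequential.cohens_d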
I will now be closing this issue. Thank you!