scikit-posthocs icon indicating copy to clipboard operation
scikit-posthocs copied to clipboard

Dunnetts test

Open comatrion opened this issue 6 years ago • 11 comments

Thanks for this package! One useful addition (unless I am missing it) would be Dunnett's test comparison to a control.

comatrion avatar Jun 07 '19 09:06 comatrion

@comatrion Thank you for suggestion! I will implement it soon.

maximtrp avatar Jun 16 '19 19:06 maximtrp

Implementing Dunnett would be fantastic, please don't hesitate to ping me, I have Matlab and FORTRAN code(that I also converted to C++) implementing it, I also have access to all the major books and papers on the subjects like the one from HSU. Needless to say, there are many R packages implementing it too but those are GPL licensed. I am willing to share all my resources to help speed up implementation, please let me know at [email protected].

mysticaltech avatar Jun 03 '20 12:06 mysticaltech

@mysticaltech This test relies upon multivariate Student's t-distribution function (pmvt in R). AFAIR, I could not find a usable implementation of it in Python a year ago. It would be great, if you helped me with it. I hope there is one somewhere on the web :smile:

maximtrp avatar Jun 04 '20 15:06 maximtrp

Maxim, I have detailed papers explaining it, and clear FORTRAN code (from the famous book "Multiple Comparison" by Jason Hsu, with a detailed chapter about it) and Matlab code implementing it, I also was able to transpile the FORTRAN code to C++. Quite readable! Please send a quick hello to the email above and I will reply with my huge folder on the subject zipped. The material I have will probably be of use to you even for other tests. Screen Shot 2020-06-04 at 10 46 23 PM

mysticaltech avatar Jun 04 '20 20:06 mysticaltech

Thank you for your greatful jobs! I am one of those people hope Dunnett's test comparison. if it helps, please check probmc function in SAS.

Probmc fuction is returning a probability or a quantile from various distributions for multiple comparisons of means.

Tsuchihashi-ryo avatar Jul 20 '20 02:07 Tsuchihashi-ryo

I have looked at the code in pmvt R package and even tried to implement bindings to Fortran code, but failed. This is not trivial. No progress so far, unfortunately. Maybe SAS manual pages will be helpful...

maximtrp avatar Jan 18 '21 16:01 maximtrp

Hi Max Great package idea What is the current status for adding Dunnett's post hoc test? I am willing to help you building it, as I need this solution to my application

Ronny-PDT avatar Aug 09 '22 06:08 Ronny-PDT

@Ronny-PDT Thank you for your interest! Unfortunately, I do not have enough time to work with it now. You can try to implement it yourself and do a pull request

maximtrp avatar Aug 09 '22 09:08 maximtrp

I was able to make the Dunnett's test based on the SAS description above. However, I used the pingouin library (https://pingouin-stats.org/generated/pingouin.anova.html#pingouin.anova) to calculate ANOVA. Additionally, the run time is so long (25 seconds for three groups, n=6). Furthermore, I am a biological wet researcher, so I am not skilled in writing code. So please improve it and use it.

import numpy as np
from scipy.stats import norm
from scipy import integrate
from scipy.special import gamma
import pingouin as pg


def dunnett_two_side_unbalanced(data, dv, between,control):

  def dunnett_prob(T,v,lambda_params):

    def pdf(x):
      return norm.pdf(x)

    def cdf(x):
      return norm.cdf(x)

    def f(x,y):
      return pdf(y)*np.prod([cdf((lambda_i*y+T*x)/(np.sqrt(1-lambda_i**2)))-cdf((lambda_i*y-T*x)/(np.sqrt(1-lambda_i**2))) for lambda_i in lambda_params])

    def duv(x):
      return (v**(v/2)*x**(v-1)*np.exp(-v*x**2/2))/(gamma(v/2)*2**(v/2-1))

    def f_g(x,y):
      return f(x,y) * duv(x)

    return integrate.dblquad(f_g,-np.inf,np.inf,lambda x:0,lambda x:np.inf)[0]

  #First compute the ANOVA
  aov = pg.anova(dv=dv, data=data, between=between, detailed=True)
  v = aov.at[1, 'DF'] #自由度
  ng = aov.at[0, 'DF'] + 1 #全群数
  grp = data.groupby(between)[dv]
  n_sub = grp.count()
  control_index = n_sub.index.get_loc(control) #対照群のindexを取得
  n = n_sub.values #サンプル数
  gmeans = grp.mean().values#各平均
  gvar = aov.at[1, 'MS'] / n #各分散

  vs_g = np.delete(np.arange(ng),control_index) #対照群以外のindexを取得

  # Pairwise combinations(検定統計量Tを求める)
  mn = np.abs(gmeans[control_index]- gmeans[vs_g])
  se = np.sqrt(gvar[control_index] + gvar[vs_g]) #式は少し違うが等分散を仮定しているため
  tval = mn / se

  #lambdaを求める
  lambda_params = np.sqrt(n[vs_g]/(n[control_index]+n[vs_g]))

  pval = [1-dunnett_prob(t,v,lambda_params) for t in tval]

  stats = pd.DataFrame({
                      'A': [control]*len(vs_g),
                      'B': np.unique(data[between])[vs_g],
                      'mean(A)': np.round(gmeans[control_index], 3)*len(vs_g),
                      'mean(B)': np.round(gmeans[vs_g], 3),
                      'diff': np.round(mn, 3),
                      'se': np.round(se, 3),
                      'T': np.round(tval, 3),
                      'p-dunnett': pval
                      })
  return stats

I am Japanese, so forgive me for including Japanese annotations. And, I have prepared the article in Japanese, so please translate and scrutinize it if you want! (https://qiita.com/mosomoso_1910/items/1d28440d91f63829b6f9)

Tsuchihashi-ryo avatar Aug 10 '22 01:08 Tsuchihashi-ryo

@Tsuchihashi-ryo Thank you very much!

maximtrp avatar Aug 10 '22 05:08 maximtrp

@Tsuchihashi-ryo Wonderful!

mysticaltech avatar Aug 10 '22 08:08 mysticaltech