scikit-posthocs Dunnetts test

Thanks for this package! One useful addition (unless I am missing it) would be Dunnett's test comparison to a control.

Jun 07 '19 09:06 comatrion

@comatrion Thank you for suggestion! I will implement it soon.

Jun 16 '19 19:06 maximtrp

Implementing Dunnett would be fantastic, please don't hesitate to ping me, I have Matlab and FORTRAN code(that I also converted to C++) implementing it, I also have access to all the major books and papers on the subjects like the one from HSU. Needless to say, there are many R packages implementing it too but those are GPL licensed. I am willing to share all my resources to help speed up implementation, please let me know at [email protected].

Jun 03 '20 12:06 mysticaltech

@mysticaltech This test relies upon multivariate Student's t-distribution function (pmvt in R). AFAIR, I could not find a usable implementation of it in Python a year ago. It would be great, if you helped me with it. I hope there is one somewhere on the web :smile:

Jun 04 '20 15:06 maximtrp

Maxim, I have detailed papers explaining it, and clear FORTRAN code (from the famous book "Multiple Comparison" by Jason Hsu, with a detailed chapter about it) and Matlab code implementing it, I also was able to transpile the FORTRAN code to C++. Quite readable! Please send a quick hello to the email above and I will reply with my huge folder on the subject zipped. The material I have will probably be of use to you even for other tests. Screen Shot 2020-06-04 at 10 46 23 PM

Jun 04 '20 20:06 mysticaltech

Thank you for your greatful jobs! I am one of those people hope Dunnett's test comparison. if it helps, please check probmc function in SAS.

Probmc fuction is returning a probability or a quantile from various distributions for multiple comparisons of means.

Jul 20 '20 02:07 Tsuchihashi-ryo

I have looked at the code in pmvt R package and even tried to implement bindings to Fortran code, but failed. This is not trivial. No progress so far, unfortunately. Maybe SAS manual pages will be helpful...

Jan 18 '21 16:01 maximtrp

Hi Max Great package idea What is the current status for adding Dunnett's post hoc test? I am willing to help you building it, as I need this solution to my application

Aug 09 '22 06:08 Ronny-PDT

@Ronny-PDT Thank you for your interest! Unfortunately, I do not have enough time to work with it now. You can try to implement it yourself and do a pull request

Aug 09 '22 09:08 maximtrp

I was able to make the Dunnett's test based on the SAS description above. However, I used the pingouin library (https://pingouin-stats.org/generated/pingouin.anova.html#pingouin.anova) to calculate ANOVA. Additionally, the run time is so long (25 seconds for three groups, n=6). Furthermore, I am a biological wet researcher, so I am not skilled in writing code. So please improve it and use it.

import numpy as np
from scipy.stats import norm
from scipy import integrate
from scipy.special import gamma
import pingouin as pg


def dunnett_two_side_unbalanced(data, dv, between,control):

  def dunnett_prob(T,v,lambda_params):

    def pdf(x):
      return norm.pdf(x)

    def cdf(x):
      return norm.cdf(x)

    def f(x,y):
      return pdf(y)*np.prod([cdf((lambda_i*y+T*x)/(np.sqrt(1-lambda_i**2)))-cdf((lambda_i*y-T*x)/(np.sqrt(1-lambda_i**2))) for lambda_i in lambda_params])

    def duv(x):
      return (v**(v/2)*x**(v-1)*np.exp(-v*x**2/2))/(gamma(v/2)*2**(v/2-1))

    def f_g(x,y):
      return f(x,y) * duv(x)

    return integrate.dblquad(f_g,-np.inf,np.inf,lambda x:0,lambda x:np.inf)[0]

  #First compute the ANOVA
  aov = pg.anova(dv=dv, data=data, between=between, detailed=True)
  v = aov.at[1, 'DF'] #自由度
  ng = aov.at[0, 'DF'] + 1 #全群数
  grp = data.groupby(between)[dv]
  n_sub = grp.count()
  control_index = n_sub.index.get_loc(control) #対照群のindexを取得
  n = n_sub.values #サンプル数
  gmeans = grp.mean().values#各平均
  gvar = aov.at[1, 'MS'] / n #各分散

  vs_g = np.delete(np.arange(ng),control_index) #対照群以外のindexを取得

  # Pairwise combinations（検定統計量Ｔを求める）
  mn = np.abs(gmeans[control_index]- gmeans[vs_g])
  se = np.sqrt(gvar[control_index] + gvar[vs_g]) #式は少し違うが等分散を仮定しているため
  tval = mn / se

  #lambdaを求める
  lambda_params = np.sqrt(n[vs_g]/(n[control_index]+n[vs_g]))

  pval = [1-dunnett_prob(t,v,lambda_params) for t in tval]

  stats = pd.DataFrame({
                      'A': [control]*len(vs_g),
                      'B': np.unique(data[between])[vs_g],
                      'mean(A)': np.round(gmeans[control_index], 3)*len(vs_g),
                      'mean(B)': np.round(gmeans[vs_g], 3),
                      'diff': np.round(mn, 3),
                      'se': np.round(se, 3),
                      'T': np.round(tval, 3),
                      'p-dunnett': pval
                      })
  return stats

I am Japanese, so forgive me for including Japanese annotations. And, I have prepared the article in Japanese, so please translate and scrutinize it if you want! (https://qiita.com/mosomoso_1910/items/1d28440d91f63829b6f9)

Aug 10 '22 01:08 Tsuchihashi-ryo

@Tsuchihashi-ryo Thank you very much!

Aug 10 '22 05:08 maximtrp

@Tsuchihashi-ryo Wonderful!

Aug 10 '22 08:08 mysticaltech

scikit-posthocs scikit-posthocs copied to clipboard

Dunnetts test

scikit-posthocs
scikit-posthocs copied to clipboard