
Comments on JOSS paper

Open dill opened this issue 7 months ago • 7 comments

(Part of review here.)

This was fun to read and I learnt some things :)

Some minor statistical/editorial comments on the paper:

  • The author name is shown as "¿citation_author?" in the pdf on the gratia github. Not sure if this is normal.
  • The formulation for the GAM (first display equation) doesn't allow for multivariate smooths. This is always a pain, and I've struggled to come up with a general form. In the end I used what Simon Wood does in his book and just gave an example of what can be done rather than trying to write a general definition. My version of this is in Miller (2019), in case that's helpful. If you did adapt this bit, there would be some follow-on changes needed in the subsequent text (though I think just giving an example of a univariate spline in the following text is totally appropriate).
  • In addition you only talk about scale parameters as the other parameters of the distribution. That's fine for exponential family distributions, but for the extended exponential case (as you show with the twlss() model), there are more parameters to consider. You might not want to address this in the intro -- that's fine!
  • "which, with the default penalty, is the integrated squared second derivative" -- I don't think this is quite right. Do you mean the default basis-penalty combination, thin plate regression splines? (As you go on to talk about.)
  • Is Figure 1a the basis functions or the basis functions multiplied by their corresponding coefficients? I think it's the latter?
  • It would be nice to have a legend in Figure 1 so readers can relate the basis functions to the values in the penalty matrix (and e.g., see that the most wiggly function has the highest penalty)? Open to push-back on this, if you want to just give default plots from gratia.
  • "GAMs fitted by mgcv are an empirical Bayesian model with an improper multivariate normal prior on the basis function coeficients." Is that always the case? One might argue that the intercept is always in the nullspace of every penalty, but one could remove that from the model and have a proper MVN prior? Could just put the improper in brackets with the word "usually"?
  • In previous snippet, coefficients, not "coeficients"
  • Miller (2019) should be Miller (2021) -- I updated it on arXiv.
  • "The response is assumed to be conditionally distributed Tweedie" -> "The response is assumed to be conditionally Tweedie distributed"?
  • "Model coefficients and smoothing parameters are estimated using restricted maximum likelihood (Wood, 2011)" missing period at end of sentence?
  • Could you add an explanation of ctrl <- gam.control(nthreads = 10)?
  • "which show significant heteroscedasticity and departure from the condtional distribution of the response given the model" -- could highlight that we see this due to the increasing spread in the deviance residuals wrt values of the linear predictor in the top right plot?
  • In the previous snippet "condtional" -> "conditional"
  • "given the absence of important effects in the model" not sure what this means?
  • In the Tweedie LSS model lead-in, it looks like you're using \varphi rather than \phi for the scale parameter (as you did in the intro), not sure if this was intentional?
  • "little data wrangling, we can produce an uncertaintry estimate ,using fitted_samples() to" misplaced comma.
  • In the references:
    • Not sure how JOSS wants you to style "brms" but "Brms" looks weird.
    • "Duchon, J. (1977). Splines minimizing rotation-invariant semi-norms in sobolev spaces" Sobolev should be capitalised.
    • "Wood, S. N. (2003). Thin plate regression splines: Thin plate regression splines." I think it just has one title? ;)

dill, Jul 15 '24 07:07