
GLM poisson

Open: OriolAbril opened this issue 3 years ago • 15 comments

File: https://github.com/pymc-devs/pymc-examples/blob/main/examples/generalized_linear_models/GLM-poisson-regression.ipynb
Reviewers:

The sections below may still be pending. If so, the issue is still available; it simply doesn't have specific guidance yet. Please refer to this overview of updates.

Known changes needed

Changes listed in this section should all be done at some point in order to get this notebook to a "Best Practices" state. However, these are probably not enough! Make sure to thoroughly review the notebook and search for other updates.

General updates

  • Use the NumPy random Generator (np.random.default_rng) instead of the legacy np.random functions
  • :warning: Code cells 15 and 19 are plain wrong: they compute np.exp(np.mean()) instead of np.mean(np.exp()).
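A minimal sketch of both general updates, assuming only NumPy. The draws below are simulated stand-ins for posterior samples of a log-scale coefficient (hypothetical values, not the notebook's data). Because exp is convex, Jensen's inequality guarantees that the mean of the exponentiated draws is at least the exponential of their mean, so the two orders of operations give different answers.

```python
import numpy as np

# NumPy Generator API, the recommended replacement for the legacy np.random.* functions
rng = np.random.default_rng(seed=1234)

# Hypothetical posterior draws of a log-scale (log-link) coefficient
log_draws = rng.normal(loc=0.3, scale=0.8, size=10_000)

wrong = np.exp(np.mean(log_draws))  # summarize first, then transform: exp of the mean
right = np.mean(np.exp(log_draws))  # transform the draws, then summarize: mean of exp

print(f"np.exp(np.mean(x)) = {wrong:.3f}")
print(f"np.mean(np.exp(x)) = {right:.3f}")
```

For normally distributed draws the correct value is close to exp(mu + sigma**2 / 2), so the gap grows with posterior uncertainty.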

ArviZ related

  • Code cell 15 (again) computes the whole summary dataframe when only a subset of the columns is needed. We should either use kind="stats" or customize summary; examples of both are at: https://arviz-devs.github.io/arviz/api/generated/arviz.summary.html
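A sketch of the two options, assuming the ArviZ 0.x API (az.from_dict, az.summary with kind, stat_funcs and extend). The posterior is a hypothetical single parameter named "beta" built from random draws, not the notebook's model.

```python
import numpy as np
import arviz as az

rng = np.random.default_rng(42)
# Hypothetical posterior: 4 chains x 1000 draws of one scalar parameter
idata = az.from_dict(posterior={"beta": rng.normal(0.5, 0.1, size=(4, 1000))})

# Option 1: summary statistics only, skipping diagnostics such as ess and r_hat
stats_only = az.summary(idata, kind="stats")
print(stats_only)

# Option 2: custom statistics; extend=False returns only the stat_funcs columns
custom = az.summary(
    idata,
    stat_funcs={"median": np.median, "q05": lambda x: np.quantile(x, 0.05)},
    extend=False,
)
print(custom)
```

Option 1 is usually enough; Option 2 is only worth it when the notebook needs statistics that summary does not compute by default.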

Notes

Exotic dependencies

None

Computing requirements

Models sample in less than a minute

OriolAbril, Mar 30 '21

Hi! I'd like to try working on this.

jessicakzhang, Apr 11 '21

That would be great @jessicakzhang! I have added a couple of suggestions based on a quick look over the notebook. I'll review more carefully once you submit a PR.

Let us know if you have any questions while working on this.

OriolAbril, Apr 11 '21

Hi @jessicakzhang, are you still working on this issue? @OriolAbril, would it be okay for me to submit a PR for this issue, given that it has already been assigned?

chiral-carbon, May 04 '21

Hi, yes. As it has been more than two weeks with no activity, as indicated in the contributing guide, I'll assign the issue to you so you can submit a PR.

OriolAbril, May 04 '21

thanks a lot!

chiral-carbon, May 04 '21

@OriolAbril, I have a question about changing np.exp(np.mean()) to np.mean(np.exp()) in cells 15 and 19. As far as I understand, az.summary returns several statistics, the mean among them, and we are currently calling np.exp() on this summary dataframe. Should this be changed to compute np.mean() of az.summary? I am unclear about exactly where np.exp() should be applied here.

Also, when updating az.summary to use arguments rather than manually subsetting the dataframe, should we show all the default stats (mean, sd, hdi_3%, hdi_97%) for the variables, or only mean, hdi_3% and hdi_97%, as the notebook currently does?

chiral-carbon, May 05 '21

I used mean as a placeholder: here summary plays the role of the mean (as well as of the HDI). Exponentiation should come first, then summary is called on the exponentiated data, not the other way around.

I think the default stats are good enough and keep things simple; there is no need to overcomplicate the notebook just to exclude sd from the summary.

OriolAbril, May 05 '21

Oh okay, thanks for clarifying. So when I exponentiate inf_fish, which is an InferenceData object, I should first convert it to a dataframe, exponentiate that dataframe, and only then create a summary for it. Have I understood this correctly? But az.summary takes an InferenceData object, so what would be the best way to exponentiate inf_fish here?

chiral-carbon, May 05 '21

You should exponentiate the posterior samples, which are stored as a group in the InferenceData object in the form of an xarray Dataset. It should look something like: az.summary(np.exp(idata.posterior), ...)
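A sketch of the suggested pattern, assuming the ArviZ 0.x API and a hypothetical log-scale coefficient "b_log" (simulated draws, not the notebook's inf_fish). np.exp applied to the posterior xarray Dataset returns a new Dataset, which az.summary accepts directly; the naive order is computed alongside only for comparison.

```python
import numpy as np
import arviz as az

rng = np.random.default_rng(0)
# Hypothetical coefficient on the log scale, as a Poisson GLM with log link would produce
idata = az.from_dict(posterior={"b_log": rng.normal(1.0, 0.5, size=(4, 2000))})

# Correct order: exponentiate the posterior draws, then summarize them
rate_summary = az.summary(np.exp(idata.posterior), kind="stats")
print(rate_summary)

# Incorrect order, shown only for comparison: summarize on the log scale, then exponentiate
naive_mean = np.exp(az.summary(idata, kind="stats")["mean"])
print(naive_mean)
```

The correct mean lands near exp(1.0 + 0.5**2 / 2), noticeably above the naive exp(1.0).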

OriolAbril, May 05 '21

yes, that works, thanks a lot!

chiral-carbon, May 05 '21

In cell 19, I think there's a typo. The code in cells 15 and 19 is identical: both call summary on the same variable, inf_fish, which was generated with the manual model. Cell 19 should be displaying the summary for the model results created with glm.from_formula, which are stored in inf_fish_alt. Can you confirm this?

Also, should the values in the markdown cell after cell 15 be updated to match the new mean and HDI shown in the summary? They are only very slightly off.

[Screenshot from 2021-05-05 20-58-23]

chiral-carbon, May 05 '21

> in cell 19, I think there's a typing error. [...] cell 19 should be displaying the summary for the model results created with glm.from_formula and which are stored in the variable inf_fish_alt. can you confirm this?

yes, it is definitely a typo.

> also, should the data in the markdown cell after cell 15 be altered to match the new mean and hdi that we see in the summary?

I would update it to avoid confusing readers

OriolAbril, May 05 '21

This needs to be updated to use Bambi instead of the glm module.

OriolAbril, Jun 09 '21

I will be working on it, @OriolAbril.

chiral-carbon, Jun 14 '21

I'm about to update this to PyMC v4.

drbenvincent, May 30 '22