TuringDataStories
[Review][Turing Data Story] Modelling Hospital Admissions in the UK
Story Review:
Story Name: Modelling Hospital Admissions in the UK
Submitting Author: @edaub
Pull Request: #150
Reviewers: @helendduncan, @ChristinaLast and @lukehare
Editor: @crangelsmith
Reviewer instructions & questions
@helendduncan, @ChristinaLast and @lukehare, please carry out your review in this issue by updating the checklist below in a new comment (copy and paste it), and by writing new comments in this issue or the story pull request if you have any questions. If you cannot edit the checklist please:
- Make sure you're logged in to your GitHub account
- Be sure to accept the invite at this URL: https://github.com/alan-turing-institute/TuringDataStories/settings/access
If you have any questions, concerns or suggestions regarding the review process, please let @crangelsmith, @DavidBeavan or @samvanstroud know.
✨ Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest ✨
THANK YOU!
Review Checklist
Code of conduct
- [ ] I confirm that I read and will adhere to the Turing Data Stories code of conduct.
General checks
- [ ] Notebook: Is the source code for this data story available as a notebook in the linked pull request?
- [ ] Contribution and authorship: Are the authors clearly listed? Does the author list seem appropriate and complete?
- [ ] Scope and eligibility: Does the submission contain an original and complete analysis of open data? Is the story aligned with the Turing Data Stories vision statement?
Reproducibility
- [ ] Does the notebook run in a local environment?
- [ ] Does the notebook build and run in binder?
- [ ] Are all data sources openly accessible and properly cited with a link?
- [ ] Are the data open, and do they have an explicit licence, provenance and attribution?
Pedagogy
- [ ] Does the story demonstrate some specific data analysis or visualisation techniques?
- [ ] Are these techniques well motivated?
- [ ] Are these techniques well implemented?
- [ ] Is the notebook well documented, using both markdown cells and comments in code cells?
- [ ] Does the notebook have an introduction section motivating the story?
- [ ] Does the notebook have a conclusion section discussing the main insights from the story?
- [ ] Is the paper well written (it does not require editing for structure, language, or writing quality)?
Context
- [ ] Does the story give an insight into some societal issue?
- [ ] Is the context around this issue well referenced (newspaper articles, scientific papers, etc.)?
Ethical
- [ ] Is any linkage of datasets in the story unlikely to lead to an increased risk of the personal identification of individuals?
- [ ] Is the Story truthful and clear about any limitations of the analysis (and potential biases in data)?
- [ ] Is the Story unlikely to lead to negative social outcomes, such as (but not limited to) increasing discrimination or injustice?
AOB
There are some extra checklists to consider in the PR #150
Review Checklist
Code of conduct
- [x] I confirm that I read and will adhere to the Turing Data Stories code of conduct.
General checks
- [x] Notebook: Is the source code for this data story available as a notebook in the linked pull request?
- [x] Contribution and authorship: Are the authors clearly listed? Does the author list seem appropriate and complete?
- [x] Scope and eligibility: Does the submission contain an original and complete analysis of open data? Is the story aligned with the Turing Data Stories vision statement?
Reproducibility
- [x] Does the notebook run in a local environment?
- [ ] Does the notebook build and run in binder?
- [x] Are all data sources openly accessible and properly cited with a link?
- [ ] Are the data open, and do they have an explicit licence, provenance and attribution?
Pedagogy
- [x] Does the story demonstrate some specific data analysis or visualisation techniques?
- [x] Are these techniques well motivated?
- [x] Are these techniques well implemented?
- [x] Is the notebook well documented, using both markdown cells and comments in code cells?
- [x] Does the notebook have an introduction section motivating the story?
- [x] Does the notebook have a conclusion section discussing the main insights from the story?
- [x] Is the paper well written (it does not require editing for structure, language, or writing quality)?
Context
- [x] Does the story give an insight into some societal issue?
- [x] Is the context around this issue well referenced (newspaper articles, scientific papers, etc.)?
Ethical
- [x] Is any linkage of datasets in the story unlikely to lead to an increased risk of the personal identification of individuals?
- [x] Is the Story truthful and clear about any limitations of the analysis (and potential biases in data)?
- [x] Is the Story unlikely to lead to negative social outcomes, such as (but not limited to) increasing discrimination or injustice?
AOB
There are some extra checklists to consider in the PR #150
Section = Data
- [ ] Are the dates of the second (autumn) lockdown correct? In the story they appear to be in 2021, and a comment is made about widespread testing not being available, but I think it was available from then? This could well be me misreading it. The same applies to the cut-off comment giving December 2021.
Running calculations locally
- [ ] When running the calculations locally, some sections took 10-12 minutes to complete, and my laptop sounded like it was working hard. I assume this is expected behaviour. Note left in notebook.
Review Checklist
Code of conduct
- [X] I confirm that I read and will adhere to the Turing Data Stories code of conduct.
General checks
- [X] Notebook: Is the source code for this data story available as a notebook in the linked pull request?
- [X] Contribution and authorship: Are the authors clearly listed? Does the author list seem appropriate and complete?
- [ ] Scope and eligibility: Does the submission contain an original and complete analysis of open data? Is the story aligned with the Turing Data Stories vision statement?
Reproducibility
- [ ] Does the notebook run in a local environment?
- Notebook runs but fails when running SMC on hospital admissions. Error:
ImportError: Version check of the existing lazylinker compiled file. Looking for version 0.211, but found None. Extra debug information: force_compile=False, _need_reload=True
- [ ] Does the notebook build and run in binder?
- Notebook does not run in Binder. Error:
The command '/bin/sh -c TIMEFORMAT='time: %3R' bash -c 'time mamba env update -p ${NB_PYTHON_PREFIX} -f "binder/environment.yml" && time mamba clean --all -f -y && mamba list -p ${NB_PYTHON_PREFIX} '' returned a non-zero code: 1
- [X] Are all data sources openly accessible and properly cited with a link?
- [X] Are the data open, and do they have an explicit licence, provenance and attribution?
Pedagogy
- [X] Does the story demonstrate some specific data analysis or visualisation techniques?
- [X] Are these techniques well motivated?
- [X] Are these techniques well implemented?
- [X] Is the notebook well documented, using both markdown cells and comments in code cells?
- [X] Does the notebook have an introduction section motivating the story?
- [X] Does the notebook have a conclusion section discussing the main insights from the story?
- [X] Is the paper well written (it does not require editing for structure, language, or writing quality)?
Context
- [X] Does the story give an insight into some societal issue?
- [X] Is the context around this issue well referenced (newspaper articles, scientific papers, etc.)?
Ethical
- [X] Is any linkage of datasets in the story unlikely to lead to an increased risk of the personal identification of individuals?
- [X] Is the Story truthful and clear about any limitations of the analysis (and potential biases in data)?
- [X] Is the Story unlikely to lead to negative social outcomes, such as (but not limited to) increasing discrimination or injustice?
AOB
There are some extra checklists to consider in the PR #150
Section = Data
- [ ] Are the dates of the second (autumn) lockdown correct? In the story they appear to be in 2021, and a comment is made about widespread testing not being available, but I think it was available from then? This could well be me misreading it. The same applies to the cut-off comment giving December 2021.
- According to this resource, the second national lockdown came into force on 5 November 2020 and the third national lockdown came into force on 6 January 2021.
Comments
"This process follows a negative binomial distribution, shifted by k days. An equivalent way to view this is to say that the date of hospital admission follows a multinomial distribution, with the multinomial probabilities following the PMF of a negative binomial distribution, shifted by k days."
- It may be worth adding a non-technical explanatory sentence to this, such as:
Suppose a COVID-19 patient might be infected for a number of days until they are hospitalised. After being infected, there is a probability p of their condition worsening. Then after ψ days' time, they will stop getting worse due to the development of antibodies and recover, or will be unable to recover and die from the disease. k is the number of days on which they are not hospitalised before they recover/die.
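To make the quoted statement concrete, here is a minimal sketch of a negative binomial PMF shifted by k days, using only the Python standard library. The parametrisation (a dispersion `r` and a success probability `p`) and all function names are assumptions for illustration; the notebook may well parametrise the distribution differently, for example through a mean and ψ.

```python
import math


def neg_binomial_pmf(n, r, p):
    """P(N = n) for a negative binomial with real r > 0 and 0 < p < 1.

    Assumed convention: n counts failures before the r-th success.
    """
    if n < 0:
        return 0.0
    log_pmf = (
        math.lgamma(n + r) - math.lgamma(r) - math.lgamma(n + 1)
        + r * math.log(p) + n * math.log(1.0 - p)
    )
    return math.exp(log_pmf)


def shifted_delay_pmf(day, k, r, p):
    """Probability of admission `day` days after the reference date, shifted by k days."""
    # Shifting by k means no admission can occur before day k.
    return neg_binomial_pmf(day - k, r, p)


def admission_probabilities(max_day, k, r, p):
    """Multinomial cell probabilities for days 0..max_day (truncated at max_day)."""
    return [shifted_delay_pmf(day, k, r, p) for day in range(max_day + 1)]


# Hypothetical parameter values, chosen only to show the shape of the result.
probs = admission_probabilities(max_day=60, k=3, r=2.0, p=0.3)
print(probs[:5])
print(sum(probs))
```

The first k entries are zero, and the remaining entries are the probabilities that a multinomial draw would assign to each admission day. Because the list is truncated at `max_day`, the entries sum to slightly less than one.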
"We might also fit three different ψ values for 0, 1, or 2 vaccine doses, though since vaccinated people are less likely than unvaccinated people to become infected, we would also need to estimate the chances of infection which is probably more difficult"
- There is also the fact that it is not just the probability of infection given a vaccine but the reduction in the probability of infection given the vaccination status of those around you.
Code of conduct
- [X] I confirm that I read and will adhere to the Turing Data Stories code of conduct.
General checks
- [X] Notebook: Is the source code for this data story available as a notebook in the linked pull request?
- [X] Contribution and authorship: Are the authors clearly listed? Does the author list seem appropriate and complete?
- [X] Scope and eligibility: Does the submission contain an original and complete analysis of open data? Is the story aligned with the Turing Data Stories vision statement?
Reproducibility
- [ ] Does the notebook run in a local environment?
- I got the same error as Christina above, i.e.,
ImportError: Version check of the existing lazylinker compiled file...
Apparently, this is to do with the conda installation of theano/pymc3. I got around this locally by running:
conda uninstall pymc3
pip install pymc3
Related: #160
- [X] Does the notebook build and run in binder?
- [X] Are all data sources openly accessible and properly cited with a link?
- Is it likely that this API will be deprecated in the future? Would it be useful to save a static csv somewhere with the data required to reproduce this story?
- [X] Are the data open, and do they have an explicit licence, provenance and attribution?
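On the question of keeping a static copy of the data: one possible approach, sketched here under assumptions, is to write the API response to a CSV file the first time it is fetched and read from that file afterwards. The helper name `load_with_snapshot`, the file name, the column names and the placeholder rows are all hypothetical; the real fetch function would be whatever the notebook already uses to query the API.

```python
import csv
import tempfile
from pathlib import Path


def load_with_snapshot(fetch_rows, snapshot_path):
    """Return rows from a local CSV snapshot, creating it from fetch_rows() on first use."""
    path = Path(snapshot_path)
    if not path.exists():
        rows = fetch_rows()  # expected: a non-empty list of dicts sharing the same keys
        if not rows:
            raise ValueError("fetch_rows() returned no data; snapshot not written")
        with path.open("w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
            writer.writeheader()
            writer.writerows(rows)
    # Always read back from disk so the story uses exactly what is stored.
    with path.open(newline="") as f:
        return list(csv.DictReader(f))


def fake_fetch():
    # Placeholder values standing in for an API call; not real admissions data.
    return [
        {"date": "2020-11-05", "admissions": "100"},
        {"date": "2020-11-06", "admissions": "110"},
    ]


def api_unavailable():
    # Simulates the API having been retired.
    raise RuntimeError("API no longer available")


with tempfile.TemporaryDirectory() as tmp:
    snapshot = Path(tmp) / "admissions_snapshot.csv"
    first = load_with_snapshot(fake_fetch, snapshot)
    # The second call reads the snapshot and never calls the (now failing) fetch.
    second = load_with_snapshot(api_unavailable, snapshot)
```

Note that values read back from CSV are strings, so numeric columns would need converting before analysis. Committing the snapshot alongside the notebook would let the story be reproduced even if the API changes.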
Pedagogy
- [X] Does the story demonstrate some specific data analysis or visualisation techniques?
- [X] Are these techniques well motivated?
- [X] Are these techniques well implemented?
- [X] Is the notebook well documented, using both markdown cells and comments in code cells?
- [X] Does the notebook have an introduction section motivating the story?
- [X] Does the notebook have a conclusion section discussing the main insights from the story?
- [X] Is the paper well written (it does not require editing for structure, language, or writing quality)?
Context
- [X] Does the story give an insight into some societal issue?
- [X] Is the context around this issue well referenced (newspaper articles, scientific papers, etc.)?
Ethical
- [X] Is any linkage of datasets in the story unlikely to lead to an increased risk of the personal identification of individuals?
- [X] Is the Story truthful and clear about any limitations of the analysis (and potential biases in data)?
- [X] Is the Story unlikely to lead to negative social outcomes, such as (but not limited to) increasing discrimination or injustice?
AOB
- As mentioned by Helen and Christina, I believe there are typos in the introduction section, and several references to 2021 ("July", "May", "31 December") should refer to 2020.
- The notebook appears to have been duplicated, i.e., the entire notebook is repeated (following the discussion section, the introduction reappears, followed by the authors, ...). This is probably a result of awkward notebook/git interaction. It currently leads to some "Non-unique cell id" errors, but these are easily ignored.
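A quick way to confirm the duplication is to count cell ids in the notebook file. The sketch below assumes the notebook is stored in nbformat 4.5 or later, where each cell carries an `id` field; the function names and the example notebook are hypothetical.

```python
import json
from collections import Counter


def duplicate_cell_ids(notebook):
    """Return the sorted cell ids that occur more than once in a parsed notebook."""
    counts = Counter(
        cell["id"] for cell in notebook.get("cells", []) if "id" in cell
    )
    return sorted(cell_id for cell_id, n in counts.items() if n > 1)


def duplicate_cell_ids_in_file(path):
    """Same check, reading the .ipynb file (which is plain JSON) from disk."""
    with open(path, encoding="utf-8") as f:
        return duplicate_cell_ids(json.load(f))


# Hypothetical miniature notebook in which the introduction cell is repeated.
example = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"id": "intro", "cell_type": "markdown", "metadata": {}, "source": ["# Introduction"]},
        {"id": "discussion", "cell_type": "markdown", "metadata": {}, "source": ["# Discussion"]},
        {"id": "intro", "cell_type": "markdown", "metadata": {}, "source": ["# Introduction"]},
    ],
}

print(duplicate_cell_ids(example))  # prints ['intro']
```

If every id in the second half of the notebook shows up as a duplicate, that would support the idea that the whole notebook was appended to itself, and deleting the repeated cells should clear the warnings.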