malariaAtlas

More reproducible PR data retrieval

Open: timcdlucas opened this issue 6 years ago · 0 comments

One aim of the package is to enable reproducible analyses.

Suppose you write a script such as

d <- getPR()
lm(x ~ y, data = d)

and I run it 6 months later. It is useful for me to get the same result. Currently, however, if data have been added in the meantime, we will get different results.

I spoke to Joe and he said:

"The code I wrote for importing new PR data creates a log table with timestamps correlated with the created IDs.

So I think it should be a relatively simple matter to change the process Daniel uses to export PR data into the explorer to include an outer join to this log, and provide the info the API function would need to filter and include stuff only as it was at a certain date (assuming all the dates are in the future; I think we have no ability to tell when PR data added before the log table existed were put in)."
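A minimal sketch of the filtering Joe describes, written in base R. It assumes a PR table with an `id` column and a log table mapping each `id` to a `created` import date; `filter_as_at`, `pr`, `log`, `id` and `created` are all hypothetical names, not part of malariaAtlas or the explorer export. Records with no log entry (added before the log table existed) cannot be dated, so this sketch simply assumes they were present at any requested date.

```r
# Hypothetical helper: keep only PR records that existed on or before `asAt`.
# `pr` and `log` are illustrative data frames, not real malariaAtlas objects.
filter_as_at <- function(pr, log, asAt) {
  as_at <- as.Date(asAt)
  # Left outer join: keep every PR record, attaching its import date if logged.
  joined <- merge(pr, log, by = "id", all.x = TRUE)
  # Unlogged records (NA date) predate the log; assume they are always included.
  keep <- is.na(joined$created) | joined$created <= as_at
  # Drop the helper column so the result has the same columns as `pr`.
  joined[keep, setdiff(names(joined), "created"), drop = FALSE]
}

# Toy data: record 1 predates the log; records 2-4 were imported later.
pr <- data.frame(id = 1:4, pr = c(0.10, 0.25, 0.40, 0.05))
log <- data.frame(
  id = 2:4,
  created = as.Date(c("2018-01-15", "2018-03-10", "2018-05-20"))
)

filter_as_at(pr, log, "2018-02-02")  # records 1 and 2
filter_as_at(pr, log, "2018-06-07")  # all four records
```

Treating unlogged records as always present is a design choice: it keeps old data visible, but it means an `asAt` date earlier than the log table cannot be reproduced exactly.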

So this should be possible. In future, the syntax could be:

d <- getPR(asAt = '2018-02-02')
lm(x ~ y, data = d)

This should be completely reproducible. To update the analysis, you would simply run:

d <- getPR(asAt = '2018-06-07')
lm(x ~ y, data = d)

timcdlucas, Mar 01 '18 17:03