
Storing data in a database

Open ExpDev07 opened this issue 4 years ago • 15 comments

Right now the data is just stored in a cache. Would it perhaps be better to sync the data to an actual MySQL database? It would allow for fast querying.

ExpDev07 avatar Mar 19 '20 21:03 ExpDev07

From what I've seen you discussing on other issues, it seems like the format of the data is not totally stable. If that's the case, wouldn't it be easier to use a NoSQL database over an RDBMS?

Kilo59 avatar Mar 21 '20 22:03 Kilo59

Yes, that would probably be better. MongoDB or the like.

ExpDev07 avatar Mar 21 '20 22:03 ExpDev07

Happy to help with this as well.

Kilo59 avatar Mar 22 '20 00:03 Kilo59

This would be good too, because the API is now broken for me and others.

focus1691 avatar Mar 25 '20 07:03 focus1691

@traderjosh can you explain in more detail how it's broken for you? JHU (our data provider) made some pretty drastic changes lately which have caused the API's output to change (notably the ID indexing, and provinces no longer being present for the USA).

ExpDev07 avatar Mar 25 '20 07:03 ExpDev07

@ExpDev07 it works now - it was the CORS header missing from the API. The recovered field seems to be missing now, though. Is that gone forever? And some countries don't have an id, you're right.

I think a database would be good here in case the source website changes. MongoDB would be a good fit because the data can change unexpectedly.

focus1691 avatar Mar 25 '20 07:03 focus1691

For JHU, yes, the recovery stats are gone forever unless they decide to bring them back. I'm going to see if I can find some other reputable sources that offer them and add them to the API.

I believe their reasoning was that no reputable sources were providing accurate recovery numbers, so they just decided to remove it.

ExpDev07 avatar Mar 25 '20 07:03 ExpDev07

Ok, that's not an issue. Do you want me to help set up MongoDB? I can write the boilerplate and you can integrate an account for it.

focus1691 avatar Mar 25 '20 07:03 focus1691

It would be awesome if you could start drafting a PR for it. It needs to be compatible with our service provider system (see the "app.services.locations" module), but I think MongoDB will be perfect for it. I'm thinking we periodically sync the DB with data retrieved from the data sources.

ExpDev07 avatar Mar 25 '20 07:03 ExpDev07
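
A rough sketch of what the sync-and-read side could look like, assuming pymongo (the `covid` database name, connection URI, and both helper functions here are hypothetical, not part of the existing codebase):

```python
# Hypothetical sketch: periodically upsert normalized location data into
# MongoDB, then serve reads from the database instead of per-process caches.
from pymongo import MongoClient, UpdateOne

client = MongoClient("mongodb://localhost:27017")  # placeholder URI
db = client["covid"]  # placeholder database name

def sync_locations(source_name, locations):
    """Upsert normalized location dicts into a per-source collection."""
    ops = [
        UpdateOne({"id": loc["id"]}, {"$set": loc}, upsert=True)
        for loc in locations
    ]
    if ops:
        db[source_name].bulk_write(ops)

def get_locations(source_name):
    """Read all locations for a source, dropping Mongo's internal _id."""
    return list(db[source_name].find({}, {"_id": 0}))
```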

And it should be in Python? Not my speciality but I could do some research.

focus1691 avatar Mar 25 '20 07:03 focus1691

Yeah, I feel like that would be best.

ExpDev07 avatar Mar 25 '20 07:03 ExpDev07

> And it should be in Python? Not my speciality but I could do some research.

https://api.mongodb.com/python/current/

Kilo59 avatar Mar 25 '20 11:03 Kilo59
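
For reference, getting started with pymongo is only a few lines; this is a minimal sketch with placeholder names, not the project's actual setup:

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")   # placeholder URI
collection = client["covid"]["jhu_locations"]       # placeholder db/collection

# Write a document, then read it back.
collection.insert_one({"country": "Norway", "confirmed": 1})
print(collection.find_one({"country": "Norway"}))
```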

Perhaps we should use Mongo to store and update the normalized data? We can keep the data in a format that is easy to translate into our various responses (a rough sketch follows the list below).

  1. Collections for each source
  2. Documents for the countries/locations
  3. Background tasks to refresh the sources according to how frequently they are each updated.
  4. Continue to use caching to minimize database reads.

Kilo59 avatar Mar 29 '20 16:03 Kilo59
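
A rough sketch of points 1-3 above, assuming an asyncio-based app; `fetch_source`, `sync_locations`, and the refresh intervals are all stubs/assumptions, not existing code:

```python
import asyncio

# Hypothetical per-source refresh intervals in seconds (point 3).
REFRESH_INTERVALS = {"jhu": 3600, "csbs": 1800}

async def fetch_source(source_name):
    """Stub for the real data fetcher; would hit the upstream endpoints."""
    return []  # list of normalized location dicts

def sync_locations(source_name, locations):
    """Stub for the Mongo upsert helper sketched earlier in the thread."""
    pass

async def refresh_source(source_name, interval):
    """Points 1-2: one collection per source, one document per location."""
    while True:
        locations = await fetch_source(source_name)
        sync_locations(source_name, locations)
        await asyncio.sleep(interval)

async def main():
    # Run one background refresher per source.
    await asyncio.gather(
        *(refresh_source(name, secs) for name, secs in REFRESH_INTERVALS.items())
    )

if __name__ == "__main__":
    asyncio.run(main())
```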

From what I have seen in the code so far, both the JHU and CSBS location models derive from the Location class, with CSBS having some additional fields. This kind of inheritance relationship should be easy to represent (and query) in an RDBMS.

Splitting the data into multiple collections in Mongo wouldn't really add much value unless you want to support historical records in multiple formats, which are horrible to query anyway if they aren't backwards compatible.

cyenyxe avatar Mar 31 '20 00:03 cyenyxe
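
For illustration, this parent/child shape maps cleanly onto joined-table inheritance in an ORM like SQLAlchemy; the field names below are placeholders, not the project's actual models:

```python
from sqlalchemy import Column, Integer, String, ForeignKey
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Location(Base):
    __tablename__ = "locations"
    id = Column(Integer, primary_key=True)
    country = Column(String)
    source = Column(String)  # discriminator column for the subtype
    __mapper_args__ = {"polymorphic_on": source, "polymorphic_identity": "jhu"}

class CSBSLocation(Location):
    # Extra CSBS-only fields live in a joined child table.
    __tablename__ = "csbs_locations"
    id = Column(Integer, ForeignKey("locations.id"), primary_key=True)
    county = Column(String)  # illustrative CSBS-only fields
    state = Column(String)
    __mapper_args__ = {"polymorphic_identity": "csbs"}
```

Querying `Location` then transparently returns both plain JHU rows and CSBS rows with their extra columns joined in.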

We are deployed with gunicorn, which runs multiple worker processes (4), each with its own cache. @cyenyxe once we are storing the data in any database (RDBMS, Mongo, Redis, etc.), the workers can use it like a shared cache. Then they don't have to independently rebuild their own separate caches every hour. It also adds resiliency when one of the dependent services times out or encounters some kind of error, which is often the problem that causes this API to go down.

Related reading

https://realpython.com/python-memcache-efficient-caching/
https://redis.io/topics/lru-cache

Created a new issue for using a shared cache: #304

Kilo59 avatar Apr 25 '20 18:04 Kilo59
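
A minimal sketch of the shared-cache idea using Redis via redis-py; the key scheme and one-hour TTL are assumptions:

```python
import json

import redis

r = redis.Redis(host="localhost", port=6379)  # shared by all gunicorn workers

CACHE_TTL = 3600  # assumed one hour, matching the current refresh cadence

def get_cached_locations(source_name, fetch):
    """Return cached data if present; otherwise fetch once and share it."""
    key = f"locations:{source_name}"
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)
    data = fetch()  # only the first worker to miss pays the fetch cost
    r.set(key, json.dumps(data), ex=CACHE_TTL)
    return data
```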