openaq-fetch
Adapters/israel update
Suggestions on improvements welcome.
Seems to work for me:
2018-08-21T19:58:00.113Z - info: Connected to database.
2018-08-21T19:58:00.192Z - info: Database migrations are handled, ready to roll!
2018-08-21T19:58:00.192Z - info: Running all fetch tasks.
2018-08-21T20:01:44.071Z - info: All data grabbed and saved.
2018-08-21T20:01:44.083Z - info: ///////
2018-08-21T20:01:44.083Z - info: New measurements inserted for Israel: 78890
2018-08-21T20:01:44.083Z - info: ///////
2018-08-21T20:01:44.107Z - info: Fetches table successfully updated
2018-08-21T20:01:44.117Z - info: Sources table successfully updated.
Thanks for this @sval-dev! I won't be able to look at it for a little while due to travel, but glad to see this being worked on. @michalcz based on the number of measurements inserted in the dry run above, this would likely be a good one to measure footprint of as well. Thanks everyone!
Appreciate you checking in Joe!
Feel free to use this for memory measurements, but I don't think the daily URL is what we'll want to hit in the end.
We'll likely want stations/
Hi @sval-dev, just getting back from some travel, sorry for the delay. I haven't looked into what the source provides, but if we can get the data we need from the daily endpoint, I wonder if that might be better than trying to hit all the individual station links? My thinking here is that while the daily link might transfer more data at once, it'd be fewer requests overall. I've looked at doing individual station requests for some other sources, and the number of requests you're sending to the source can scale up pretty quickly, depending on the number of stations. I want to make sure we're not running into any rate limits and are being good API users. However, if the daily endpoint isn't giving us what we need, then obviously that's not an option.
Also, the current envista adapter is shared by this source and https://github.com/openaq/openaq-fetch/blob/b9ce7363f77f285c5e73080da6ac63506d5fe73d/sources/za.json. Do you think we could reuse this new adapter for ZA as well?
Welcome back @jflasher, hope you enjoyed the travel!
I'm not sure there is any /daily endpoint that returns actual data for more than one station: the daily endpoints I see in the docs all have the form .../stations/:id/.../daily, where :id is the station ID. In addition, using the daily endpoint would mean that we build up duplicates throughout the day and lose any data points that occur after our last fetch within a particular day. As an example, if we query the source every 15 minutes starting at 00:05, then all of the entries fetched from 00:05 - 23:35 would be duplicated by the fetch at 23:50, and we would be unable to retrieve any measurements that occurred from 23:50 - 23:59 (since our next /daily fetch at 00:05 would only grab info from 00:00 forward).
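To make the timing argument concrete, here is a minimal sketch of the example schedule (first fetch at 00:05, then every 15 minutes). The function name and the date are illustrative only, and the sketch assumes each /daily response covers midnight up to the moment of the fetch.

```python
from datetime import datetime, timedelta


def fetch_times(day, first=timedelta(minutes=5), every=timedelta(minutes=15)):
    """Return every fetch time that falls inside `day` (a midnight datetime)."""
    times = []
    t = day + first
    while t < day + timedelta(days=1):
        times.append(t)
        t += every
    return times


day = datetime(2018, 8, 21)  # arbitrary example date
times = fetch_times(day)
last_fetch = times[-1]

# Assumption: each /daily response spans 00:00 up to the fetch time, so the
# final fetch of the day re-delivers everything the earlier fetches returned.
# Anything measured between the last fetch and midnight is never requested
# again, because the next /daily call belongs to the following day.
missed_window = (last_fetch, day + timedelta(days=1))

print(len(times), "fetches; last at", last_fetch.time())
print("unfetched window:", missed_window[0].time(), "to midnight")
```

Deduplicating on a key such as station, parameter, and timestamp would handle the repeats, but it would not recover the window after the last fetch.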
However, the /latest endpoint at the region level (there are 13 regions) might be able to save us the trouble of querying each station if we were okay with /latest instead of /daily (note the daily endpoint doesn't exist at the region level). Do you think it's worth making the switch?
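As a rough illustration of the request-volume trade-off, the arithmetic can be sketched as below. The 13 regions and the 15-minute interval come from this thread; the station count is a made-up placeholder, since the real number isn't stated here.

```python
def requests_per_day(endpoints, interval_minutes=15):
    """Requests sent per day when every endpoint is hit once per interval."""
    fetches_per_day = (24 * 60) // interval_minutes
    return endpoints * fetches_per_day


REGIONS = 13     # from the discussion above
STATIONS = 150   # hypothetical placeholder, not the real station count

print("per-region requests/day:", requests_per_day(REGIONS))
print("per-station requests/day:", requests_per_day(STATIONS))
```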
Separately, I didn't have success in using the hoursBack parameter with /latest and the API appears to ignore the parameter entirely. I was hoping to use hoursBack to return more than one measurement to make sure we don't miss any even if we are occasionally unsuccessful in our fetch. Do you happen to have any other source for the information from the "List of available commands" link in issue #483 or any other idea why hoursBack might not be working as described?
Lastly, I would assume that if the ZA source makes the same API available then the adapter would work fine for it too. However, the original envista adapter scrapes HTML rather than using an API, and I didn't find an API URL for the ZA source in my cursory search, so I'm not sure how to apply it.
I will be closing this PR soon because the source url pages are no longer active. I am open to any new information or insights!
The URL for the Israel source is now https://api.svivaaqm.net/v1/envista/
We are actively using a version of this adapter successfully on our side but we've diverged enough that it may not be a clean merge back in and I'm not currently authorized to spend time upstreaming our changes (though we very much hope to do this).
As another enhancement, in addition to targeting "latest", it is also possible to switch from the .../data/daily endpoint to .../data/?from=...&to=... (with format YYYY-MM-DDTHH:mm:ss) to fetch a particular date range.
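A small sketch of building the from/to query string in that timestamp format follows. It only produces the query part, since the full path is elided above. The helper name is hypothetical, and leaving the colons unencoded is an assumption about what the API accepts.

```python
from datetime import datetime
from urllib.parse import urlencode

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S"  # YYYY-MM-DDTHH:mm:ss


def range_query(start, end):
    """Build the `from=...&to=...` query string for a date-range request."""
    params = {
        "from": start.strftime(TIMESTAMP_FORMAT),
        "to": end.strftime(TIMESTAMP_FORMAT),
    }
    # safe=":" keeps the colons readable; assumed, not confirmed, to be
    # acceptable to the API.
    return urlencode(params, safe=":")


query = range_query(datetime(2018, 8, 21, 0, 0, 0), datetime(2018, 8, 21, 23, 59, 59))
print(query)
```

Requesting a window that slightly overlaps the previous fetch, then deduplicating, would avoid the end-of-day gap that the daily endpoint has.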
Lastly, I'll mention that -9999 is used as a fill value for this source, and those inputs likely need to be filtered out rather than put into the measurements table.
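Filtering the fill value could look something like the sketch below. The record shape (a dict with a "value" key) and the function name are assumptions for illustration, not the adapter's actual data model.

```python
FILL_VALUE = -9999  # fill value reported for this source


def drop_fill_values(measurements, fill_value=FILL_VALUE):
    """Return only the measurements whose value is not the fill value."""
    return [m for m in measurements if m["value"] != fill_value]


# Hypothetical sample records, for illustration only.
sample = [
    {"parameter": "pm25", "value": 12.4},
    {"parameter": "o3", "value": -9999},
    {"parameter": "no2", "value": 0.0},
]
cleaned = drop_fill_values(sample)
print(cleaned)
```

A legitimate zero reading is kept; only the exact fill value is dropped.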
Hope it helps!
We really appreciate you keeping us in mind. When I follow https://api.svivaaqm.net/v1/envista/ I get a 405 error. Maybe you can share the endpoints you are using and how you got the authorization?
Sounds like you may not be setting the right API token or auth headers.
Information/auth was reverse engineered from https://svivaaqm.net/ IIRC.
An example request: curl 'https://api.svivaaqm.net/v1/envista/regions/4/index/latest?hoursBack=24' -H 'User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:102.0) Gecko/20100101 Firefox/102.0' -H 'Accept: application/json' -H 'Accept-Language: en-US,en;q=0.5' -H 'Accept-Encoding: gzip, deflate, br' -H 'Content-Type: application/json; charset=utf-8' -H 'Authorization: ApiToken 1cab20bf-0248-493d-aedc-27aa94445d15' -H 'envi-data-source: MANA' -H 'Access-Control-Allow-Credentials: true'
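For anyone scripting this, a rough Python equivalent of the curl call is sketched below. It only builds the request and does not send it. The function name is hypothetical, the token is passed in as a parameter, and the header set is trimmed to those that look essential, which is an assumption: the API may require more of the headers shown in the curl command.

```python
import urllib.request

BASE_URL = "https://api.svivaaqm.net/v1/envista"


def build_latest_request(region_id, api_token, hours_back=24):
    """Build (but do not send) a region-level /latest request."""
    url = f"{BASE_URL}/regions/{region_id}/index/latest?hoursBack={hours_back}"
    headers = {
        "Accept": "application/json",
        "Authorization": f"ApiToken {api_token}",
        "envi-data-source": "MANA",
    }
    return urllib.request.Request(url, headers=headers)


req = build_latest_request(4, "your-api-token")
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req, timeout=30)
```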
More information is available in the original issue #483
Thanks again @sval-dev, we really appreciate all the info and help.