Obtaining historical data from CAAQM

Open QEDK opened this issue 6 years ago • 11 comments
Hey, I thought I'd drop a note: https://app.cpcbccr.com/caaqms/download?filename=site_(uniqueID/pattern).xlsx is useful for obtaining absolute station metrics (you can fetch all the old data in one CSV and then use the data.gov.in real-time API to add to it). The uniqueID can be obtained from an AJAX call visible in your browser dev console, which sends a Base64-encoded JSON request payload. I don't know how ethical it is, though the data is public anyway, and a one-time CSV is certainly less resource-intensive. It is also useful for filling the gaps in the data, per #585.

QEDK avatar Feb 13 '19 15:02 QEDK

Thanks a bunch, @QEDK! Tagging @jflasher for his awareness too.

RocketD0g avatar Feb 13 '19 15:02 RocketD0g

@QEDK good morning.

Can you post an example, please?

urbanemissions avatar Feb 18 '19 05:02 urbanemissions

Just FYI: the data.gov.in real-time API only provides the air quality index.

What OpenAQ and people like us are interested in is absolute air quality values, which are posted on the CAAQM website.

If you can give an example for https://app.cpcbccr.com/caaqms/download?filename=site_(uniqueID/pattern).xlsx that would be useful.

Main question: what is "pattern"?

urbanemissions avatar Feb 18 '19 05:02 urbanemissions

Tried a few combinations:

https://app.cpcbccr.com/caaqms/download?filename=site_(1425/pattern).xlsx
https://app.cpcbccr.com/caaqms/download?filename=site_1425.xlsx

urbanemissions avatar Feb 18 '19 05:02 urbanemissions

@urbanemissions Okay, I'll go step by step. The best place to get tabular data is https://app.cpcbccr.com/ccr/#/caaqm-dashboard-all/caaqm-landing/data where you select your parameters and submit. The first thing you'll notice is that you only get 24 hours' worth of data, which is not useful since we can already get that by continuously polling the API and elsewhere. This is because of a limitation built into the form, which makes the client's POST request ask only for the last 24 hours.

This is where you need a modified AJAX call. When you download a file from the tabular page, the client makes a POST to fetch a payload containing the file URL. So you have to make the same AJAX call with your own parameters: edit and resend the POST with the parameters of your liking and it should go through (large CSVs are slow and might cause 405 errors). The request payload is Base64-encoded, so you will need some extra work to decode and re-encode it. Here are some images which show what to do: https://ibb.co/7zSnrNP https://ibb.co/6Xyhxdh

The unique ID is always in the format site_10620190213203417, where the number always has the same length. There's probably a pattern (106 is probably the station ID, 2019 is maybe the last year fetched) but I don't know exactly how it works. You can access the same file at https://app.cpcbccr.com/caaqms/download?filename=site_10620190213203417.xlsx to see that it does work.
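
The decode, edit and re-encode step for the request payload could look roughly like the sketch below. It assumes only what is said above, that the payload is Base64-encoded JSON. The key names (`fromDate`, `toDate`) and the date format are hypothetical placeholders, not the site's actual schema, so copy the real payload from your browser's network tab instead.

```python
import base64
import json


def decode_payload(encoded: str) -> dict:
    """Decode a Base64 string into the JSON object it carries."""
    return json.loads(base64.b64decode(encoded).decode("utf-8"))


def encode_payload(payload: dict) -> str:
    """Serialize a dict to compact JSON and Base64-encode it."""
    raw = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


# Stand-in for a payload captured from the dev console.
# These keys are hypothetical, not the real CAAQM request fields.
original = encode_payload({"fromDate": "12-02-2019", "toDate": "13-02-2019"})

payload = decode_payload(original)
payload["fromDate"] = "01-01-2018"  # widen the range beyond the last 24 hours
modified = encode_payload(payload)  # string to resend as the POST body
```

The resulting `modified` string is what you would paste into the edited request before resending it.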

QEDK avatar Feb 18 '19 08:02 QEDK

This looks like a one-time download link to a file generated at the time of the request, named site_IDYYYYMMDDHHMMSS.xlsx. They are likely keeping it for some time, which is why you can still access it. Your request was made on 2019-02-13 at 20:34:17.

If you change the ID number, you get a zero-byte file.
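
If the site_IDYYYYMMDDHHMMSS reading is right, a file name can be split into its two parts as in the sketch below. This is only a guess at the naming scheme based on the single example in this thread; the function name `parse_site_filename` is made up for illustration.

```python
import re
from datetime import datetime

# Assumed layout: "site_" + station ID + 14-digit timestamp (YYYYMMDDHHMMSS).
FILENAME_RE = re.compile(r"^site_(\d+?)(\d{14})(?:\.xlsx)?$")


def parse_site_filename(name: str) -> tuple[str, datetime]:
    """Split a download file name into (station ID, request time)."""
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name!r}")
    station_id, stamp = match.groups()
    return station_id, datetime.strptime(stamp, "%Y%m%d%H%M%S")


station_id, requested_at = parse_site_filename("site_10620190213203417.xlsx")
print(station_id, requested_at.isoformat())
```

For the file name posted above this gives station ID 106 and a request time of 2019-02-13T20:34:17, which matches the date the link was first shared.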

urbanemissions avatar Feb 18 '19 14:02 urbanemissions

@urbanemissions That's probably it. Generating the download link is pretty plausible tho.

QEDK avatar Feb 18 '19 16:02 QEDK

Updated the download URL. The site now uses a CAPTCHA, so adding historical data would have to be done somewhat manually. If the API from #283 can be used for historical data, that would be preferred.

majesticio avatar Jan 30 '23 18:01 majesticio

The API from https://github.com/openaq/openaq-fetch/issues/283 is for data.gov.in, which is AQI only. It is used by a couple of air quality apps in India.

urbanemissions avatar Jan 30 '23 18:01 urbanemissions

@urbanemissions do you know of an API for raw air quality data, rather than for AQI?

majesticio avatar Jan 30 '23 20:01 majesticio

If CPCB database access stopped because of a technical snag, it may be worthwhile talking to one of these groups:

https://ncaptracker.in/ https://www.airveda.com/ https://blueskyhq.io/products/bam-aq

There are groups outside India doing the same (like IQAir, etc.), and none of these openly share what they are scraping. I understand that blueskyhq has a commercial API.

-- Dr. Sarath Guttikunda

http://www.urbanemissions.info

urbanemissions avatar Jan 31 '23 03:01 urbanemissions