openaq-fetch icon indicating copy to clipboard operation
openaq-fetch copied to clipboard

Delhi Pollution Control Committee: National Capital Territory (NCT)

Open majesticio opened this issue 11 months ago • 6 comments

This adapter adds 24 stations from Delhi (NCT). Data is retrieved using puppeteer, and can take anywhere from 1 to 3.5 minutes to fetch the data. A deployment has been added so this adapter will run on its own and not with the realtime deployment. This adapter fetches these parameters from each station: co, no2, o3 , so2, pm25 in µg/m³, and relative humidity in %. Stations are fetched dynamically.

majesticio avatar Mar 06 '24 17:03 majesticio

results:
⛔⛔⛔⛔⛔⛔
The following packages are missing:
- puppeteer is needed by /home/runner/work/openaq-fetch/openaq-fetch/src/adapters/delhi.js

github-actions[bot] avatar Mar 06 '24 17:03 github-actions[bot]

I have updated the adapter to use the latest version of puppeteer as requested. The code now reflects recent changes to the puppeteer API.

majesticio avatar Mar 19 '24 20:03 majesticio

Good catch, it should now have "puppeteer": "^22.5.0"

majesticio avatar Mar 19 '24 23:03 majesticio

My two cents without looking at the code, but since i did the initial research on this source: Puppeteer seems like a really heavyweight solution for this source and is likely the source of the slowness. Since the data are availalbe inline in the page, there shouldnt be any need to use a headless browser and wait for all the JS and rendering. Simply fetching the page and finding the data arrays in the page content e.g.

}, {*/
  name: 'Ammonia',
   data: [84.6,88.9,104.7,100.2,97.5,100.7,99.6,94.9,83.6,61.1,53.0,42.5,37.6,34.2,36.3,45.0,55.0,54.0,52.6,59.9,59.2,62.1,58.4]
   }]


With a lightweight parser will likely be much faster and simpler.

russbiggs avatar Mar 20 '24 03:03 russbiggs

I updated the package-lock, I am hoping some mismatch was causing issues as it seems to run fine on my end. You have both noted this might not be the best way to fetch from this provider and I will review it going forward. @caparker Highcharts is defined within the script with puppeteer and should resolve itself if the version is correct... ie if puppeteer is not working properly it won't find / define Highcharts

majesticio avatar Mar 22 '24 22:03 majesticio

I did some refactoring and after playing with it for a while I think I would agree that we dont need puppeteer.

For example, this url https://www.dpccairdata.com/dpccairdata/display/AallAdvanceSearchCconc.php?stName=TWFuZGlybWFyZw== with this payload (and you could probably through the station name in there as well) parameters: NH3 fDate: 2024-03-24 15:01 eDate: 2024-03-25 15:01 duration: 1 graphType: Line submit: Search

Will return text that includes this

        $('#container').highcharts({
            chart: {
                type: 'line'
            },
            title: {
                text: ''
            },
            subtitle: {
                text: ''
            },
            xAxis: {
                categories: ['2024-03-24 19:00:00','2024-03-24 23:00:00','2024-03-25 01:00:00','2024-03-25 04:00:00','2024-03-25 05:00:00','2024-03-25 09:00:00','2024-03-25 11:00:00','2024-03-25 12:00:00','2024-03-25 14:00:00','2024-03-25 15:00:00'],
				labels: {
					rotation: 90
				}
            },
            yAxis: {
                title: {
                    text: ''
                }
            },
            tooltip: {
                enabled: true,
                formatter: function() {
                    return '<b>'+ this.series.name +'</b><br/>Date & Time : '+
                        this.x +'<br/><br/>Value : '+ this.y +'';
                }
            },
            plotOptions: {
                line: {
                    dataLabels: {
                        enabled: true,
                        style: {
                            textShadow: '0 0 3px white, 0 0 3px white'
                        }
                    },
                    enableMouseTracking: false
                }
            },
            series: [{
                /*name: 'Tokyo',
                data: [7.0, 6.9, 9.5, 14.5, 18.4, 21.5, 25.2, 26.5, 23.3, 18.3, 13.9, 9.6]
            }, {*/
                name: 'Ammonia',
                data: [41.2,40.5,41.3,39.2,39.3,41.2,41.8,42.4,41.7,41.7]               
            }]
        });
    });

And if we can extract that json part it would actually be safer because of the timestamps that are included in the object.

caparker avatar Mar 25 '24 22:03 caparker

The previous commit 613595b0b2a8d6183fe86ed20968e66451b52201 was returning more data for me. This was functioning on my end, however the adapter is slow and concerns have been expressed about its implementation. Closing, but may be of use for someone who can do it better.

majesticio avatar Apr 16 '24 23:04 majesticio