
get historical data from multiple public weather stations

Open mmattNN opened this issue 3 months ago • 5 comments

Hello, for academic research purposes I need to retrieve some historical temperature data from several Netatmo weather stations that have made their data available within an area of interest. Is this code suitable for this purpose, with a few adjustments? I have tried running some existing code (patatmo) that makes use of the GetPublicData and GetMeasure functions, but it no longer works, probably due to the change in Netatmo's API authentication process. I'm stuck and would really appreciate any kind of help!

mmattNN, Apr 01 '24 12:04

You can use the library to take care of the authentication process and then access any Netatmo API endpoint using the rawAPI routine.

Just check an example at https://github.com/philippelt/netatmo-api-python/blob/master/samples/rawAPIsample.py
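For orientation, here is a minimal sketch of that pattern (not a verbatim copy of the sample): ClientAuth handles authentication from the stored credentials, and rawAPI then takes the endpoint name plus an optional parameters dictionary. The getstationsdata call below is used purely as an illustration (it lists your own stations), not necessarily what the sample itself requests.

```python
import lnetatmo

# Authenticate once; credentials are read from lnetatmo's usual sources
# (credential file or environment variables)
authorization = lnetatmo.ClientAuth()

# Call a Netatmo API endpoint by name; getstationsdata is only an example
# endpoint here and needs no extra parameters
my_stations = lnetatmo.rawAPI(authorization, "getstationsdata")
print(my_stations)
```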

philippelt, Apr 01 '24 12:04

Ok, thank you! Actually I am very new to Python coding and API services, so I am stuck again. I think I managed to install the lnetatmo library, and ClientAuth is reading my credentials: running the first lines of this code [https://github.com/philippelt/netatmo-api-python/blob/master/samples/rawAPIsample.py] I see my credentials correctly (ID, secret, tokens, email), but I don't know how to proceed further in writing code to download historical public data from multiple stations. Should I try to run the getpublicdata and getmeasure functions? Any additional hints are more than welcome, but thanks anyway.

mmattNN, Apr 01 '24 15:04

Yes, you should request the getpublicdata endpoint using rawAPI(authentication, "getpublicdata", parameters), where parameters is a dictionary as described in the API documentation: https://dev.netatmo.com/apidocumentation/weather
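As an illustration, a sketch of such a call: the bounding-box coordinates below are placeholders to replace with your own area of interest, and the parameter names (lat_ne, lon_ne, lat_sw, lon_sw, required_data, filter) are the ones listed on that documentation page.

```python
import lnetatmo

authorization = lnetatmo.ClientAuth()

# Bounding box of the area of interest (placeholder coordinates)
params = {
    "lat_ne": 45.60,   # north-east corner latitude
    "lon_ne": 9.30,    # north-east corner longitude
    "lat_sw": 45.35,   # south-west corner latitude
    "lon_sw": 9.05,    # south-west corner longitude
    "required_data": "temperature",
    "filter": "true",
}

stations = lnetatmo.rawAPI(authorization, "getpublicdata", parameters=params)

# Each returned entry carries the station id, its location and its module types
for station in stations:
    print(station["_id"], station["place"]["location"], station["module_types"])
```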

philippelt, Apr 01 '24 16:04

Hello again :) I am still trying to get historical data from multiple public stations. Following your suggestions, and after several rounds of trial and error, I made some progress and succeeded in running code that retrieves historical temperature data from weather stations, but I have one last problem.

Here is what I did. First, I retrieve the device id and module id of all stations currently available in my area of interest with getpublicdata. Then I use getmeasure to retrieve one year of temperature data for each of them, with the code attached below (probably not the best solution, but it seems to work). It loops through each station, downloading temperature data for one year, and also takes into account that I cannot retrieve more than 1024 values per request (if I understood correctly, this is a limitation of the Netatmo API, and the code seems to handle it successfully).

The code returns the data correctly, but only when I apply it to a small sample of weather stations (there are around 300 in total). For example, with 10 stations it returns the data correctly, but if I try to include all the stations, I get an error. I think it is related to the API request limit, which is 500 per hour if I am not mistaken, and the code uses more requests than that to download all the data I am asking for. How can I solve this problem in the code? Thanks!

```python
import lnetatmo
import pandas as pd

# Authentication setup
authorization = lnetatmo.ClientAuth()

def fetch_temperature_data(device_id, module_id, start_time, end_time):
    all_data = pd.DataFrame()
    current_start_time = start_time

    while True:
        para_hist = {
            "device_id": device_id,
            "module_id": module_id,
            "scale": "1hour",
            "type": "temperature",
            "date_begin": current_start_time,
            "date_end": end_time,
            "optimize": "false",
            "real_time": "true"
        }

        try:
            historical = lnetatmo.rawAPI(authorization, "getmeasure", parameters=para_hist)
            if not historical:
                break  # No more data to fetch
        except Exception as e:
            print(f"Failed to fetch data: {e}")
            break

        # The response body maps timestamps to values; melt into (time, temperature) rows
        historical_df = pd.DataFrame(historical).melt(var_name='time', value_name='temperature')
        historical_df['time'] = pd.to_numeric(historical_df['time'])

        all_data = pd.concat([all_data, historical_df], ignore_index=True)

        # getmeasure returns at most 1024 values per request; a shorter batch is the last one
        if len(historical_df) < 1024:
            break
        else:
            last_time = historical_df['time'].iloc[-1]
            current_start_time = last_time + 3600

    return all_data

# Rest of the code to process and pivot the data as before

# Define the time range
start_time = 1672527600  # Sunday 1 January 2023 00:00:00 GMT+01:00
end_time = 1704063600    # Monday 1 January 2024 00:00:00 GMT+01:00

all_data = pd.DataFrame()

# Loop through each station (stations_AOI was built from the getpublicdata step) and fetch data
for index, row in stations_AOI.iterrows():
    module_id = next(iter(row['module_types']))  # Assuming module_types is a dictionary
    station_data = fetch_temperature_data(row['_id'], module_id, start_time, end_time)
    station_data['device_id'] = row['_id']
    station_data['module_id'] = module_id
    station_data['latitude'] = row['place']['location'][1]
    station_data['longitude'] = row['place']['location'][0]
    all_data = pd.concat([all_data, station_data], ignore_index=True)

# Pivot the DataFrame: one row per station, one column per timestamp
pivoted_data = all_data.pivot_table(
    index=['device_id', 'module_id', 'latitude', 'longitude'],
    columns='time',
    values='temperature',
    aggfunc='first'
).reset_index()
```

mmattNN, Apr 14 '24 14:04

Well, you have already done a very good job.

You should just do some throttling yourself to avoid hitting the API request limit.

The simplest solution is to add some wait time with time.sleep(x) (x a float, in seconds) between two consecutive station calls, or every n station calls, for example.
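A minimal sketch of that throttling applied to the station loop from the previous comment (it reuses stations_AOI and fetch_temperature_data from that code); the pause length is only an assumed example value and needs tuning against the actual hourly quota, given roughly 300 stations with several getmeasure calls each.

```python
import time

PAUSE_BETWEEN_STATIONS = 10.0  # seconds; example value, adjust to stay under the request limit

for index, row in stations_AOI.iterrows():
    module_id = next(iter(row['module_types']))
    station_data = fetch_temperature_data(row['_id'], module_id, start_time, end_time)
    station_data['device_id'] = row['_id']
    station_data['module_id'] = module_id
    station_data['latitude'] = row['place']['location'][1]
    station_data['longitude'] = row['place']['location'][0]
    all_data = pd.concat([all_data, station_data], ignore_index=True)

    # Throttle before moving on to the next station's requests
    time.sleep(PAUSE_BETWEEN_STATIONS)
```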

Yours

philippelt, Apr 16 '24 16:04