reading the BRO data from XML files

Open dbrakenhoff opened this issue 2 years ago • 7 comments
Separate the reading of XML files from the API request so users can read manually downloaded XML files.

Discussed in https://github.com/ArtesiaWater/hydropandas/discussions/104

Originally posted by rt84ro March 2, 2023: Hi everyone, I have downloaded some wells from the BROloket website, but they are in .xml format. I want to read them using HydroPandas, but I do not know how to open these files. Could you please let me know how to read them?

dbrakenhoff avatar Mar 02 '23 16:03 dbrakenhoff

I would like to implement this function. My starting point is to add an elif statement to from_bro, as in the code below, together with a new function get_obs_list_from_file. Do you agree?

if bro_id is None and (extent is not None):
    obs_list = get_obs_list_from_extent(
        extent,
        [..]
    )
    meta = {}
elif bro_id is not None:
    obs_list, meta = get_obs_list_from_gmn(
        [..]
    )
    name = meta.pop("name")
elif (extent is None) and (bro_id is None) and (fn is not None):
    # new branch: read observations from a local XML file
    obs_list = get_obs_list_from_file(
        fn,
    )
    meta = {}
else:
    raise ValueError("specify bro_id, extent or fn")

HMEUW avatar Mar 15 '23 12:03 HMEUW

Yes, seems logical to me. And in io_bro the code to parse the XML file would then be separated and called in each of the get_obs_list_from_* methods?
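A minimal sketch of that separation, assuming a hypothetical shared helper parse_gmw_xml that works on raw XML bytes. All function names, the broId element lookup and the dummy ID in the usage line are illustrative assumptions, not the actual io_bro code or the full BRO schema:

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def _local_name(tag):
    """Strip an XML namespace, e.g. '{http://...}broId' becomes 'broId'."""
    return tag.rsplit("}", 1)[-1]


def parse_gmw_xml(xml_bytes):
    """Hypothetical shared parser: works on bytes, wherever they came from."""
    root = ET.fromstring(xml_bytes)
    meta = {}
    for elem in root.iter():
        # 'broId' is an assumed element name, used for illustration only
        if _local_name(elem.tag) == "broId" and elem.text:
            meta["bro_id"] = elem.text.strip()
            break
    return meta


def get_obs_from_file(fn):
    """Local route: read bytes from disk, then reuse the shared parser."""
    return parse_gmw_xml(Path(fn).read_bytes())


def get_obs_from_response(content):
    """API route: `content` stands in for the body of an HTTP response."""
    return parse_gmw_xml(content)


# dummy document with a made-up ID, only to show the call
sample = b'<doc xmlns:b="http://example.org/b"><b:broId>GMW000000000001</b:broId></doc>'
print(get_obs_from_response(sample))
```

Both routes end in the same parser, so a fix to the XML handling only has to be made once.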

Thanks for picking this up!

dbrakenhoff avatar Mar 15 '23 14:03 dbrakenhoff

Yesterday I started on this issue. I think we need two extra arguments to implement it: origin and local_path.

My own use case is in the last row of the table.

| Use case | Comment | origin value | local_path value |
| --- | --- | --- | --- |
| Get data from Broloket.nl, within extent | Like current behaviour of function | internet | None |
| Get data from Broloket.nl, within extent, and save BRO XMLs for future use | Like above, but save downloaded data | internet | path where zip will be created |
| Use downloaded data from Broloket.nl, read all data | User has downloaded data available, via manual download or case in row above | local | path to zip |
| Local GMW files that have to be uploaded to bronhouderportaal-bro, read all data | To check data in these files before submission | local_bronhouder | path to zip |

I added bronhouder to the origin value in the last use case because these files differ slightly from the broloket.nl files. For example, the BROid is not available, because the file has not yet been sent to broloket.nl. The easiest way to handle this is a separate value.
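As a rough sketch of how the combinations in the table could be checked, assuming a hypothetical helper check_bro_source (the function name and the returned labels are made up for illustration; only the origin values and the local_path argument come from the table):

```python
VALID_ORIGINS = ("internet", "local", "local_bronhouder")


def check_bro_source(origin="internet", local_path=None):
    """Validate an origin/local_path combination and describe the action.

    Hypothetical helper: returns a short label instead of doing real work.
    """
    if origin not in VALID_ORIGINS:
        raise ValueError(f"origin must be one of {VALID_ORIGINS}, got {origin!r}")
    if origin == "internet":
        # local_path is optional here: if given, the download is also saved
        return "download" if local_path is None else "download and save"
    if local_path is None:
        # both local origins need a zip file to read from
        raise ValueError(f"origin={origin!r} requires local_path")
    if origin == "local":
        return "read broloket files"
    return "read bronhouder files"
```

Calling check_bro_source("local") without a local_path raises a ValueError, which catches the one combination the table does not allow.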

What do you think about this approach @dbrakenhoff?

HMEUW avatar Apr 05 '23 06:04 HMEUW

Thanks for the clear overview!

I think we should create two routes for getting data from the BRO, one API (internet) route and one local file route. So both ObsCollection and GroundwaterObs should get a from_bro() method for downloading data from the internet through the BRO API. I like your suggestion for storing this downloaded data, so these methods should accept some sort of directory or filename for storing the downloaded data.

Then I would suggest a separate route for reading the local files, from_bro_file/dir/local(), not sure what the name should be yet, but something along those lines. These methods accept a directory/zip (in the case of ObsCollection), or a filename (in the case of Observation).

I think separating these two is probably clearest and makes the code less complicated.

Then bro.py should contain something like the following functions:

  • read_bro_xml() --> reads single BRO XML file
  • read_bro_dir() --> reads directory or zipfile with one or multiple XML files, basically calls the read_bro_xml() method in a loop.
  • replace the XML parsing in the current API functions with the read functions listed above.
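The first two functions could look roughly like the sketch below. It uses the function names from the list, but the bodies are assumptions: read_bro_xml only returns the root tag as a placeholder for the real BRO parsing, and the return types are chosen for illustration.

```python
import xml.etree.ElementTree as ET
import zipfile
from pathlib import Path


def read_bro_xml(source):
    """Parse a single XML file, given as a path or an open binary file object.

    Placeholder: returns only the root tag; real BRO parsing is omitted.
    """
    root = ET.parse(source).getroot()
    return {"root_tag": root.tag}


def read_bro_dir(path):
    """Read every XML file in a directory or zip file.

    Returns a dict mapping each file name to the result of read_bro_xml.
    """
    path = Path(path)
    results = {}
    if path.is_dir():
        for fn in sorted(path.glob("*.xml")):
            results[fn.name] = read_bro_xml(fn)
    elif zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as zf:
            for name in sorted(zf.namelist()):
                if name.lower().endswith(".xml"):
                    # parse straight from the archive, without extracting
                    with zf.open(name) as f:
                        results[name] = read_bro_xml(f)
    else:
        raise ValueError(f"{path} is not a directory or zip file")
    return results
```

Because read_bro_xml accepts both a path and a file object, the zip route does not need to unpack the archive to disk first.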

@HMEUW, let me know what you think about this?

PS. I realize we're probably not very consistent across data sources in how we expose local vs API routes, but we should probably address that in a separate issue. At this moment I'd vote for separating the two routes for each data source.

dbrakenhoff avatar Apr 14 '23 08:04 dbrakenhoff

I just completed a first version that reads XMLs for newly constructed wells. These are submitted to https://www.bronhouderportaal-bro.nl. @OnnoEbbens please have a look at this code. The full_meta function is not yet working. Code is in the 'add-bronh'-branch.

HMEUW avatar Jun 13 '23 10:06 HMEUW

Started reading local BROloket files in the branch import-broloket-from file. Cannot make a direct link here.

HMEUW avatar Jul 07 '23 11:07 HMEUW

Just committed my work. I am on holiday after this week. I cannot work on it before then, and I expect to have no time in the two weeks after my holiday either.

If someone else wants to pick this up in July or August, that is fine.

HMEUW avatar Jul 17 '23 17:07 HMEUW