
Added NDBC as a data source

abdu558 opened this pull request 1 year ago • 24 comments

Added NDBC

abdu558 avatar Jun 17 '24 13:06 abdu558

@abdu558 please go over the failed tests and make sure the basic tests pass. Also make sure the code works with the oldest supported Python version (in this case 3.9).

SorooshMani-NOAA avatar Jun 17 '24 14:06 SorooshMani-NOAA

hi Abdu and welcome :)

The CI is failing due to mypy. Chances are that you haven't set up your dev environment properly. Check here: https://github.com/oceanmodeling/searvey?tab=readme-ov-file#development (i.e. run make init which will ensure that the pre-commit hooks are installed). After that you should fix the mypy errors. To make a long story short, try to avoid Optional. We use | None instead. Check here for more context: https://stackoverflow.com/questions/69440494/python-3-10-optionaltype-or-type-none
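For example (a toy snippet; the function name is illustrative), and keep in mind that on Python 3.9, which we still support, the | syntax in annotations needs the __future__ import:

# The __future__ import makes "str | None" annotations work on Python 3.9
from __future__ import annotations


def get_station_data(station_id: str, region: str | None = None) -> dict:
    # instead of: region: Optional[str] = None
    return {"station_id": station_id, "region": region}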

pmav99 avatar Jun 17 '24 14:06 pmav99

Please convert to a draft PR

pmav99 avatar Jun 17 '24 19:06 pmav99


Okay, thank you. Do you know how to fix these two below?

poetry export main.......................................................Failed

  • hook id: poetry-export
  • files were modified by this hook

poetry export dev........................................................Failed

  • hook id: poetry-export
  • files were modified by this hook

abdu558 avatar Jun 17 '24 20:06 abdu558

Hi @abdu558 I think this happens because your requirements files are in conflict with the poetry files. i.e.

  • `requirements/requirements-dev.txt` and
  • `requirements/requirements.txt`

Since you haven't committed changes in pyproject.toml or in poetry.lock, there is no reason these 2 files should have changes.

So you need to either drop the changes in the requirements files or commit the corresponding changes in poetry.lock / pyproject.toml.

tomsail avatar Jun 18 '24 09:06 tomsail

@abdu558 to avoid having to do this every time on the GitHub CI, I recommend setting up the pre-commit hooks locally on your machine. General instructions for pre-commit are here: https://pre-commit.com/

But we've automated things in the Makefile: run the make init step, then have a look at and test the different targets (style, lint, mypy).

Once pre-commit is installed, when you commit your changes it will run all the steps in .pre-commit-config.yaml.

Some hooks (like ruff and black) automatically modify the files, so you need to stage the modifications and try to commit again, which runs .pre-commit-config.yaml again... and so on until you don't have any error tracebacks.

tomsail avatar Jun 18 '24 10:06 tomsail


I was trying to add the new pyproject/poetry files, but they kept getting rejected by pre-commit.

abdu558 avatar Jun 18 '24 12:06 abdu558


Thank you, I did set it up, but it would just reject my commits every time with:

poetry export main.......................................................Failed

  • hook id: poetry-export
  • files were modified by this hook

I'm not really sure how to proceed with this: the poetry export main failure is still happening after I added the dependency to poetry. I thought about updating the poetry file completely with poetry update, but that results in IOC errors/incompatibilities.

abdu558 avatar Jun 18 '24 12:06 abdu558

@abdu558 I was able to update the requirements files and commit on my local machine after getting your PR commits. We can resolve this in a quick meeting, or tomorrow during the group meeting.

SorooshMani-NOAA avatar Jun 18 '24 13:06 SorooshMani-NOAA

Yea tomorrow would be good

abdu558 avatar Jun 18 '24 15:06 abdu558

@abdu558 the resolved start and end times are causing problems. It seems that mypy blindly looks at the original definition of the input arg types, so it says they cannot be indexed. Either use a different name for the resolved times or just resolve them where you're indexing.

-    start_date = _resolve_start_date(now, start_date)
-    end_date = _resolve_end_date(now, end_date)

     # Prepare arguments for each function call
     func_kwargs = [
         {
             "station_id": station_id,
             "mode": mode,
-            "start_time": start_date[0],
-            "end_time": end_date[0],
+            "start_time": _resolve_start_date(now, start_date)[0],
+            "end_time": _resolve_end_date(now, end_date)[0],
             "columns": columns,
         }
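
The first option (using different names for the resolved values, so mypy stops conflating them with the input arguments) would look roughly like this; resolved_start / resolved_end are placeholder names:

-    start_date = _resolve_start_date(now, start_date)
-    end_date = _resolve_end_date(now, end_date)
+    resolved_start = _resolve_start_date(now, start_date)
+    resolved_end = _resolve_end_date(now, end_date)

     # Prepare arguments for each function call
     func_kwargs = [
         {
             "station_id": station_id,
             "mode": mode,
-            "start_time": start_date[0],
-            "end_time": end_date[0],
+            "start_time": resolved_start[0],
+            "end_time": resolved_end[0],
             "columns": columns,
         }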

For the result kwargs dictionary I'm not sure what to do ... we seem to be doing something similar in the case of IOC:

    for result in ioc_responses:
        station_id = result.kwargs["station_id"]  # type: ignore[index]

@pmav99 what do you think?

SorooshMani-NOAA avatar Jun 20 '24 15:06 SorooshMani-NOAA

@pmav99 we rebased everything and notebooks are also updated. All the tests pass now. Can we go ahead with the merge?

SorooshMani-NOAA avatar Jun 27 '24 12:06 SorooshMani-NOAA

Can we go ahead with the merge?

I don't think it is ready yet. For starters, I would expect that the API would follow the conventions we used on #125 but that doesn't seem to be the case.

pmav99 avatar Jun 30 '24 14:06 pmav99

@pmav99 you're right, sorry I missed it!!

@abdu558 please review the IOC API:

  • List station API: https://github.com/oceanmodeling/searvey/blob/0209bcdf7d78a3b066aa0eff2c9e09de7f61ddea/searvey/ioc.py#L188-L194
  • Internal multistation data API: https://github.com/oceanmodeling/searvey/blob/0209bcdf7d78a3b066aa0eff2c9e09de7f61ddea/searvey/_ioc_api.py#L184-L194
  • Single station data API: https://github.com/oceanmodeling/searvey/blob/0209bcdf7d78a3b066aa0eff2c9e09de7f61ddea/searvey/_ioc_api.py#L222-L232

and make sure the NDBC API follows the same naming convention as well as similar arguments (where they make sense). The current corresponding functions you implemented are:

  • get_ndbc_stations: Looks OK.
  • _fetch_ndbc_station_data
    • Rename to _fetch_ndbc
    • Needs to accept multiple stations and multiple start/end dates
  • fetch_ndbc_stations_data
    • Rename to fetch_ndbc_station
    • Needs to accept a single station and a single start/end date

I think the rest of the arguments are not as important in this case, but it would be ideal to add them as well. Since columns is specific to NDBC (like product, datum, etc. are to COOPS), it's OK for it to remain there.
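
For reference, a rough sketch of what the renamed functions could look like (the signatures below are indicative only; the IOC code linked above is the authoritative reference for argument names and types):

from __future__ import annotations

import pandas as pd


# Indicative sketch only; mirror the real IOC signatures when implementing
def _fetch_ndbc(
    station_ids: list[str],
    mode: str,
    start_dates: pd.DatetimeIndex,  # one start date per station
    end_dates: pd.DatetimeIndex,  # one end date per station
    columns: list[str] | None = None,
) -> dict[str, pd.DataFrame]:
    """Retrieve data for multiple stations (internal API)."""
    ...


def fetch_ndbc_station(
    station_id: str,
    mode: str,
    start_date: pd.Timestamp | None = None,
    end_date: pd.Timestamp | None = None,
    columns: list[str] | None = None,
) -> pd.DataFrame:
    """Retrieve data for a single station (public API)."""
    ...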

@pmav99 do you see anything else that you'd like to be changed before going for the merge?

SorooshMani-NOAA avatar Jul 01 '24 13:07 SorooshMani-NOAA

Alright sounds good

abdu558 avatar Jul 02 '24 05:07 abdu558

_fetch_ndbc_station_data is supposed to accept only one station, since it is the method that just sends the request and retrieves the data. It's not supposed to be used by any user; it is only called by the fetch NDBC function.

The fetch NDBC function does accept both single and multiple stations; it just requires them to be passed as a list. I did it this way because I thought it would be easier for the user to have just one method for both single and multiple stations.

abdu558 avatar Jul 02 '24 07:07 abdu558

@abdu558 I agree in principle that underscored methods are not supposed to be used by users. But at some point the decision was made to follow this pattern in IOC and COOPS. @pmav99 do you remember why we made that choice? What do you think?

All in all, I think it's better to stick with the same convention we used for IOC and COOPS for now; later we can resolve the underscore issue.

SorooshMani-NOAA avatar Jul 02 '24 11:07 SorooshMani-NOAA

Hi, I have tried to test on my side, but I get a ResponseException: NDBC API: Failed to handle returned data. from ndbc-api. All tests fail, and so does the first query in the notebook. Looking further into it, it seems the URL to get the stations is down:

https://www.ndbc.noaa.gov/wstat.shtml

@abdu558 Have you experienced that already?

tomsail avatar Jul 03 '24 10:07 tomsail

Yeah, I'm having similar issues right now; even on the NDBC website there seem to be errors.

abdu558 avatar Jul 03 '24 11:07 abdu558

Codecov Report

Attention: Patch coverage is 96.66667% with 2 lines in your changes missing coverage. Please review.

Project coverage is 89.94%. Comparing base (1495264) to head (42d2f46). Report is 22 commits behind head on master.

Files | Patch % | Lines
searvey/_ndbc_api.py | 96.00% | 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #146      +/-   ##
==========================================
+ Coverage   89.89%   89.94%   +0.04%     
==========================================
  Files          16       17       +1     
  Lines        1366     1432      +66     
  Branches      190      173      -17     
==========================================
+ Hits         1228     1288      +60     
- Misses         78       83       +5     
- Partials       60       61       +1     


codecov[bot] avatar Jul 03 '24 14:07 codecov[bot]

@abdu558 looking at some of the changes in the PR, I guess you need to rebase your branch onto oceanmodeling:master. Let's worry about it once you get all green on the CI tests. If you can get all green by next Tuesday, that'd be great!

SorooshMani-NOAA avatar Jul 24 '24 14:07 SorooshMani-NOAA

These changes were just for:

  • Supporting multiple start/end dates, as requested, for the internal method
  • Improving the structure of the NDBC code to follow standards
  • Applying pre-commit changes and fixing a few tests impacted by the changes mentioned above
  • Cleaning up the notebooks even more and reflecting the changes above in some of the examples

abdu558 avatar Jul 29 '24 00:07 abdu558

For the NDBC API package (upstream) it seems there's a mode and columns, and different modes can have different columns. Right now we support single-mode queries with multiple columns. @pmav99 @tomsail do you have suggestions for multi-mode, multi-column data fetching?
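
Purely as an illustration (not the current API; the station, modes, and columns below are placeholder examples), a multi-mode query could map each mode to its own set of columns:

# Hypothetical sketch: map each NDBC mode to the columns requested for it
# (None meaning "all columns for that mode")
mode_columns = {
    "stdmet": ["WSPD", "WVHT"],
    "spec": None,
}

# Fetch each mode separately with the single-mode API and collect the frames
frames = {
    mode: fetch_ndbc_station("41001", mode=mode, columns=columns)
    for mode, columns in mode_columns.items()
}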

SorooshMani-NOAA avatar Jul 30 '24 20:07 SorooshMani-NOAA

something like this maybe?

from __future__ import annotations

from ndbc_api import NdbcApi
from shapely.geometry import MultiPolygon, Polygon


def get_ndbc_stations(
    region: MultiPolygon | Polygon | None = None,
    lon_min: float | None = None,
    lon_max: float | None = None,
    lat_min: float | None = None,
    lat_max: float | None = None,
    ndbc_api_client: NdbcApi | None = None,    # <- Add this
):
    # Let callers inject a pre-configured client (e.g. for testing/mocking);
    # fall back to a default instance otherwise
    if ndbc_api_client is None:
        ndbc_api_client = NdbcApi()

pmav99 avatar Jul 31 '24 14:07 pmav99

Note to self: #161 #162 #163

SorooshMani-NOAA avatar Sep 12 '24 21:09 SorooshMani-NOAA

Considering your last observations, especially #161, which would benefit EMC's work (ping @AliS-Noaa), I think we still need to implement the historical-data part and, if we can, reuse parts of the realtime implementation.

Also, I think we could leverage these two functions: available_realtime and available_historical, which only return stations with available data.
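
e.g. something along these lines (the exact signatures and semantics should be confirmed against the ndbc-api documentation; this is just an assumed usage):

from ndbc_api import NdbcApi

api = NdbcApi()
# Assumed usage; confirm the argument names against the ndbc-api docs
realtime_availability = api.available_realtime(station_id="41001")
historical_availability = api.available_historical(station_id="41001")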

tomsail avatar Sep 13 '24 07:09 tomsail

I think it'd be better to review all the current GSoC PRs/branches and merge first, then start thinking about adding the missing pieces (unless the missing pieces make the PR useless). While I'd like everything to be complete before merging, realistically I don't think we have the capacity to do so, so we have to do it iteratively.

I'm still reviewing the missing pieces of NDBC and creating issues!

SorooshMani-NOAA avatar Sep 13 '24 13:09 SorooshMani-NOAA