
Upstream requirements management and policy

Open gmaze opened this issue 2 years ago • 4 comments

Since the beginning of argopy, we've kind of struggled to manage how the library interacts with its dependencies.

We need a proper mechanism, and perhaps a policy, to handle support for a range of dependency versions.

New versions of xarray and fsspec are released every month. We've set up CI tests with a conda environment using the latest versions, so that upcoming breaking changes should be detected through deprecation warnings. Would it be possible, in GA, to automatically collect future deprecation warnings and create an issue from them, so that we can't miss them?
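One way to approach the "collect deprecation warnings" idea is sketched below, using only the standard `warnings` module. The helper names (`collect_deprecations`, `format_issue_body`) and the `legacy_call` stand-in are hypothetical, and this is only an illustration of the capture step, not something argopy or xarray is known to use. Posting the resulting text as an issue is left out.

```python
import warnings

# Warning categories that usually announce an upcoming breaking change.
DEPRECATION_CATEGORIES = (DeprecationWarning, FutureWarning, PendingDeprecationWarning)


def collect_deprecations(func, *args, **kwargs):
    """Call func and return (result, list of captured deprecation warnings)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # do not let default filters hide repeats
        result = func(*args, **kwargs)
    records = [w for w in caught if issubclass(w.category, DEPRECATION_CATEGORIES)]
    return result, records


def format_issue_body(records):
    """Turn captured warnings into a de-duplicated plain-text issue body."""
    lines = ["Deprecation warnings seen during the upstream test run:", ""]
    seen = set()
    for w in records:
        key = (w.category.__name__, str(w.message))
        if key in seen:
            continue
        seen.add(key)
        lines.append(f"- {key[0]}: {key[1]} ({w.filename}:{w.lineno})")
    return "\n".join(lines)


def legacy_call():
    # Hypothetical stand-in for a call into an upstream library.
    warnings.warn("old_arg is deprecated, use new_arg", FutureWarning)
    return 42


result, records = collect_deprecations(legacy_call)
body = format_issue_body(records)
print(body)
```

In a real test suite the capture would more likely be done by the test runner itself; the sketch only shows that the warnings can be gathered and summarised programmatically.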

But at this point, we don't know when we'll lose support for old versions. I've created an env file with minimal versions, but CI tests for it are still missing.

Also, we're using a few modules just for 1 or 2 functions. For instance:

  • erddapy is used as a URL formatter, not for requests
  • scikit-learn is used only for preprocessing.LabelEncoder()
  • packaging is used only to parse and compare other dependencies versions (and keep argopy working)
  • and I think that dask is never imported !
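A quick way to check a suspicion like "dask is never imported" is to scan the source for import statements. The sketch below uses the standard `ast` module; the function names and the example sources are hypothetical and do not reflect the actual argopy code base. Two caveats are assumed: it compares import names (`sklearn`), not distribution names (`scikit-learn`), and a package can still be needed without being imported directly (for example if another dependency uses it as an optional backend).

```python
import ast


def imported_top_level_modules(source):
    """Return the top-level module names imported (absolutely) by some Python source."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                found.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # level == 0 means an absolute import; relative imports are skipped.
            if node.module and node.level == 0:
                found.add(node.module.split(".")[0])
    return found


def unused_dependencies(declared, sources):
    """Return declared import names that no source text imports."""
    used = set()
    for src in sources:
        used |= imported_top_level_modules(src)
    return sorted(set(declared) - used)


# Hypothetical file contents, for illustration only.
example_sources = [
    "import xarray as xr\nfrom sklearn import preprocessing\n",
    "from packaging import version\nimport fsspec\n",
]
declared = ["xarray", "fsspec", "sklearn", "packaging", "dask"]
unused = unused_dependencies(declared, example_sources)
print(unused)
```

Since the sources are only parsed, never executed, none of the listed libraries need to be installed to run the check.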

I see some contradiction here between having as few dependencies as possible and simply using what's best and available out there in the community (which should be acknowledged for it).

gmaze avatar Mar 21 '22 11:03 gmaze

From what I understand, Xarray seems to use the following workflow to check for compatibility with all upstream latest versions:

  • Run a GA workflow daily, with the standard CI tests and an environment based on Python 3.10
  • if CI tests fail:
    • create & upload an artifact with all logs
    • report the failure in a GitHub issue (create one, or update the existing one if found)

Here is an example of a failed "CI upstream" workflow run: https://github.com/pydata/xarray/actions/runs/2019469403

It created the following issue: https://github.com/pydata/xarray/issues/6398

The GA workflow uses:

  • a log parser that extracts a short test summary to fill in the issue body
  • GraphQL to manage the issue posting
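To make the "log parser" step concrete, here is a minimal sketch that pulls the `FAILED`/`ERROR` lines out of the "short test summary info" section that pytest prints at the end of a run. It is not xarray's parser, whose implementation is not shown in this thread; the function name, the sample log and the test path are assumptions made for illustration.

```python
import re

# pytest frames section titles with "=" characters on both sides.
SUMMARY_HEADER = re.compile(r"^=+ short test summary info =+$")
SECTION_LINE = re.compile(r"^=+ .* =+$")


def extract_short_summary(log_text):
    """Return the FAILED/ERROR lines from pytest's short test summary section."""
    collecting = False
    summary = []
    for line in log_text.splitlines():
        if SUMMARY_HEADER.match(line):
            collecting = True
            continue
        if collecting:
            if SECTION_LINE.match(line):
                break  # next section (the final result line) ends the summary
            if line.startswith(("FAILED ", "ERROR ")):
                summary.append(line)
    return summary


# Hypothetical log excerpt, for illustration only.
sample_log = """\
collected 3 items
tests/test_fetch.py .F.
=========================== short test summary info ============================
FAILED tests/test_fetch.py::test_region - AttributeError: module has no attribute 'foo'
========================= 1 failed, 2 passed in 0.12s ==========================
"""

summary = extract_short_summary(sample_log)
print("\n".join(summary))
```

The returned lines can be joined into the issue body; creating or updating the issue itself is a separate step.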

gmaze avatar Mar 22 '22 13:03 gmaze

erddapy is used as a URL formatter, not for requests

There are two options here. One is vendoring that part of the code and, while I don't like that, it is a stable solution. The other is to wait: I do plan on a major refactor during GSoC that will make all the URL creation just simple functions for downstream projects to use. It will make your life a bit simpler (I hope), and any big fixes will (again, hopefully) always come without breaking the API.

I cannot speak for the other projects. I imagine that vendoring fsspec, unlike erddapy, would be a bad idea.

ocefpaf avatar Mar 22 '22 15:03 ocefpaf

I do plan on a major refactor during GSoC

Dealing with the erddap protocol is clearly not on argopy's radar, so I prefer to preserve this upstream dependency. If one day the erddap protocol evolves, I don't want to manage this from argopy. If you make the URL creation more seamless, that's great!

I imagine that vendoring fsspec would be a bad idea

Indeed a very bad idea 😨! Dealing with fsspec is a daily challenge for me since I still don't get the subtleties of the library. I'm afraid this has led me to add an ever thicker layer on top of it, and dealing with the cache is a mystery! Anyway, I'm still convinced it's a great library that will be very beneficial to argopy over the long run, in anticipation of more cloud-based Argo data resources in the future.

gmaze avatar Mar 23 '22 10:03 gmaze

This issue was automatically marked as stale because it has not seen any activity in 90 days

github-actions[bot] avatar Jun 21 '22 10:06 github-actions[bot]

This issue was automatically marked as stale because it has not seen any activity in 90 days

github-actions[bot] avatar Dec 22 '22 10:12 github-actions[bot]

I'm closing this issue with the following decisions:

  • Development and unit testing are done on pinned environments (with this GA workflow and ci/py*-<all/core>-pinned.yml env files)
  • Any commit can trigger tests against upstream libraries in free (unpinned) versions by adding the [test-upstream] tag to the commit message (with this GA workflow)
  • Upstream tests are run every night; failures will be noticed by the dev team and new issues addressed whenever time allows
  • On every new release, pinned environments are updated with the current "free" versions
  • Oldest versions (former min env. files) are no longer tested or monitored
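The trigger rule described above (nightly run, or a [test-upstream] tag in the commit message) can be sketched as a small decision function. How the actual GA workflow implements it is not shown in this thread, so the function name and signature below are hypothetical; `schedule` and `push` are the standard GitHub Actions event names.

```python
def should_run_upstream(event_name, commit_message="", tag="[test-upstream]"):
    """Decide whether the upstream (unpinned) test job should run."""
    if event_name == "schedule":
        # Nightly runs always test against upstream versions.
        return True
    # Otherwise, run only when the commit message opts in with the tag.
    return tag in commit_message


nightly = should_run_upstream("schedule")
tagged = should_run_upstream("push", "Fix cache handling [test-upstream]")
untagged = should_run_upstream("push", "Fix cache handling")
print(nightly, tagged, untagged)
```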

gmaze avatar May 02 '23 11:05 gmaze