Unify IOC/COOPS API
Addresses #99
@SorooshMani-NOAA I pushed to the dev branch. Have a look.
I added the code in ioc.py but implemented the tests in a different file so that it is easier to run them.
Feel free to change anything. Don't worry too much about commit messages, we will rebase before we merge.
@brey & @tomsail if you find the time to check out the dev branch and try it out, I would appreciate any feedback.
Codecov Report
Attention: Patch coverage is 91.27182% with 35 lines in your changes missing coverage. Please review.
Project coverage is 89.89%. Comparing base (13ba6be) to head (1495264). Report is 12 commits behind head on master.
:exclamation: Current head 1495264 differs from pull request most recent head 536ddbc
Please upload reports for the commit 536ddbc to get more accurate results.
| Files | Patch % | Lines |
|---|---|---|
| searvey/_coops_api.py | 86.75% | 13 Missing and 7 partials :warning: |
| searvey/_ioc_api.py | 93.04% | 4 Missing and 4 partials :warning: |
| searvey/utils.py | 37.50% | 5 Missing :warning: |
| searvey/coops.py | 96.42% | 1 Missing and 1 partial :warning: |
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##           master     #125      +/-   ##
==========================================
- Coverage   90.18%   89.89%   -0.29%
==========================================
  Files          13       16       +3
  Lines        1049     1366     +317
  Branches      155      190      +35
==========================================
+ Hits          946     1228     +282
- Misses         56       78      +22
- Partials       47       60      +13
```
The COOPS web API uses redirects, so we need to make sure we always follow redirects for COOPS: https://www.python-httpx.org/compatibility/
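For reference, a minimal sketch of what this looks like with httpx (the endpoint URL is just illustrative, not necessarily what the code uses):

```python
import httpx

# Unlike requests, httpx does NOT follow redirects by default,
# so we have to opt in explicitly for every COOPS request.
client = httpx.Client(follow_redirects=True)
response = client.get("https://api.tidesandcurrents.noaa.gov/api/prod/datagetter")
response.raise_for_status()
```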
@pmav99 do you have any suggestions for debugging issues in the multithread/process calls by breakpoint?
From the top of my head, no, not really. Do the functions work when you run them directly?
I wanted to attach the debugger and step through the code when testing different inputs. For now I've gotten away with print statements when I see issues, but I wanted to know if there's a better way of doing it.
The problem with using `breakpoint()` or `pdb.set_trace()` is that the breakpoint will get executed as many times as you call the function. I doubt this is what you want.
But, TBH, this question feels a bit strange. Just write and test the functions that retrieve and parse the data without any concurrency. As soon as the `retrieve_data` and `parse_data` functions are correct, you can plug in multifutures (or whatever) to add concurrency and make the whole thing faster. Trying to integrate concurrency from the get-go is probably overcomplicating things.
That being said, what can help with debugging multiprocessing/multithreading code is using the logging module, which, if configured correctly, can print information about the current thread/process. For example:
```python
import logging
import time

import multifutures as mf

# Include the process ID and thread name in every record, so each log
# line can be attributed to the worker that emitted it.
logging.basicConfig(
    level=10,
    style="{",
    format="{levelname:8s}; {process:6d}; {threadName:8s}; {asctime:s}; {name:<15s} {lineno:4d}; {message:s}",
    force=True,
)
logger = logging.getLogger(__name__)


def foo(a):
    time.sleep(0.3)
    logger.info("The value of a=%d", a)


logger.info("Start Multiprocessing")
results = mf.multiprocess(foo, [{"a": i} for i in range(5)])
logger.info("Start Multithreading")
results = mf.multithread(foo, [{"a": i} for i in range(5)])
logger.info("Done")
```
> ... we should extract them ...
I agree. I tried to stay as close to the updated IOC code as possible; that's why I basically copy-pasted everything and then started modifying. Some of these can easily be put in utils to be used by all the datasource APIs, but I also don't want to start abstracting too soon.
WRT the names mapping, I haven't tried this, but I was wondering whether it is actually simpler to do something like this:
- Run the code we have in order to generate the final dictionaries
- Hardcode the final dictionaries in the file. Any time a product gets added/removed we add/remove a line from the respective dictionary.
Yes, the code will be longer and repetitive, but we have no code generation, no `.update()`, no ugly comprehensions, no `**` etc. Just declarative code.
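For illustration, something along these lines (the product and field names below are made up, not the actual mapping):

```python
# Declarative, hand-written mapping: one line per product, so adding or
# removing a product is a one-line diff. No generation, no .update().
COOPS_PRODUCT_FIELDS = {
    "water_level": ("time", "value", "sigma", "flags", "quality"),
    "hourly_height": ("time", "value", "sigma", "flags"),
    "high_low": ("time", "value", "type", "flags"),
    "one_minute_water_level": ("time", "value"),
}
```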
@pmav99 it's a good idea, I'll do that!
@pmav99 I'm trying to decide what to do with defaults for datetimes. The issue is that for some products:
- I should include dates more than one month before the current time to be able to fetch data, e.g. `high_low`, `hourly_height`, and `monthly_mean`.
- Or I should not include dates more than one month prior to the current date, e.g. `one_minute_water_level`.
To resolve the first issue I can just create a COOPS-specific `_resolve_start_date` and set the default to 60 days. For example, as of today (February 6th, 2024), for `monthly_mean` I should specify 2023/12/31 to 2024/02/06 so that I get data. If instead I specify 2024/01/01, it fails!
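A minimal sketch of what I mean (the helper name and signature are hypothetical; only the 60-day default is the COOPS-specific part):

```python
from __future__ import annotations

import datetime

# Hypothetical COOPS-specific default lookback: 60 days instead of the
# generic default, so that e.g. monthly_mean returns at least one record.
COOPS_DEFAULT_LOOKBACK = datetime.timedelta(days=60)


def _coops_resolve_start_date(
    now: datetime.datetime,
    start_date: datetime.datetime | None,
) -> datetime.datetime:
    # Fall back to `now - 60 days` when the user gives no start date.
    if start_date is None:
        start_date = now - COOPS_DEFAULT_LOOKBACK
    return start_date
```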
But specifying 60 days causes problems for `one_minute_water_level`. The max interval for this product is 4 days, but there's no data older than about 1 month (at least for the station I'm testing). So if I specify 2024/01/09 to 2024/02/06, it fails because the chunk from the 9th to the 13th doesn't have data.
I'm not sure what's best. Should I just leave it to the user and say the default might not work for some products? Or should we add more dictionaries with info about the default start date range (relative to now) for each product? I'm not even sure these limits hold for all stations! Should I first consult with COOPS?
If it's the latter, I think it makes sense to go with "default might not work" for now and amend later.
Actually, another problem is whether I have to raise an error when there's no data or just warn the user. COOPS gives me an error message when there's no data.
@pmav99 except for the two questions above, I'm done with the COOPS API. I know that the metadata API needs some adjustment, but I'm assuming we want to keep that separate from the data API update, right?
WRT the default dates: I have to admit that I haven't really looked into this, but couldn't we have a mapping with the default durations? If the data are not too big, we could even return the maximum interval by default. Am I missing something?
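E.g. something like this hypothetical mapping (the durations are illustrative, not taken from the COOPS documentation):

```python
import datetime

# Hypothetical per-product default lookback windows; values are guesses
# for illustration only.
DEFAULT_LOOKBACK = {
    "one_minute_water_level": datetime.timedelta(days=4),
    "water_level": datetime.timedelta(days=30),
    "hourly_height": datetime.timedelta(days=60),
    "monthly_mean": datetime.timedelta(days=60),
}
```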
I agree that we might be able to cover most cases with dictionaries, but I don't want to add too many dicts: one for how far back I should go before data starts appearing (e.g. for mean, hi/lo, etc.), one for how far back I can go before the data runs out (old data removed, e.g. one minute heights), and so on. I just don't want to go around adding a dict for each of these limitations, especially the ones that aren't documented anywhere and that I only found by trial and error! For now I think I'll just go with letting the user figure it out ... let's address this later. This PR is already a lot of changes and dictionaries!
@pmav99 I'm done from my side. Please let me know if there are any issues I need to fix; otherwise feel free to merge whenever you can. I'm not sure why the tests are failing, please let me know if I need to do anything about it.
@SorooshMani-NOAA I rebased on master and pushed. You will see that a bunch of COOPS tests are broken. Please fix them. Once this is done, please squash the COOPS commits (rebase and push). Keep as many commits as you see fit. Once that is done, we need to review the whole thing to see if it is ready for merging or not.
Thank you @pmav99, the rebase seems to have brought in some of the changes for the new COOPS API, but not all of them; I'll patch things up to prep for merge.
I just noticed you created a `_coops_api.py` file. Did you have any reorg in mind, or is it a result of automated conflict resolution?
@pmav99 one question: the new IOC API also has station type info, and for a given station ID there might be different types. Should I get a unique set for the stations API, since we don't have that column? https://github.com/oceanmodeling/searvey/blob/95f26d6da277dbf54b0c09dac27dcf0ed24fa688/searvey/stations.py#L16-L27
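In case it helps the discussion, deduplicating on the station code would look something like this (the function and the `ioc_code` column name are assumptions for illustration):

```python
import pandas as pd


def drop_duplicate_stations(stations: pd.DataFrame) -> pd.DataFrame:
    # A station reporting multiple sensor types appears once per type;
    # keep a single row per station code instead.
    return stations.drop_duplicates(subset="ioc_code").reset_index(drop=True)
```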
@pmav99 all the tests go through, please go ahead with the merge as soon as you can so that we don't need to spend a lot of time rebasing, etc. again later. The pre-commit failure is weird; locally the notebook is cleaned up and pre-commit doesn't complain!
@pmav99 I grouped all of the changes for COOPS into three commits:
- adding the new API for COOPS
- refactoring the existing API (and the tests you added there)
- adding COOPS tests for the new & refactored API
Note that in this case the tests will fail after the first and second commits, since all the COOPS-related tests are updated in the third commit. Since the refactoring requires rewriting the tests, I cannot combine the 3rd and 1st commits either; the tests still fail. Unless you prefer that I split the 3rd commit and fold each part into the 1st and 2nd commits.
Please let me know what to do.
@pmav99 please let me know if I need to change anything else
@SorooshMani-NOAA I disabled the progress bar by default on the new API. Let me know what you think about this.
@pmav99 looks good to me. Please go ahead and merge if you're happy with everything as well, thank you!