Create `download_vpts_aloft()` function

Open peterdesmet opened this issue 1 year ago • 5 comments

download_vpts_aloft(
  date_min = NULL,
  date_max = NULL,
  radars = NULL,
  directory = ".",
  overwrite = FALSE,
  format = "csv", # also hdf5
  source = "baltrad" # also ecog-04003
)
  • [x] Decide on controlled values for format. Either csv/hdf5 (widely applicable) or monthly, daily, hdf5
  • [x] Should we make use of an S3 dependency to download files?
  • [x] How should we warn for unrecognized radars?
  • [x] How should we warn for dates without data?

Todo

  • [ ] Update bucket endpoint from https://lw-enram.s3-eu-west-1.amazonaws.com to s3://aloft
  • [ ] Adapt to new directory structure at bucket
  • [x] Add new parameter source = BALTRAD
  • [x] radars parameter will remain the same (5 letter code: bejab)
  • [ ] directory parameter can remain the same
  • [ ] overwrite parameter can remain the same
  • [ ] Update documentation
  • [ ] Update tests
  • [ ] Deprecate download_vpfiles() in favour of download_vpts_aloft(format = "hdf5")

peterdesmet avatar Apr 04 '23 15:04 peterdesmet

I'd like to work on this, here are my thoughts:

Q/A

Decide on controlled values for format. Either csv/hdf5 (widely applicable) or monthly, daily, hdf5

I prefer csv/hdf5, and return in the same directory structure as the bucket

Should we make use of an S3 dependency to download files?

Let's try to do it without the dependency at first and reevaluate.

How should we warn for unrecognized radars?

radars: {missing_radars} not found, found {number_of_radars} other radars

How should we warn for dates without data?

No radars found for all dates, radars found for {first_date_found} to {last_date_found}

Tests

  • Can we use the existing tests from download_vpfiles()?
  • Testing the directory structure
  • Should we test the output files themselves using snapshots?
  • test the new warnings
  • test overwriting
  • test format

Branch

Has work already started on this? Is there a branch I can continue on?

PietrH avatar May 17 '23 07:05 PietrH

If a radar is missing, stop.

PietrH avatar May 17 '23 08:05 PietrH

  • A JSON file exists with a list of radars: https://github.com/enram/aloftdata.eu/blob/main/_data/OPERA_RADARS_DB.json. The radar codes can be read with:
jsonlite::fromJSON("https://raw.githubusercontent.com/enram/aloftdata.eu/main/_data/OPERA_RADARS_DB.json")$odimcode
  • Allow downloading multiple radars
  • download_vpfiles() as an example
  • Error if radar doesn't exist based on json file
  • Try writing function so it'll download whatever it can
  • Use progress to show how many files have already been downloaded. Silence with progress=FALSE.
  • Message for each file downloaded (cf. download_vpfiles). Silence with verbose=FALSE.
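The "error if radar doesn't exist" check could look roughly like the sketch below. `check_radars()` and `known_radars` are hypothetical names, not existing bioRad functions, and the hard-coded vector of example codes only stands in for the `odimcode` values from the JSON file so that the sketch runs offline.

```r
# Hypothetical helper (not part of bioRad): stop if any requested radar
# is absent from the vector of known radar codes.
check_radars <- function(radars, known_radars) {
  missing_radars <- setdiff(radars, known_radars)
  if (length(missing_radars) > 0) {
    stop(
      "Radar(s) not found: ", paste(missing_radars, collapse = ", "),
      ". ", length(known_radars), " radars are known.",
      call. = FALSE
    )
  }
  # Return the validated codes invisibly so the call can be chained
  invisible(radars)
}

# Assumption: in practice `known_radars` would be read from the JSON file;
# these example codes are only placeholders.
known_radars <- c("bejab", "bewid", "nldhl")
check_radars(c("bejab", "nldhl"), known_radars)
```

Stopping before any download starts matches the "if a radar is missing, stop" decision above, while dates without data can still be handled more leniently.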

PietrH avatar May 17 '23 08:05 PietrH

We have decided to build a function list_vpts_aloft() that returns a vector of URLs that are known to exist, given the filtering parameters originally envisioned for download_vpts_aloft().

list_vpts_aloft(
  date_min = NULL,
  date_max = NULL,
  radars = NULL,
  # directory = ".", This parameter is removed
  # overwrite = FALSE, This parameter is removed
  format = "csv", # also hdf5
  source = "baltrad" # also ecog-04003
)

Checking whether a file exists can be done using the aws.s3 dependency:
aws.s3::get_bucket_df(bucket = "s3://aloft", prefix = "baltrad/monthly", region = "eu-west-1", max = 2000)
or, much more slowly, using httr:
urls[!furrr::future_map_lgl(urls, ~httr::http_error(httr::HEAD(.x)))]
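Once a bucket listing is available, the radar filter could be applied to its keys without further requests. The sketch below is only an illustration: `filter_keys()` is a hypothetical helper, `keys` stands in for the `Key` column of the listing, and the example key layout is invented because the new directory structure is not described in this thread.

```r
# Hypothetical helper: keep the keys that have a requested radar code as one
# of their path components. With radars = NULL, all keys are returned.
filter_keys <- function(keys, radars = NULL) {
  if (is.null(radars)) {
    return(keys)
  }
  has_radar <- vapply(
    strsplit(keys, "/", fixed = TRUE),
    function(parts) any(parts %in% radars),
    logical(1)
  )
  keys[has_radar]
}

# Assumption: invented keys, the real bucket layout may differ.
keys <- c(
  "baltrad/monthly/bejab/2023/example_a.csv",
  "baltrad/monthly/bewid/2023/example_b.csv"
)
filter_keys(keys, radars = "bejab")
```

Matching on whole path components avoids accidental hits on file names that merely contain a radar code as a substring.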

PietrH avatar May 17 '23 10:05 PietrH

Note: I think it might be better to create a generic download_files() function that is provided a vector of URLs (e.g. generated by list_vpts_aloft()), see #648
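A minimal sketch of what such a generic function might look like follows. The signature, the `download_fun` argument and the skip-if-exists behaviour are assumptions made for illustration, not the design discussed in #648.

```r
# Hypothetical sketch of a generic downloader: takes a vector of URLs and
# writes each file into `directory`, skipping existing files unless
# overwrite = TRUE. `download_fun` is injectable so the logic can be
# exercised without network access.
download_files <- function(urls,
                           directory = ".",
                           overwrite = FALSE,
                           download_fun = function(url, destfile) {
                             utils::download.file(
                               url, destfile, mode = "wb", quiet = TRUE
                             )
                           }) {
  dir.create(directory, recursive = TRUE, showWarnings = FALSE)
  destinations <- file.path(directory, basename(urls))
  for (i in seq_along(urls)) {
    if (file.exists(destinations[i]) && !overwrite) {
      message("Skipping existing file: ", destinations[i])
      next
    }
    download_fun(urls[i], destinations[i])
  }
  # Return the local paths invisibly, whether downloaded or skipped
  invisible(destinations)
}
```

Under these assumptions, the output of list_vpts_aloft() could be passed straight to `download_files(urls, directory = "data")`. This sketch flattens everything into one directory, so reproducing the bucket's directory structure would need additional path handling.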

peterdesmet avatar Dec 14 '23 10:12 peterdesmet