bioRad
Create `download_vpts_aloft()` function
```r
download_vpts_aloft(
  date_min = NULL,
  date_max = NULL,
  radars = NULL,
  directory = ".",
  overwrite = FALSE,
  format = "csv", # also "hdf5"
  source = "baltrad" # also "ecog-04003"
)
```
- [x] Decide on controlled values for `format`. Either `csv`/`hdf5` (widely applicable) or `monthly`, `daily`, `hdf5`
- [x] Should we make use of an S3 dependency to download files?
- [x] How should we warn for unrecognized radars?
- [x] How should we warn for dates without data?
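The controlled values for `format` could be enforced with base R's `match.arg()`. This is a minimal sketch: `validate_vpts_args()` is a hypothetical helper, and treating `"baltrad"` and `"ecog-04003"` as the only valid `source` values is an assumption taken from the comments in the signature above.

```r
# Hypothetical helper (not part of bioRad): enforce controlled values.
# The first element of each choices vector acts as the default.
validate_vpts_args <- function(format = c("csv", "hdf5"),
                               source = c("baltrad", "ecog-04003")) {
  format <- match.arg(format) # errors on anything other than csv/hdf5
  source <- match.arg(source) # assumed list of sources
  list(format = format, source = source)
}
```

With this approach, `validate_vpts_args(format = "monthly")` fails with an informative error listing the allowed values, while omitting `format` falls back to `"csv"`.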
Todo
- [ ] Update bucket endpoint from `https://lw-enram.s3-eu-west-1.amazonaws.com` to `s3://aloft`
- [ ] Adapt to the new directory structure of the bucket
- [x] Add new parameter `source = "baltrad"`
- [x] `radars` parameter will remain the same (5-letter code: `bejab`)
- [ ] `directory` parameter can remain the same
- [ ] `overwrite` parameter can remain the same
- [ ] Update documentation
- [ ] Update tests
- [ ] Deprecate `download_vpfiles()` in favour of `download_vpts_aloft(format = "hdf5")`
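The deprecation item could follow the usual base R pattern with `.Deprecated()`. This is only a sketch: the `download_vpts_aloft()` defined here is a hypothetical stub so the example runs on its own, and forwarding all arguments through `...` assumes the old and new functions share argument names.

```r
# Stub standing in for the new function so this sketch runs on its own;
# the real download_vpts_aloft() is still to be written.
download_vpts_aloft <- function(..., format = "csv") {
  list(format = format, ...)
}

# Hypothetical deprecated wrapper: warn on every call, then delegate
# with format = "hdf5" (assumes both functions share argument names).
download_vpfiles <- function(...) {
  .Deprecated(
    "download_vpts_aloft", package = "bioRad",
    msg = paste("download_vpfiles() is deprecated;",
                "use download_vpts_aloft(format = \"hdf5\") instead.")
  )
  download_vpts_aloft(..., format = "hdf5")
}
```

Existing scripts keep working but see a warning pointing to the replacement call.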
I'd like to work on this. Here are my thoughts:
Q/A
Decide on controlled values for `format`. Either `csv`/`hdf5` (widely applicable) or `monthly`, `daily`, `hdf5`
I prefer `csv`/`hdf5`, with files returned in the same directory structure as the bucket.
Should we make use of an S3 dependency to download files?
Let's try to do it without the dependency at first and reevaluate.
How should we warn for unrecognized radars?
radars: {missing_radars} not found, found {number_of_radars} other radars
How should we warn for dates without data?
No radars found for all dates, radars found for {first_date_found} to {last_date_found}
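The two proposed warnings could be emitted with base R as below. Both helpers are hypothetical; they assume the caller already knows which radars and which dates actually have data, and the message for the case where no date has data at all is an addition not discussed above.

```r
# Hypothetical helpers emitting the two warnings proposed in the Q/A above.
warn_missing_radars <- function(requested, found) {
  missing <- setdiff(requested, found)
  if (length(missing) > 0) {
    warning(sprintf("radars: %s not found, found %d other radars",
                    paste(missing, collapse = ", "),
                    length(intersect(requested, found))),
            call. = FALSE)
  }
  invisible(missing)
}

warn_missing_dates <- function(requested_dates, found_dates) {
  # Avoid setdiff() here: older R versions drop the Date class
  missing <- requested_dates[!requested_dates %in% found_dates]
  if (length(missing) > 0 && length(found_dates) > 0) {
    warning(sprintf("No radars found for all dates, radars found for %s to %s",
                    format(min(found_dates)), format(max(found_dates))),
            call. = FALSE)
  } else if (length(missing) > 0) {
    # Hypothetical message: nothing found for any requested date
    warning("No data found for any of the requested dates", call. = FALSE)
  }
  invisible(missing)
}
```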
Tests
- Can we use the existing tests from `download_vpfiles()`?
- Test the directory structure
- Should we test the output files themselves using snapshots?
- Test the new warnings
- Test overwriting
- Test `format`
Branch
Has work already started on this? Is there a branch I can continue on?
If a radar is missing, stop.
- A JSON file with a list of radars exists at https://github.com/enram/aloftdata.eu/blob/main/_data/OPERA_RADARS_DB.json:
  `jsonlite::fromJSON("https://raw.githubusercontent.com/enram/aloftdata.eu/main/_data/OPERA_RADARS_DB.json")$odimcode`
- Allow downloading multiple radars
- Use `download_vpfiles()` as an example
- Error if a radar doesn't exist, based on the JSON file
- Try writing the function so it downloads whatever it can
- Use a progress bar to show how many files have already been downloaded. Silence with `progress = FALSE`.
- Show a message for each file downloaded (cf. `download_vpfiles()`). Silence with `verbose = FALSE`.
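The "stop if a radar is missing" rule could look like the sketch below. `check_radars()` and its error text are hypothetical, and `known_radars` is passed in so the example runs offline; in practice it would be the `odimcode` vector read from `OPERA_RADARS_DB.json` with the `jsonlite::fromJSON()` call shown above.

```r
# Hypothetical helper: stop if any requested radar is not a known ODIM code.
# Accepts several radars at once, as multiple radars should be allowed.
check_radars <- function(radars, known_radars) {
  unknown <- setdiff(radars, known_radars)
  if (length(unknown) > 0) {
    stop(sprintf("Can't find radar(s): %s",
                 paste(unknown, collapse = ", ")),
         call. = FALSE)
  }
  invisible(radars)
}
```

Reporting all unknown codes in one error, rather than only the first, saves a round trip when several radars are mistyped.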
We have decided to build a function `list_vpts_aloft()` that returns a vector of URLs known to exist, given the filtering parameters originally envisioned for `download_vpts_aloft()`:
```r
list_vpts_aloft(
  date_min = NULL,
  date_max = NULL,
  radars = NULL,
  # directory = ".",   (removed)
  # overwrite = FALSE, (removed)
  format = "csv", # also "hdf5"
  source = "baltrad" # also "ecog-04003"
)
```
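A first step inside `list_vpts_aloft()` could be generating candidate URLs for every radar and month in the date range. This sketch only covers `format = "csv"`. The `{source}/monthly` prefix comes from the `aws.s3` call in this issue, but the HTTPS endpoint for `s3://aloft` and the `{radar}/{yyyy}/{radar}_vpts_{yyyymm}.csv.gz` part of the path are assumptions for illustration, not confirmed bucket paths. `candidate_vpts_urls()` is hypothetical.

```r
# Hypothetical sketch of the candidate-URL step of list_vpts_aloft().
# ASSUMPTIONS: base_url and the {radar}/{yyyy}/{radar}_vpts_{yyyymm}.csv.gz
# layout are illustrative only. Only format = "csv" is covered.
candidate_vpts_urls <- function(date_min, date_max, radars,
                                source = "baltrad",
                                base_url = "https://aloft.s3-eu-west-1.amazonaws.com") {
  date_min <- as.Date(date_min)
  date_max <- as.Date(date_max)
  stopifnot(date_min <= date_max)
  # One entry per calendar month overlapping [date_min, date_max]
  first_of_month <- as.Date(format(date_min, "%Y-%m-01"))
  yyyymm <- format(seq(first_of_month, date_max, by = "month"), "%Y%m")
  # Every radar x month combination
  radar_rep <- rep(radars, each = length(yyyymm))
  month_rep <- rep(yyyymm, times = length(radars))
  sprintf("%s/%s/monthly/%s/%s/%s_vpts_%s.csv.gz",
          base_url, source, radar_rep, substr(month_rep, 1, 4),
          radar_rep, month_rep)
}
```

These are only candidates; they would still need to be filtered down to files that actually exist.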
Checking whether a file exists can be done with the `aws.s3` dependency via `aws.s3::get_bucket_df(bucket = "s3://aloft", prefix = "baltrad/monthly", region = "eu-west-1", max = 2000)`, or, much more slowly, with `httr`: `urls[!furrr::future_map_lgl(urls, ~httr::http_error(httr::HEAD(.x)))]`
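The `aws.s3` route lists the bucket once and compares keys locally instead of sending one HEAD request per URL. A minimal sketch of that comparison, assuming the listing's `Key` column holds paths relative to the bucket root (e.g. `baltrad/monthly/...`); `filter_existing_urls()` is hypothetical and the example runs offline on a made-up listing.

```r
# Hypothetical helper: keep candidate URLs whose object key appears in a
# bucket listing. ASSUMPTION: `keys` are paths relative to the bucket root,
# as in the Key column of aws.s3::get_bucket_df().
filter_existing_urls <- function(urls, keys, base_url) {
  prefix <- paste0(base_url, "/")
  # Derive each URL's key; URLs outside base_url get NA and are dropped
  url_keys <- ifelse(startsWith(urls, prefix),
                     substring(urls, nchar(prefix) + 1),
                     NA_character_)
  urls[!is.na(url_keys) & url_keys %in% keys]
}
```

In practice `keys` would be `aws.s3::get_bucket_df(...)$Key` from the call above. One listing request per prefix replacing many HEAD requests is presumably why this is faster than the `httr` approach.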
Note: I think it might be better to create a generic `download_files()` function that takes a vector of URLs (e.g. generated by `list_vpts_aloft()`), see #648.
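A generic `download_files()` along these lines could cover the download-side requirements listed earlier: mirroring the bucket's directory structure, `overwrite`, progress, one message per file, and continuing when a single file fails. This is a hypothetical sketch, not taken from #648; it uses base R's `utils::download.file()` and `utils::txtProgressBar()` as stand-ins for whatever HTTP and progress packages bioRad would actually use.

```r
# Hypothetical generic downloader; not an existing bioRad function.
# Mirrors each URL's path under `directory`, skips existing files unless
# overwrite = TRUE, and keeps going when a single download fails.
download_files <- function(urls, directory = ".", overwrite = FALSE,
                           progress = TRUE, verbose = TRUE) {
  # Strip scheme and host so the local tree mirrors the bucket's
  rel_paths <- sub("^[a-z]+://[^/]*/", "", urls)
  dest <- file.path(directory, rel_paths)
  status <- character(length(urls))

  pb <- NULL
  if (progress && length(urls) > 0) {
    # txtProgressBar() requires max > min, hence the length check
    pb <- utils::txtProgressBar(min = 0, max = length(urls), style = 3)
    on.exit(close(pb), add = TRUE)
  }

  for (i in seq_along(urls)) {
    if (file.exists(dest[i]) && !overwrite) {
      status[i] <- "skipped"
    } else {
      dir.create(dirname(dest[i]), recursive = TRUE, showWarnings = FALSE)
      ok <- tryCatch(
        utils::download.file(urls[i], dest[i], quiet = TRUE, mode = "wb") == 0,
        error = function(e) FALSE,
        warning = function(w) FALSE
      )
      # Don't leave partial files behind; they would be "skipped" next time
      if (!ok && file.exists(dest[i])) unlink(dest[i])
      status[i] <- if (ok) "downloaded" else "failed"
    }
    if (verbose) message(sprintf("%s: %s", status[i], urls[i]))
    if (!is.null(pb)) utils::setTxtProgressBar(pb, i)
  }

  invisible(data.frame(url = urls, file = dest, status = status,
                       stringsAsFactors = FALSE))
}
```

For example, `download_files(urls, directory = "data", progress = FALSE)` would place a file from `https://<host>/baltrad/monthly/...` under `data/baltrad/monthly/...`, and the returned data frame records which URLs were downloaded, skipped, or failed.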