netcdf4-python
URL of an OpenDAP dataset - Temporary file
If I pass a URL of an OpenDAP dataset to Dataset, how can I recover the path to the temporary file?
There is no temporary file - the data is received directly over HTTP.
@jswhit Do you have thoughts on a caching solution?
For netcdf-c there is a flag you can add to an OpenDAP URL to cause the retrieved data to be stored on and accessed from disk; this is generally most useful for larger returned datasets. When you ask for caching, what is the use-case? There are a couple of use-cases for caching:
- read the whole dataset into the client and then do the constraints on that cache to return subsets.
- cache into the file system and reuse it across programs. This is tricky.
My use-case is of the first kind: a Python-based application needs to retrieve data from a model that changes once a day. I was wondering whether I could take advantage of some function in the library to avoid downloading the data each time the application is executed on the same day. Or do I have to hand-craft something?
@DennisHeimbigner, where are the docs for the caching flag?
In case anyone else arrives at this issue: the Unidata netCDF documentation here states:
Caching
In an effort to provide better performance for some access patterns, client-side caching of data is available. The default is no caching, but it may be enabled by prefixing the URL with the parameter "cache".
Caching operates basically as follows.
When a URL is first accessed using nc_open(), netCDF automatically does a pre-fetch of selected variables. These include all variables smaller than a specified (and user definable) size. This allows, for example, quick access to coordinate variables. This can be suppressed with the parameter "noprefetch".
Whenever a request is made using some variant of the nc_get_var() API procedures, the complete variable is fetched and stored in the cache as a new cache entry. Subsequent requests for any part of that variable will access the cache entry to obtain the data.
The cache may become too full, either because there are too many entries or because it is taking up too much disk space. In this case cache entries are purged until the cache size limits are reached. The cache purge algorithm is LRU (least recently used) so that variables that are repeatedly referenced will tend to stay in the cache.
The cache is completely purged when nc_close() is invoked.
In order to decide if you should enable caching, you will need to have some understanding of the access patterns of your program.
The ncdump program always dumps one or more whole variables so it turns on caching.
If your program accesses only parts of a number of variables, then caching should probably not be used since fetching whole variables will probably slow down your program for no purpose.
Unfortunately, caching is currently an all or nothing proposition, so for more complex access patterns, the decision to cache or not may not have an obvious answer. Probably a good rule of thumb is to avoid caching initially and later turn it on to see its effect on performance.
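The "prefixing the URL with the parameter" wording above refers to netCDF-c's bracketed client parameters, which netcdf4-python passes through unchanged since the C library does the URL parsing. A minimal sketch of building such a URL (the dataset URL below is a placeholder, not a real server):

```python
# Sketch: enabling netCDF-c's client-side DAP cache by prefixing the URL
# with a bracketed client parameter such as "[cache]" ("[noprefetch]"
# suppresses the pre-fetch described above).

def with_client_params(url, *params):
    """Prefix an OPeNDAP URL with bracketed netCDF-c client parameters."""
    return "".join("[%s]" % p for p in params) + url

# Hypothetical dataset URL, for illustration only.
url = "http://example.com/thredds/dodsC/model/latest"
cached_url = with_client_params(url, "cache")  # "[cache]http://..."

# The prefixed URL is then passed to the usual constructor:
# from netCDF4 import Dataset
# ds = Dataset(cached_url)
```

Note that, as described above, this cache lives only for the lifetime of the open dataset; it does not persist across program runs.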
Re: the comment from https://github.com/giumas above. I infer that you need to keep the cached file across multiple invocations of the Python program. The current built-in caching in the netcdf-c opendap code throws away its internal cache when the program exits. If you need to keep a copy that is accessible across program invocations, then I would suggest using nccopy to create a local netcdf file from the opendap url and then having your programs access that local copy directly.
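For the once-a-day use-case above, the nccopy suggestion can be wrapped in a small sketch: materialize the OpenDAP dataset into a date-stamped local file on the first run of the day, and reuse that file on later runs. The URL, file-name pattern, and cache directory here are hypothetical choices, not anything prescribed by the library.

```python
# Sketch: a hand-crafted daily cache around nccopy (assumes the nccopy
# binary from the netCDF-c tools is on PATH).
import datetime
import os
import subprocess

def daily_local_copy(url, cache_dir="."):
    """Return the path of today's local copy, creating it with nccopy if absent."""
    today = datetime.date.today().isoformat()
    path = os.path.join(cache_dir, "model-%s.nc" % today)  # hypothetical naming scheme
    if not os.path.exists(path):
        # One network transfer per day; later calls the same day hit the file.
        subprocess.run(["nccopy", url, path], check=True)
    return path

# Subsequent opens then read the local file instead of the server:
# from netCDF4 import Dataset
# ds = Dataset(daily_local_copy("http://example.com/thredds/dodsC/model/latest"))
```

Old dated files are never deleted in this sketch; a real application would also want to purge copies from previous days.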