netcdf-java
[5.x] and up: Open dap4 or other alternative protocol address via NetcdfDataset(s)
Versions impacted by the bug
v5.x, v6.x, v7.x
What went wrong?
A developer at OpenDAP asked about using Panoply to open a remote dataset that uses a non-http/https/ftp protocol, more specifically the dap4 protocol, e.g. dap4://test.opendap.org/opendap/some/path/to/fnoc1.nc
The NJ methods in NetcdfDataset and NetcdfDatasets that Panoply uses for acquiring a dataset expect a DatasetUrl rather than a URL, and constructing a DatasetUrl accepts such alternative protocols (*).
But if a DatasetUrl using a dap4 protocol is passed to one of the acquireDataset methods, the result is "Unknown service type".
So is there some alternative way to acquire (enhanced) a dap4 protocol DatasetUrl? Or is this an accidental or planned omission?
(*) Actually, it appears a DatasetUrl can be constructed for just about any supposed protocol, including "foo". However the start of that class defines arrays FRAGPROTOCOLS and FRAGPROTOSVCTYPE with what I presume are acceptable alternatives.
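The scheme prefix is what a resolver like DatasetUrl keys on when deciding how to service an address. As a minimal stdlib illustration (using the example address from this issue, not netcdf-java's actual parsing code), java.net.URI happily parses "dap4" or even "foo" as a scheme, which is consistent with the observation that a DatasetUrl can be constructed for just about any supposed protocol:

```java
import java.net.URI;

public class SchemeCheck {
  public static void main(String[] args) {
    // Example address from this issue; the scheme prefix is the only
    // clue in the URL itself about which access protocol to use.
    URI uri = URI.create("dap4://test.opendap.org/opendap/some/path/to/fnoc1.nc");
    System.out.println("scheme = " + uri.getScheme()); // dap4
    System.out.println("host   = " + uri.getHost());   // test.opendap.org
    System.out.println("path   = " + uri.getPath());   // /opendap/some/path/to/fnoc1.nc

    // An arbitrary scheme parses just as readily:
    System.out.println(URI.create("foo://example.org/x").getScheme()); // foo
  }
}
```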
Relevant stack trace
No response
Relevant log messages
No response
If you have an example file that you can share, please attach it to this issue.
If so, may we include it in our test datasets to help ensure the bug does not return once fixed? Note: the test datasets are publicly accessible without restriction.
N/A
Code of Conduct
- [X] I agree to follow the UCAR/Unidata Code of Conduct
When you get the "Unknown service type" error message, does it include the name of the service or just an empty string? (This helps diagnose whether the failure is coming from assigning the service or from finding the correct class to handle it)
@haileyajohnson, I just belatedly noticed you had commented on this. The error message is actually "Unknown service type: DAP4".
In both NetcdfDataset and NetcdfDatasets, when durl.getServiceType() is called, the only two cases handled in the following switch block are File and HTTPServer.
BTW: This issue tangentially came up today because developers at GSFC and JPL were having trouble using Panoply to access data on an OpenDAP server because of DAP2 vs DAP4 protocol confusion. Panoply passes an https:// address to NJ to acquire the remote dataset, and it seems that NJ is trying to use the DAP2 protocol but the server wants DAP4.
So that means it's getting the correct service type, but not finding the right NetcdfFileProvider, in this case, DapNetcdfFileProvider. Is it possible the build isn't bringing in the dap4 module?
> In both NetcdfDataset and NetcdfDatasets, when durl.getServiceType() is called, the only two cases handled in the following switch block are File and HTTPServer.
DAP4 actually shouldn't be reaching that block at all, it should be returning a provider from the first for loop in that method.
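The provider lookup described here is a ServiceLoader-style mechanism: each module registers a provider, and if the dap4 module is absent from the classpath, the loader simply finds nothing and execution falls through to the switch block. A minimal self-contained sketch (with a hypothetical FileProvider interface standing in for netcdf-java's NetcdfFileProvider, which is not referenced directly here):

```java
import java.util.ServiceLoader;

public class ProviderLookup {
  // Hypothetical provider interface standing in for netcdf-java's
  // NetcdfFileProvider; real providers register via META-INF/services.
  public interface FileProvider {
    boolean isOwnerOf(String location);
  }

  public static void main(String[] args) {
    // With no META-INF/services registration on the classpath, the loader
    // iterates zero providers -- analogous to a build that omits the
    // dap4 module: no provider claims the scheme, so the fallback runs.
    boolean found = false;
    for (FileProvider p : ServiceLoader.load(FileProvider.class)) {
      if (p.isOwnerOf("dap4://test.opendap.org/opendap/some/path/to/fnoc1.nc")) {
        found = true;
      }
    }
    System.out.println(found ? "provider found" : "no provider for dap4");
  }
}
```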
@haileyajohnson, So apparently the dap4 package is not included when I build netcdfAll?
I see it is included when I build toolsUI, and that toolsUI will try to open a remote dataset whose URL begins with the dap4 protocol. However, both cases I just tested failed: one with some sort of Container parsing error, and in the other, toolsUI simply locked up.
@rschmunk I asked around and got some background on this issue - apparently the dap4 package is intentionally left out of netcdfAll because it has some major bugs in v5+ (including what you're seeing in toolsui). To be honest, it doesn't look like it will be a quick fix, but it is back on our radar now.
@haileyajohnson, I am informed that NASA's NGAP project is pushing to switch over to DAP4, so sooner is better than later for anyone using NJ to access one of the associated dataset repos.
What is the approved way of opening a remote DAP4 file or catalog?
It appears that as of a recent snapshot commit, the DAP4 code had been updated and is now included by default in the netcdfAll build.
However, I don't seem to be able to use it.
I have been trying to use NetcdfFiles.open(String) to open an example remote dataset on a DAP4 server maintained by the OpenDAP developers. If that String begins with "dap4://", then I just get back a "No such file or directory" exception. See the stack trace below, using an NJ snapshot from May 30.
I can access that file if I replace "dap4://" with "http://", but then I just get errors when trying to read variables within.
Maybe there's a problem with the server config or the example file address I have been given (I'll query someone at OpenDAP about that shortly), but I'd like to be sure that I am at least starting with the correct scheme for attempting to open the file.
-- Sample stack trace using "dap4://" at start of remote file address:
java.io.FileNotFoundException: dap4:/test.opendap.org:8080/opendap/dmrpp_test_files/ATL03_20181228015957_13810110_003_01.1var.h5.dmrpp (No such file or directory)
at java.base/java.io.RandomAccessFile.open0(Native Method)
at java.base/java.io.RandomAccessFile.open(RandomAccessFile.java:346)
at java.base/java.io.RandomAccessFile.<init>(RandomAccessFile.java:260)
at java.base/java.io.RandomAccessFile.<init>(RandomAccessFile.java:214)
at java.base/java.io.RandomAccessFile.<init>(RandomAccessFile.java:128)
at ucar.unidata.io.RandomAccessFile.<init>(RandomAccessFile.java:331)
at ucar.unidata.io.RandomAccessFile.acquire(RandomAccessFile.java:192)
at ucar.nc2.NetcdfFiles.getRaf(NetcdfFiles.java:465)
at ucar.nc2.NetcdfFiles.open(NetcdfFiles.java:274)
at ucar.nc2.NetcdfFiles.open(NetcdfFiles.java:243)
at ucar.nc2.NetcdfFiles.open(NetcdfFiles.java:216)
at gov.nasa.giss.data.nc.NcDataset.init(NcDataset.java:458)
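The trace shows the "dap4://..." string reaching ucar.unidata.io.RandomAccessFile, i.e. no provider claimed the scheme, so the open path fell back to treating the whole location as a local filesystem path. A minimal stdlib reproduction of that failure mode (this is not netcdf-java code, just java.io.RandomAccessFile applied to the address from the trace):

```java
import java.io.FileNotFoundException;
import java.io.RandomAccessFile;

public class LocalFallback {
  public static void main(String[] args) {
    // The remote address from the stack trace above; as a local path it
    // cannot exist, so the constructor throws FileNotFoundException --
    // the same "(No such file or directory)" seen in the trace.
    String location =
        "dap4://test.opendap.org:8080/opendap/dmrpp_test_files/ATL03_20181228015957_13810110_003_01.1var.h5.dmrpp";
    try {
      new RandomAccessFile(location, "r").close();
    } catch (FileNotFoundException e) {
      System.out.println("FileNotFoundException: " + e.getMessage());
    } catch (Exception e) {
      System.out.println(e);
    }
  }
}
```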
try this URL:
http://test.opendap.org:8080/opendap/dmrpp_test_files/ATL03_20181228015957_13810110_003_01.1var.h5.dmrpp#dap4
Hah! That's actually one of the addresses I have been trying to test accessing. It's the "#dap4" appended to the URL that makes a difference.
Okay, just loaded the catalog http://test.opendap.org/opendap/. Navigating down the tree to some random HDF5 file, the http:// URL that is reported for the file is no good. It does work if I copy the address and append "#dap4" to it.
That's... awkward.
I will investigate why "dap4:" does not work. The problem is that the URL must somehow inform your client program what protocol to use to access the data: DAP4 in this case. Two hints are available: the "dap4:" protocol or appending "#dap4" to the URL. Not sure why "dap4:" is not working; I thought I was testing for that.

One other thing you need to be aware of has to do with accessing Hyrax servers. There is an ongoing issue about how to handle checksums with respect to the DAP4 specification. See https://github.com/OPENDAP/dap4-specification/issues/1 and https://github.com/OPENDAP/dap4-specification/discussions/6. I have a temporary fix: append the string "#hyrax" to your URL. So, for example, you should specify something like this:
dap4:....#hyrax (assuming the dap4: protocol were being recognized), or http:....#dap4&hyrax
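Mechanically, these hints are just URL fragments, with multiple hints joined by "&". A small sketch of appending them while preserving any existing fragment, using java.net.URI and the test.opendap.org address from earlier in this thread (the withHint helper is mine, not a netcdf-java API):

```java
import java.net.URI;
import java.net.URISyntaxException;

public class FragmentHints {
  // Append a protocol-hint fragment to a URL, joining with '&' if a
  // fragment (e.g. "#dap4") is already present.
  static String withHint(String url, String hint) throws URISyntaxException {
    URI u = new URI(url);
    String frag = (u.getFragment() == null) ? hint : u.getFragment() + "&" + hint;
    return new URI(u.getScheme(), u.getAuthority(), u.getPath(), u.getQuery(), frag).toString();
  }

  public static void main(String[] args) throws URISyntaxException {
    String base = "http://test.opendap.org:8080/opendap/dmrpp_test_files/"
        + "ATL03_20181228015957_13810110_003_01.1var.h5.dmrpp";
    System.out.println(withHint(base, "dap4"));                    // ...#dap4
    System.out.println(withHint(withHint(base, "dap4"), "hyrax")); // ...#dap4&hyrax
  }
}
```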
One other point: the client program can interrogate the server and determine the proper protocol by looking at the response. But this functionality is not yet published for THREDDS, and I do not think it works for Hyrax.
@DennisHeimbigner, Did you ever look at this any further? I see no related commits, so maybe not?
I recently heard from a couple people at NASA/JPL placing data on an agency DAAC who wanted to know more about Panoply's (and hence netCDF-Java) ability to access data on a DAP4 archive. As an example, they cited a dataset described at an opendap.earthdata.nasa.gov address. The actual data URL as given there is the same minus the .dmr.html extensions.
In testing again a moment ago, I'm using an NJ 5.5.4 snapshot from a week ago.
That address, as before, gets back a 405 server response when Panoply feeds it as is to the NJ library.
If I change the address so that it starts with dap4: rather than https:, the response is a FileNotFoundException, as follows, when my code calls NetcdfFiles.open(dap4addressStr):
java.io.FileNotFoundException: dap4:/opendap.earthdata.nasa.gov/collections/C2706510710-POCLOUD/granules/measures_esdr_as_metopb_l2_wind_stress_48195_v1.1_s20220101-000357-e20220101-014518_ancillary (No such file or directory)
at java.base/java.io.RandomAccessFile.open0(Native Method)
at java.base/java.io.RandomAccessFile.open(RandomAccessFile.java:344)
at java.base/java.io.RandomAccessFile.<init>(RandomAccessFile.java:259)
at java.base/java.io.RandomAccessFile.<init>(RandomAccessFile.java:213)
at java.base/java.io.RandomAccessFile.<init>(RandomAccessFile.java:127)
at ucar.unidata.io.RandomAccessFile.<init>(RandomAccessFile.java:331)
at ucar.unidata.io.RandomAccessFile.acquire(RandomAccessFile.java:192)
at ucar.nc2.NetcdfFiles.getRaf(NetcdfFiles.java:465)
at ucar.nc2.NetcdfFiles.open(NetcdfFiles.java:274)
at ucar.nc2.NetcdfFiles.open(NetcdfFiles.java:243)
at ucar.nc2.NetcdfFiles.open(NetcdfFiles.java:216)
at mycode...
So using the https address but appending #dap4 as you previously suggested, the remote dataset is apparently loaded successfully. However, if I try to extract a variable and make a plot, a Dap4Exception results due to a "Malformed chunked source".
dap4.core.util.DapException: dap4.core.util.DapException: dap4.core.util.DapException: Malformed chunked source
at dap4.dap4lib.HttpDSP.loadDAP(HttpDSP.java:113)
at dap4.dap4lib.cdm.nc2.DapNetcdfFile.ensuredata(DapNetcdfFile.java:351)
at dap4.dap4lib.cdm.nc2.DapNetcdfFile.readData(DapNetcdfFile.java:277)
at ucar.nc2.Variable.reallyRead(Variable.java:797)
at ucar.nc2.Variable._read(Variable.java:736)
at ucar.nc2.Variable.read(Variable.java:614)
at ucar.nc2.dataset.VariableDS.reallyRead(VariableDS.java:471)
at ucar.nc2.dataset.VariableDS._read(VariableDS.java:444)
at ucar.nc2.dataset.VariableDS._read(VariableDS.java:454)
at ucar.nc2.Variable.read(Variable.java:600)
at ucar.nc2.Variable.read(Variable.java:546)
...
Caused by: dap4.core.util.DapException: dap4.core.util.DapException: Malformed chunked source
at dap4.dap4lib.D4DSP.loadDAP(D4DSP.java:200)
at dap4.dap4lib.HttpDSP.loadDAP(HttpDSP.java:111)
... 29 more
Caused by: dap4.core.util.DapException: Malformed chunked source
at dap4.dap4lib.DeChunkedInputStream.readChunk(DeChunkedInputStream.java:228)
at dap4.dap4lib.DeChunkedInputStream.read(DeChunkedInputStream.java:146)
at dap4.dap4lib.DeChunkedInputStream.read(DeChunkedInputStream.java:135)
at dap4.dap4lib.D4DataCompiler.compileAtomicVar(D4DataCompiler.java:169)
at dap4.dap4lib.D4DataCompiler.compileVar(D4DataCompiler.java:131)
at dap4.dap4lib.D4DataCompiler.compile(D4DataCompiler.java:108)
at dap4.dap4lib.D4DSP.loadDAP(D4DSP.java:195)
... 30 more
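For context on where DeChunkedInputStream is tripping: a DAP4 data response is framed in chunks, each preceded by a 32-bit header whose high-order byte carries flag bits (last-chunk, error, endianness) and whose low-order 24 bits carry the chunk size, per my reading of the DAP4 specification. If extra or missing bytes (e.g. an unexpected checksum) shift the stream, the size field decodes to garbage and the reader reports a malformed source. A hedged sketch of that header decoding (illustrative only, not the DeChunkedInputStream code):

```java
public class ChunkHeader {
  // Flag bits in the high-order byte of a DAP4 chunk header
  // (as I read the DAP4 specification; labels are mine).
  static final int LAST_CHUNK = 0x1;
  static final int ERR_CHUNK = 0x2;
  static final int LITTLE_ENDIAN = 0x4;

  static int size(int header) { return header & 0xFFFFFF; } // low 24 bits
  static int flags(int header) { return header >>> 24; }    // high byte

  public static void main(String[] args) {
    int header = (LAST_CHUNK << 24) | 1024; // final chunk of 1024 bytes
    System.out.println("size  = " + size(header));                       // 1024
    System.out.println("last  = " + ((flags(header) & LAST_CHUNK) != 0)); // true
    System.out.println("error = " + ((flags(header) & ERR_CHUNK) != 0));  // false
  }
}
```

If stray bytes precede the header, size() decodes an arbitrary number and the reader either over-reads or fails immediately, which matches the symptom in the trace above.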
The DAAC is apparently not a Hyrax server, which you indicated might be another potential problem to cope with.
In your last comment, you mentioned that the "client program can interrogate the server and determine the proper protocol by looking at the response". How? Is that a matter of deciphering an error message, or is there an actual method one can call to get that info?
I should probably ask, do my troubles above interconnect with #1232?
/thx
Sorry, this may have got lost in my stack. Let me do some checking.
I had the fix, but apparently I got side tracked. Anyway, see PR https://github.com/Unidata/netcdf-java/pull/1255
@DennisHeimbigner, After changing one line of my code in Panoply that calls NJ and using the NJ 5.5.4 snapshot with yesterday's commits, I successfully acquired a remote DAP4 file using the dap4://... prefix.
However, I am still getting Malformed Chunk exceptions when trying to read the actual data from the JPL earthdata.nasa.gov address I mentioned above. There are also a couple of other sample datasets served via the earthdata.nasa.gov proxy for which, when trying to get the data, the process simply never comes back with an answer. One of these samples is a DAP4 trajectory dataset, and the other is a non-DAP4 file.
Also, please double-check lines 132 and 133 of the updated DapNetcdfFile. The code is setting the xuri scheme to "https", but the comments suggest it's supposed to be set to "http" because test.opendap.org doesn't speak https. Is there an error there, or am I just reading the comments wrong?
OOPS!
I was testing whether the test.opendap.org test server accepts https requests (it still does not).
I apparently forgot to switch back to http. I will put up a pr for it shortly.
As to the malformed chunks, my best guess is that this is the checksum problem I described above.
If you are accessing a Hyrax-based server, then this is not an easy fix.
If you are using the latest master that includes PR https://github.com/Unidata/netcdf-java/pull/1211, then this may work: append "?dap4.checksum=true" to the end of your URL. It probably won't, though, because of this issue: https://github.com/OPENDAP/dap4-specification/discussions/6. I have no fix for this problem yet.