Improvements to the DMR++ parser
The DMR++ parser was merged in #133, but there are a few ways it could be improved.
- Docs. It's not actually listed anywhere publicly that DMR++ files are supported, not even in the docstring of `open_virtual_dataset`.
- HDF4 support (#216)
- Use `ChunkManifest.from_arrays`, which should increase performance and will reduce reliance on the kerchunk in-memory format (https://github.com/zarr-developers/VirtualiZarr/pull/113#discussion_r1723850532)
- Internal code improvements, e.g.:
  a. Use the `pathlib` module instead of `os` internally
  b. Refactor to be more functional, see https://github.com/zarr-developers/VirtualiZarr/pull/113#discussion_r1723848303
cc @ayushnag @betolink
I would like to be involved in some of this work. I can definitely work on better understanding the complexities of HDF4 and the steps needed to enable HDF4 support.
@ayushnag is there a way to identify a DMR++ file automatically? e.g. a file magic?
Not to my knowledge. All valid XML files must start with the string "<?xml"; however, beyond that I think there would need to be some reading of the header tags (e.g. xmlns:dmrpp="http://xml.opendap.org/dap/dmrpp/1.0.0#") to know it is a dmrpp file.
cc @Mikejmnez @jgallagher59701
@ayushnag is right. The first four elements are not enough to distinguish a generic XML file from a dmrpp-generated one.
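For reference, the header check described above could be sketched roughly as follows. This is only an illustration using the standard library, not code from VirtualiZarr: the function name `looks_like_dmrpp` is hypothetical, and it assumes the `dmrpp` namespace quoted above is declared on the root element of the file.

```python
import io
import xml.etree.ElementTree as ET

# Namespace URI quoted earlier in this thread.
DMRPP_NS = "http://xml.opendap.org/dap/dmrpp/1.0.0#"


def looks_like_dmrpp(fileobj) -> bool:
    """Hypothetical helper: True if the root element declares the dmrpp namespace.

    `fileobj` is a binary file-like object. Parsing stops at the root
    element, so the rest of the file is not processed.
    """
    try:
        for event, payload in ET.iterparse(fileobj, events=("start-ns", "start")):
            if event == "start-ns":
                _prefix, uri = payload
                if uri == DMRPP_NS:
                    return True
            else:
                # First "start" event is the root element; its namespace
                # declarations have already been reported by this point.
                return False
    except ET.ParseError:
        # Not well-formed XML, so it cannot be a DMR++ file.
        return False
    return False


sample = (
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b'<Dataset xmlns:dmrpp="http://xml.opendap.org/dap/dmrpp/1.0.0#" name="x"/>'
)
print(looks_like_dmrpp(io.BytesIO(sample)))
```

On a real file this would be called as `with open(path, "rb") as f: looks_like_dmrpp(f)`. It is a content sniff rather than a file magic, so it costs a small read and parse of the header.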