Better document supported formats (i.e. available parsers)

Open TomNicholas opened this issue 7 months ago • 4 comments

We could do a much better job of documenting which formats we support virtualizing.

We should have another page of docs titled "Supported Formats", that:

  • lists all the formats we support,
  • lists all the parsers we supply (which don't necessarily have a simple 1-to-1 relationship to formats, e.g. the HDFParser is used for netCDF4) and links to their API docs,
  • has some narrative documentation explaining any subtleties people should know about (e.g. that the ZarrParser currently only understands native v3 format, not v2),
  • explains that 3rd parties can write their own custom parsers and links to that docs page,
  • has a section for linking to any known 3rd party parsers.

This should only be done after #601 is merged to avoid having to re-do it.

TomNicholas avatar Jun 16 '25 15:06 TomNicholas

That all seems great! Adding on, what do you think about some sort of helper function that lists the available parsers?

from virtualizarr.parsers import available_parsers # or w/e
available_parsers()
['DMRPPParser',
 'FITSParser',
 'HDFParser',
 'NetCDF3Parser',
 'KerchunkJSONParser',
 'KerchunkParquetParser',
 'ZarrParser']

norlandrhagen avatar Jun 16 '25 20:06 norlandrhagen

> what do you think about some sort of helper function that lists the available parsers?

It's a nice idea, but I'm not sure it's necessary. In VirtualiZarr's case it would effectively be identical to what's defined in virtualizarr.parsers, so you could just do dir(virtualizarr.parsers). (Note that for that suggestion to work properly, we would need to make the naming change I suggested in https://github.com/zarr-developers/VirtualiZarr/pull/601#discussion_r2150370701, and move the Parser class to a different namespace, as I just suggested in #616.)
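
A minimal sketch of the dir()-based approach, to show why the namespace cleanup matters. It uses a hypothetical stand-in module (`fake_parsers`) rather than the real virtualizarr.parsers, and the helper name `list_parsers` is made up for illustration. A plain dir() returns everything in the namespace, including a base class and dunder attributes, so the names need filtering unless the namespace only contains concrete parsers.

```python
import types


def list_parsers(module):
    """Return sorted public names in `module` that look like concrete parsers.

    Assumption for this sketch: concrete parsers are named '<Something>Parser',
    and the bare name 'Parser' is a base class that should be excluded.
    """
    return sorted(
        name
        for name in dir(module)
        if not name.startswith("_")
        and name.endswith("Parser")
        and name != "Parser"
    )


# Hypothetical stand-in for a parsers namespace (not the real virtualizarr.parsers).
fake_parsers = types.ModuleType("fake_parsers")
for _name in ("HDFParser", "ZarrParser", "FITSParser"):
    setattr(fake_parsers, _name, type(_name, (), {}))

# A base class and a helper living in the same namespace would pollute a bare dir().
fake_parsers.Parser = type("Parser", (), {})
fake_parsers.some_helper = lambda: None

parser_names = list_parsers(fake_parsers)
print(parser_names)
```

If the namespace held nothing but the concrete parser classes, the filtering would be unnecessary and dir() alone would be enough.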

The Xarray backends.list_engines() function (which you might have been thinking of) is different and more useful because it actually interrogates Xarray's entrypoint system to see what's available, which depends on what's installed. In VirtualiZarr's case, what's available is a fixed list of classes, independent of which dependencies are installed. We could make the available parsers a function of what's installed, I guess... but that seems too magic for little gain.

But I could be convinced otherwise!

TomNicholas avatar Jun 17 '25 07:06 TomNicholas

Note that the links in the readme should also be updated to point to this new page.

TomNicholas avatar Jun 17 '25 07:06 TomNicholas

Also note that this can replace the FAQ question we currently have on this topic.

TomNicholas avatar Jun 17 '25 08:06 TomNicholas