Revisit dispatch mechanism...
Several things need to be done:
1. For nc_open, add a dispatch table entry that, given a file, can tell whether it should be processed by that dispatcher.
2. For nc_create, we need a similar entry, but it must work off of the path name + mode flags since the file does not exist.
3. Clean up the substrate mechanism; it is currently too hard to explain.
4. Investigate dynamic library loading. If this is to be believed: https://en.wikipedia.org/wiki/Dynamic_loading then we would only have two cases:
   - linux and OSX: use dlopen
   - Windows: use LoadLibrary

So it seems feasible. see
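For what it is worth, the two loading cases can be hidden behind one small wrapper. The following is only a sketch under stated assumptions: the `sketch_lib_*` names are hypothetical and do not exist in netCDF. The only real APIs used are `dlopen`/`dlsym`/`dlclose` on POSIX systems and `LoadLibraryA`/`GetProcAddress`/`FreeLibrary` on Windows.

```c
#include <assert.h>
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>
typedef HMODULE sketch_lib;   /* hypothetical handle type */
#else
#include <dlfcn.h>
typedef void *sketch_lib;     /* hypothetical handle type */
#endif

/* Load a shared library by path. Returns NULL on failure. */
sketch_lib sketch_lib_open(const char *path)
{
    assert(path != NULL);
#ifdef _WIN32
    return LoadLibraryA(path);
#else
    return dlopen(path, RTLD_NOW | RTLD_LOCAL);
#endif
}

/* Look up a symbol in a loaded library. Returns NULL if it is missing. */
void *sketch_lib_symbol(sketch_lib lib, const char *name)
{
    assert(lib != NULL && name != NULL);
#ifdef _WIN32
    return (void *)GetProcAddress(lib, name);
#else
    return dlsym(lib, name);
#endif
}

/* Release a library handle. A NULL handle is ignored. */
void sketch_lib_close(sketch_lib lib)
{
    if (lib == NULL)
        return;
#ifdef _WIN32
    FreeLibrary(lib);
#else
    dlclose(lib);
#endif
}
```

A dispatcher plugin could then export one agreed-upon symbol that returns its dispatch table, and the library would fetch it with `sketch_lib_symbol`. On older glibc versions the POSIX branch needs to be linked with `-ldl`.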
I am deep in the dispatch code right now. I think it works well but of course welcome improvement.
With PIO, there will not be any way to tell from the file alone that PIO should be used. The dispatch must always look at the mode flag for open and create for NC_PIO. PIO can open files of any netCDF binary format, so it needs to be explicitly invoked when used. For this reason, I have to check for NC_PIO before checking anything else.
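To make the ordering concrete, here is a minimal sketch of a selection routine that tests the mode flag before it looks at the file contents. Everything named `sketch_*` or `SKETCH_*` is hypothetical, including the flag value; none of it is taken from netcdf.h or the real dispatch code. The only facts relied on are the classic "CDF" magic prefix and the 8-byte HDF5 signature.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag value, for illustration only. */
#define SKETCH_MODE_PIO 0x0100

enum sketch_format {
    SKETCH_FMT_NONE = 0,
    SKETCH_FMT_CLASSIC,
    SKETCH_FMT_HDF5,
    SKETCH_FMT_PIO
};

/* Choose a dispatcher from the mode flags and the first bytes of the file.
 * For a create call there is no file yet, so the caller passes nmagic == 0. */
enum sketch_format
sketch_choose_dispatcher(int mode, const unsigned char *magic, size_t nmagic)
{
    assert(magic != NULL || nmagic == 0);

    /* PIO cannot be inferred from the file, so the flag is checked first. */
    if (mode & SKETCH_MODE_PIO)
        return SKETCH_FMT_PIO;

    /* Classic files start with "CDF" followed by a version byte. */
    if (nmagic >= 3 && memcmp(magic, "CDF", 3) == 0)
        return SKETCH_FMT_CLASSIC;

    /* HDF5 files start with the 8-byte signature \211 H D F \r \n \032 \n. */
    if (nmagic >= 8 && memcmp(magic, "\211HDF\r\n\032\n", 8) == 0)
        return SKETCH_FMT_HDF5;

    return SKETCH_FMT_NONE;
}
```

With this ordering, a classic-format file opened with the PIO flag set still goes to the PIO dispatcher. A fuller version would go on to test the remaining mode flags for the create case.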
Is this the time to break up nc_inq_all?
Right now I have a bit of a hack to put the szip info into the filter fields in order to support szip. What would be better would be if options_mask and pixels_per_block were once again in the parameter list.
However, what might be best would be to break apart the giant inq_var_all function, and put all the individual inq functions in the dispatch table. @DennisHeimbigner what do you think?
We can still do this now, but the dispatch table, in addition to being used by PIO, is soon to be used by another project, an ocean model. So while we can still make dramatic changes in the dispatch table without affecting anyone, that window is closing. ;-)
While adding additional functions sounds great and I support it, is there a reason to remove the nc_inq_var_all() function? Even if we decided to remove it, we'd end up flagging it as deprecated rather than do a straight-up removal. And even then, I'd be inclined to leave it in unless there were some demonstrable issue.
I have encountered the same problem as Ed. I tend to dislike multiplying the number of dispatch functions. The def_var_xxx functions are an example of something I would prefer to replace with a single aggregating function that uses an argument to determine what the function does.
The problem with aggregation is handling widely varying argument lists. It is a bad idea to use stdargs (x,y,...) because other languages (e.g. fortran or rust or ...) may not be able to handle them. One approach I have taken with the multifilters is to "fake" stdargs by using three arguments:
- format - an integer
- nbytes - size_t
- params - void*
The format of params is determined by format, and its size is determined by nbytes. I can then define a struct for each aggregated function and pass a format value indicating which struct is being passed. The size of the struct and a pointer to it are the nbytes and params arguments.
We could rebuild inq_var_all into this form without too much difficulty and it would be extendible.
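The format/nbytes/params convention described above could look roughly like this in C. This is a sketch only: the struct layouts, the `SKETCH_PARAMS_*` codes, the error value, and `sketch_def_var_aggregate` are all hypothetical names made up for illustration, not the multifilter API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical format codes, one per parameter struct. */
#define SKETCH_PARAMS_SZIP    1
#define SKETCH_PARAMS_DEFLATE 2
#define SKETCH_EBADPARAMS    (-1)

struct sketch_szip_params {
    int options_mask;
    int pixels_per_block;
};

struct sketch_deflate_params {
    int level;
};

/* One aggregated entry point. The format code selects how params is
 * interpreted, and nbytes is checked against the expected struct size.
 * For the demo, *summary receives one field of the decoded struct. */
int sketch_def_var_aggregate(int format, size_t nbytes, const void *params,
                             int *summary)
{
    assert(summary != NULL);
    if (params == NULL)
        return SKETCH_EBADPARAMS;

    switch (format) {
    case SKETCH_PARAMS_SZIP: {
        const struct sketch_szip_params *p = params;
        if (nbytes != sizeof *p)
            return SKETCH_EBADPARAMS;
        *summary = p->pixels_per_block;
        return 0;
    }
    case SKETCH_PARAMS_DEFLATE: {
        const struct sketch_deflate_params *p = params;
        if (nbytes != sizeof *p)
            return SKETCH_EBADPARAMS;
        *summary = p->level;
        return 0;
    }
    default:
        return SKETCH_EBADPARAMS;   /* unknown format code */
    }
}
```

Adding a new case means adding a struct and a format code, so the function prototype itself never changes. The cost is that the compiler can no longer check that the struct matches the format code; the nbytes check catches only size mismatches.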
@WardF note that the inq_var_all() function is not a public function. The public functions are the ones we are familiar with, nc_inq_var_whatever(). But behind the scenes each of them calls the inq_var_all() function, which is what is in the dispatch table. So we can and do change its prototype without having to worry about user problems.
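As a rough illustration of that layering, the sketch below has two public-style wrappers that both forward to a single aggregated entry in a dispatch table, passing NULL for the outputs they do not want. The table layout, the toy backend, and all `sketch_*` names are assumptions for illustration; the real inq_var_all has a much longer argument list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down dispatch table with one aggregated entry. */
struct sketch_dispatch {
    int (*inq_var_all)(int ncid, int varid, int *ndimsp,
                       int *shufflep, int *deflate_levelp);
};

/* Toy backend: reports fixed metadata and fills only requested outputs. */
int sketch_backend_inq_var_all(int ncid, int varid, int *ndimsp,
                               int *shufflep, int *deflate_levelp)
{
    (void)ncid;
    (void)varid;
    if (ndimsp) *ndimsp = 2;
    if (shufflep) *shufflep = 1;
    if (deflate_levelp) *deflate_levelp = 4;
    return 0;
}

const struct sketch_dispatch sketch_table = { sketch_backend_inq_var_all };

/* Public-style wrapper: asks only for the number of dimensions. */
int sketch_inq_varndims(const struct sketch_dispatch *d, int ncid, int varid,
                        int *ndimsp)
{
    assert(d != NULL && d->inq_var_all != NULL);
    return d->inq_var_all(ncid, varid, ndimsp, NULL, NULL);
}

/* Public-style wrapper: asks only for the deflate settings. */
int sketch_inq_var_deflate(const struct sketch_dispatch *d, int ncid,
                           int varid, int *shufflep, int *deflate_levelp)
{
    assert(d != NULL && d->inq_var_all != NULL);
    return d->inq_var_all(ncid, varid, NULL, shufflep, deflate_levelp);
}
```

Because callers only ever see the wrappers, the aggregated entry can gain or lose arguments and only the wrappers and the backends need to be updated.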
@DennisHeimbigner I agree that what you propose would work; it would be rather opaque code. That is, we would be passing parameters in a way that is not normal for C. However, since it would be hidden behind all the C inq functions, at least that would not be a problem for Fortran, nor would users ever have to deal with the complexity.
In fact, I realized a better way to solve my szip problems which uses the existing arguments of inq_var_all, so I retract my suggestion that we change it at this time.
@edhartnett thanks for clarifying, I see you're correct. You typed inq_var_all() but I read nc_inq_var_all(), my mistake.
> So we can and do change its prototype without having to worry about user problems.

But with user custom dispatch tables, nc_inq_var_all will be public. The question I have not personally resolved is: should dispatch have lots of separate entries versus aggregating? For example, I would start by aggregating all the def_var_XXX functions in the dispatch table. Ward, do you have an opinion on many functions vs aggregation?
@DennisHeimbigner the dispatch functions are not public in the same way as the main API calls. The average scientist programmer will know nothing about the dispatch table functions, nor do they have to be called from Fortran or other languages, as the main API functions do - they are C only. Nor are they documented to the same standard.
Also, there is no backward compatibility guarantee for the dispatch functions. They may be changed or removed and it's up to dispatch table users to use the version number and adjust to those changes.
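A version check of that kind might look like the following sketch. The struct, the version constant and its value, and `sketch_register_dispatch` are hypothetical and chosen only for illustration; the point is that a table built against a different dispatch version gets rejected up front rather than called with a mismatched layout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical version the library was built with; the value is arbitrary. */
#define SKETCH_DISPATCH_VERSION 3
#define SKETCH_EINVAL (-1)

/* Hypothetical user-supplied table header. */
struct sketch_user_dispatch {
    int model;
    int dispatch_version;   /* version the table's author compiled against */
};

/* Accept a user table only if its version matches the library's. */
int sketch_register_dispatch(const struct sketch_user_dispatch *table)
{
    assert(table != NULL);
    if (table->dispatch_version != SKETCH_DISPATCH_VERSION)
        return SKETCH_EINVAL;
    return 0;
}
```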
I don't have any objections to the def_var functions being aggregated. I believe this will affect only libsrc4/libhdf5, since none of the other dispatch layers have any code to handle most of the def_var_* functions. I'm happy to help with this coding.
Agreed that the dispatch is only semi-public and provides only limited compatibility guarantees. At this point, I think the aggregated vs individual function issue is the most important. We will pretty much be stuck with our decision.
I was thinking about the aggregation approach and am not sure how easy that is to handle in e.g. Fortran.
But since the dispatch functions are not exposed to Fortran, that's not a problem. Fortran always calls the public C API, which calls the dispatch functions. So the dispatch functions don't have to be Fortran-friendly.
Sorry, I wandered off topic. For multifilters and to presage Zarr filters, I am extending the interface API, which will affect Fortran. Some of the new functions currently use the format+nbytes+params idea I described above.
I think this issue can be closed, the dispatch table has endured...