hdf5 icon indicating copy to clipboard operation
hdf5 copied to clipboard

refactor H5T_cmp and reintroduce Tony Horrobin's cache of the member-name sort order

Open gnuoyd opened this issue 3 years ago • 1 comments

I have extracted a couple of subroutines from H5T_cmp and simplified a lot, mainly by using temporary variables to replace repeated expressions. The code is more readable this way.

There are a lot of changes in this PR. It consists of 33 individual commits with descriptions. I recommend reading the individual commits.

I'm marking this WIP because I have some doubts about (re-)sorting compound-type members "on demand" in H5T_cmp.

It seems that applications probably can afford the time and space cost of the library keeping members sorted by both name and index, always. I don't think applications will change any type's members often enough to make an important performance difference. I don't think applications use types in numbers great enough that space will be a problem. Not so?

If the members are always stored in offset order, and there is always a name-ordered array of member indices alongside, then H5T_cmp does not have to allocate any memory. Right now, when an allocation fails, H5T_cmp returns 0, indicating arbitrarily that its H5T_t arguments are "equal". That sets up the library or application to fail mysteriously somewhere down the road.

Enum types, too, should keep their cases continually sorted, to avoid on-demand heap allocation.

gnuoyd avatar Jan 24 '22 20:01 gnuoyd

https://github.com/HDFGroup/hdf5/pull/1176

hyoklee avatar Nov 04 '22 16:11 hyoklee

Closing due to age and lack of movement. Issue #4528 has been created as a reminder to go back over this.

derobins avatar May 28 '24 15:05 derobins