HDF.PInvoke
Emulate C++'s typedefs with structs
At the current state, functions like H5F.create( .. ) and H5F.get_filesize( .. ) both return Int32, even though create returns an actual id while get_filesize only indicates whether there was an error. So the real meanings are lost in C# when using IntelliSense.
I suggest wrapping the values into structs to create marshalable wrapper types that preserve the semantics carried by the aliases (C++ typedefs).
So instead of adding a lot of usings at the top of every file:
using herr_t = System.Int32;
#if HDF5_VER1_10
using hid_t = System.Int64;
#else
using hid_t = System.Int32;
#endif
I would add these types ONCE into the library:
public struct herr_t
{
    public System.Int32 Value;
    public static implicit operator System.Int32(herr_t value) => value.Value;
}

public struct hid_t
{
#if HDF5_VER1_10
    public System.Int64 Value;
    public static implicit operator System.Int64(hid_t value) => value.Value;
#else
    public System.Int32 Value;
    public static implicit operator System.Int32(hid_t value) => value.Value;
#endif
}
This would lead to much more readable IntelliSense hints and enables the use of ad-hoc polymorphism, like:
public static class HdfPInvokeHelpers
{
    public static void ThrowOnError(this herr_t value)
    {
        if (value < 0)
            throw new Exception(H5EHelper.GetErrorStack().ToString());
    }

    public static hid_t ThrowOnError(this hid_t value)
    {
        if (value < 0)
            throw new Exception(H5EHelper.GetErrorStack().ToString());
        return value;
    }
}
class Program
{
    static void Main(string[] args)
    {
        // H5F.ACC_TRUNC instead of ACC_RDONLY: H5F.create only accepts the
        // truncate/exclusive flags; the read-only flags are for H5F.open.
        var file = H5F.create("file.h5", H5F.ACC_TRUNC, H5P.DEFAULT, H5P.DEFAULT).ThrowOnError();
        ulong size = 0;
        H5F.get_filesize(file, ref size).ThrowOnError();
    }
}
Update: added the missing cross-define for hid_t.
I started a branch with the suggested changes. Though it's not done yet.
@hokb I noticed that both size_t and ssize_t are mapped to System.IntPtr in every file. Is this on purpose? As far as I know, the first s in ssize_t means "signed", since size_t is unsigned.
So a direct mapping of C++ would be this:
using size_t = System.UIntPtr;
using ssize_t = System.IntPtr;
I believe the main motivation was that UIntPtr is not CLS-compliant. Do we care?
Thanks @JanWosnitza. Great effort and an interesting attempt to ease the handling of our id types. Just some points to consider:
- Wrapping the id / err values into a struct involves additional (explicit and implicit) calls for storing and for accessing the value. Dunno if this is relevant performance wise though. (probably not. We are dealing with file system access here..)
- Did you check potential influence on the marshalling? The struct is still blittable but we should make sure that no marshalling is needed during PInvoke (as for the plain system value types we use currently).
- We should also check if this proposal violates the 'no code' rule of HDF.PInvoke. This would be less of an issue for me personally. But in general this rule is useful in order to keep a clear focus on the project and to separate our efforts from potential 3rd party additions.
- It is a rather big change and breaks compatibility with existing code.
- The advantage in terms of IntelliSense support might become superseded if a user wants to actually do something with the hid and err values. Built-in system types are easier here. On the structs we need additional code to unwrap the value. This can be simplified again, but ...
From "Writing CLS-Compliant Code": The features you use in the definitions of your private classes, in the definitions of private methods on public classes, and in local variables do not have to follow the CLS rules. You can also use any language features you want in the code that implements your class and still have a CLS-compliant component.
So if the struct wraps UIntPtr completely, it wouldn't hurt CLS rules. On the other hand, is there any .Net language/runtime not supporting UIntPtr?
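A minimal sketch of how the quoted rule could apply here (the type shape is illustrative, not HDF.PInvoke code): keep the non-CLS-compliant UIntPtr private and expose only CLS-compliant types like long on the public surface.

```csharp
// Illustrative only: a size_t wrapper whose public surface is CLS-compliant.
public struct size_t
{
    // The non-CLS-compliant UIntPtr stays private, per the quoted rule.
    private System.UIntPtr value;

    public static implicit operator size_t(long v) =>
        new size_t { value = new System.UIntPtr((ulong)v) };

    public static implicit operator long(size_t v) =>
        (long)v.value.ToUInt64();
}
```

The struct stays blittable (a single pointer-sized field), while no UIntPtr appears in the public API.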
@hokb
- Member Access: Since it's a struct with a field, I'm not sure if this really changes performance.
- Marshalling: Valid point. No, haven't checked yet. But I hope it's blitted directly.
- No Code: Yeah, but I don't know how one would achieve similar results without wrapping the whole library. I think the type-safety is worth it (e.g. did you ever accidentally pass a herr_t into a hid_t parameter? With this setup you get a compile-time error).
- Compatibility: Admitted, it is a big change. But in the UnitTests projects I just removed the usings and fixed the already existing typing errors (e.g. there were quite a few size_t size = (ssize_t)0; and IntPtr size = (size_t)0; and the like).
- IntelliSense vs EaseOfUse: Valid point for the [h][s]size_t types, though it would be possible to extend the structs to simplify the usage. I think most of the other types should be treated as opaque types in the first place. E.g.:
  - changing hid_t values doesn't sound like a good idea (e.g. adding)
  - herr_t is actually a boolean
  - hbool_t could implicitly convert into and from bool (much more expressive); similar with htri_t
  - haddr_t is only used as a parameter in H5O.open_by_addr AND the value SHOULD be retrieved by the library (this will forbid a lot of possible errors)
ADDED: [h][s]size_ts have two states: indicating an error or carrying a positive integer.
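The hbool_t point above could look like this (a sketch; it assumes the native hbool_t is an unsigned int, as declared in HDF5 1.8's H5public.h, and the member names are illustrative):

```csharp
public struct hbool_t
{
    // Assumption: native hbool_t is an unsigned int on the C side.
    public System.UInt32 Value;

    // Implicit conversions make call sites read like plain booleans.
    public static implicit operator bool(hbool_t v) => v.Value != 0;
    public static implicit operator hbool_t(bool v) =>
        new hbool_t { Value = v ? 1u : 0u };
}
```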
In my own wrapper I went even further and also separated the hid_t into different classes:
- Attribute Ids: H5A
- PropertyList Ids (can be split into the separate PropertyList classes): H5P
- File Ids: H5F
- Group Ids: H5G
- Dataset Ids: H5D
- Datatype Ids (can be split into the separate types: int, string, ..): H5T
- Object Ids (Group, Dataset, Datatype): H5O
- Location Ids (File, Group): H5L
- Dataspace Ids: H5S
Roughly speaking: most of the library modules expose their own hid_t type. Some of them are just classes (e.g. H5O, H5L, H5I).
This adds even more compile-time checks (type-safety). E.g. you can't treat an object-id as a dataset-id without an explicit cast. To me, every hid_t feels like returning System.Object instead of the real type. It's C's way, but not C#'s.
I don't say the library should go down this road, but there are a lot of possibilities which may or may not improve the acceptance of HDF5 in the .Net community :)
PS: Omg, again a really long post.
Generally I really think the "No Code" rule is good, with the exception of types. The problem is that .Net's capabilities to extend the behavior of already existing types are quite limited.
Just stumbled upon a new hurdle with usings while preparing a little performance test:
If someone tries to build a library that's based on HDF.PInvoke, that library has to explicitly support both versions and offer two versions (like HDF.PInvoke does). With a struct (at least for hid_t) you wouldn't have that problem. If your library only uses functionality that's available in both 1.8 and 1.10, the user of the library can decide which one to use.
the user of the library can decide which one to use.
Except that one still must decide which HDF.PInvoke to compile, no? So we wouldn't get rid of the HDF_1_10 compile switch in HDF.PInvoke? This will not buy us a PInvoke assembly which is able to work with 1.8 and 1.10 dlls at the same time, right?
Except that one still must decide which HDF.PInvoke to compile, no?
Let me clear up my example: There is a library called FancyHDF which (of course) depends on the HDF.PInvoke NuGet package. And you decide to build an app that uses FancyHDF. Inside the FancyHDF NuGet package is just one binary, but you can still decide which version of HDF.PInvoke you'd like to use. This takes a lot of burden off the library writers.
If you want to prevent such scenarios, I would suggest making two separate NuGet packages: one for 1.8 and one for 1.10.
I like your Typology.cs attempt a lot - it is beautiful and elegant. But for me the following concerns remain:
- I see the home for things like that at a higher level than HDF.PInvoke is targeting. This lib ought to be a thin wrapper around the native dll. While it is clear that there is a lot of potential for increasing the convenience to the user ('stronger typed' hid structs being only one of them), it would add a new level of abstraction to the underlying native hids. They are what they are: integers of varying (with versions) size. So this proposal violates the "no code" rule. And it does so on a level which a user is unable to circumvent. Other than with convenience functions, classes, and extensions (which can be used or not), a user would have to always follow us on this path if she wants to use HDF.PInvoke at all.
- The proof that no additional cost is added to marshal the structs [in and out] is still outstanding.
I am in a somewhat ambivalent position here, since additions like this could likely make our (ILNumerics) life easier. (ILNumerics is maintaining a higher level HDF5 lib based on HDF.PInvoke.) On the other hand, the decision to only add such code to HDF.PInvoke which is really needed to make it work makes a whole lot of sense to me too.
Inside the FancyHDF NuGet package is just one binary, but you can still decide which version of HDF.PInvoke you like to use.
I don't get it yet: do you mean that the struct attempt would allow one to target the NuGet package but use versions 1.8 and 1.10 of the native dlls interchangeably?
The struct would be fetched from the other library at runtime. E.g. you can try to do a sizeof(hid_t) in a different assembly; the compiler will complain that the size isn't statically resolvable (so I added a hid_t.Size field ;))
It's really referencing the type, not the memory layout.
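A guess at the shape of that hid_t.Size field (not the actual branch code). The key design choice is static readonly rather than const: a const would be baked into the referencing assembly at compile time, while a static readonly field is read at runtime from whichever HDF.PInvoke build is actually loaded.

```csharp
public struct hid_t
{
#if HDF5_VER1_10
    public System.Int64 Value;
#else
    public System.Int32 Value;
#endif

    // static readonly (not const): resolved at runtime against the
    // loaded assembly, so it reports the size of that build's hid_t.
    public static readonly int Size =
        System.Runtime.InteropServices.Marshal.SizeOf(typeof(hid_t));
}
```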
No luck in getting your point yet :) IMO the nuget package is for one specific HDF version only. Users wanting to use another version need to recompile from code and use the other compile switch. This is valid for both: using your structs attempt or not.
Correct?
IMO the nuget package is for one specific HDF version only.
Correct. And what if someone decides to upgrade from 1.8 to 1.10 while still using FancyHDF? Without structs, he also has to update FancyHDF (which also requires FancyHDF's author to offer both versions on NuGet). With structs you just have to upgrade HDF.PInvoke and FancyHDF will still work fine with the new version.
Since 1.10's file format isn't backwards compatible with 1.8, I think this is a viable scenario (e.g. at AHM we're still using 1.8).
PS: I haven't tested this yet, but I suppose it would work. gg
Regarding your first concern:
- Not increasing the user's convenience in this layer also means everybody has to do it themselves -> a lot of overall work (for example: I assume you hadn't had the struct idea yourself yet?)
- You call it adding abstractions, I call it preserving semantics/concepts ;-P Of course, it's both.
- Actually the "no code" rule is already partially violated, otherwise no changes would have been required to compile for .Net 2.0 #102.
- After all it's a C API. They're not known to be fancy and type-safe. (e.g. how much time have you already wasted finding the incorrect id-type passed into the wrong function in OpenTK? They're all integers, and every id-type starts at 0.)
But I admit, if we do this, some more thought has to go into the implementation. I wonder if other language bindings added type-safety?
Just to mention it: having custom types also allows us to do stuff like return herr_t.Ok;
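For example, herr_t.Ok could be a named success value, making return sites more expressive than a bare 0 (a sketch; the Ok and IsError member names are illustrative):

```csharp
public struct herr_t
{
    public System.Int32 Value;

    // HDF5 convention: negative values signal an error, non-negative success.
    public static readonly herr_t Ok = new herr_t { Value = 0 };

    public bool IsError => Value < 0;
}
```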
I made a little MarshallingTest project and, as it turns out (at least on my machine): with x86 the structs are a little bit slower and with x64 a little bit faster. This is consistent on my machine.
I tried to improve the test a little bit. Now the struct is obviously slower with both x86 and x64.
But I would call this negligibly slower: a third of a second slower after 26 million calls to H5Tcreate and H5Tclose.
The overhead likely comes from the conversion of the size_t struct input parameter.
Ah right. Fixed that :) It's still drifting, but this is more likely to be caused by the test setup.
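One quick way to probe the marshalling concern discussed above (a sketch, not part of the MarshallingTest project): GCHandle.Alloc with GCHandleType.Pinned throws an ArgumentException for non-blittable types, so successful pinning indicates the struct can be blitted without marshalling.

```csharp
using System;
using System.Runtime.InteropServices;

public struct hid_t { public long Value; }  // illustrative 1.10 layout

public static class BlittableCheck
{
    public static bool IsBlittable<T>(T value)
    {
        try
        {
            // Pinning only succeeds for blittable types.
            GCHandle h = GCHandle.Alloc(value, GCHandleType.Pinned);
            h.Free();
            return true;
        }
        catch (ArgumentException)
        {
            return false;
        }
    }

    public static void Main()
    {
        Console.WriteLine(IsBlittable(new hid_t()));   // True
        Console.WriteLine(IsBlittable(new object()));  // False
    }
}
```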
In my own wrapper I went even further and also separated the hid_t into different classes:
And here comes an example:
Some types to get the idea
public struct hid_t
{
#if HDF5_VER1_10
    public System.Int64 Value;
#else
    public System.Int32 Value;
#endif
}

public struct H5I
{
    public readonly hid_t Value;
    public H5I( hid_t value ) { Value = value; }
}

public struct H5L
{
    public readonly hid_t Value;
    public H5L( hid_t value ) { Value = value; }
    public H5L( H5I id ) { Value = id.Value; }
    public static implicit operator H5I( H5L id ) { return new H5I( id.Value ); }
}

public struct H5O
{
    public readonly hid_t Value;
    public H5O( hid_t value ) { Value = value; }
    public H5O( H5I id ) { Value = id.Value; }
    public static implicit operator H5I( H5O id ) { return new H5I( id.Value ); }
}
public struct H5F
{
    public readonly hid_t Value;
    public H5F( hid_t value ) { Value = value; }
    public H5F( H5I id ) { Value = id.Value; }
    public static implicit operator H5I( H5F id ) { return new H5I( id.Value ); }
    public H5F( H5L id ) { Value = id.Value; }
    public static implicit operator H5L( H5F id ) { return new H5L( id.Value ); }
}

public struct H5G
{
    public readonly hid_t Value;
    public H5G( hid_t value ) { Value = value; }
    public H5G( H5I id ) { Value = id.Value; }
    public static implicit operator H5I( H5G id ) { return new H5I( id.Value ); }
    public H5G( H5L id ) { Value = id.Value; }
    public static implicit operator H5L( H5G id ) { return new H5L( id.Value ); }
    public H5G( H5O id ) { Value = id.Value; }
    public static implicit operator H5O( H5G id ) { return new H5O( id.Value ); }
}

public struct H5D
{
    public readonly hid_t Value;
    public H5D( hid_t value ) { Value = value; }
    public H5D( H5I id ) { Value = id.Value; }
    public static implicit operator H5I( H5D id ) { return new H5I( id.Value ); }
    public H5D( H5O id ) { Value = id.Value; }
    public static implicit operator H5O( H5D id ) { return new H5O( id.Value ); }
}
And some PInvokes
[DllImport(...)]
public extern static H5P H5F_get_access_plist(
    H5F file_id );

[DllImport(...)]
public extern static H5G H5G_open(
    H5L loc_id,
    string name,
    H5P gapl_id = default(H5P) );

[DllImport(...)]
public static extern H5D H5D_create(
    H5L loc_id,
    byte[] name,
    H5T type_id,
    H5S space_id,
    H5P lcpl_id = default(H5P),
    H5P dcpl_id = default(H5P),
    H5P dapl_id = default(H5P) );
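A minimal self-contained sketch of the compile-time checks these typed ids buy (reduced to two levels of the hierarchy; all names illustrative):

```csharp
public struct hid_t { public long Value; }  // illustrative 1.10 layout

public struct H5O  // object id
{
    public readonly hid_t Value;
    public H5O(hid_t value) { Value = value; }
}

public struct H5D  // dataset id, implicitly widens to an object id
{
    public readonly hid_t Value;
    public H5D(hid_t value) { Value = value; }
    public static implicit operator H5O(H5D id) { return new H5O(id.Value); }
}

public static class Demo
{
    static long TakesObject(H5O id) { return id.Value.Value; }

    public static void Main()
    {
        var dataset = new H5D(new hid_t { Value = 42 });
        // OK: a dataset id is an object id (implicit widening).
        System.Console.WriteLine(TakesObject(dataset));  // 42
        // The reverse direction does not compile without an explicit cast:
        // H5D d = new H5O(new hid_t());  // compile-time error
    }
}
```

Passing a generic object id where a dataset id is expected fails at compile time, which is exactly the class of bug that plain hid_t integers let through.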
What do you think?
I will not prepare a branch containing these ideas until you really consider doing it, since it's quite some work to do ;)
@JanWosnitza I've thought of implementing something similar here as well, and agree that it would be a good way to make this API easier to use and more type-safe while adhering to the "no-code" policy. The run-time overhead should be negligible (if there's any at all) compared to using the primitive types themselves, and the type-safety makes it easier to implement a higher-level, more .NET-friendly wrapper on top of HDF.PInvoke.
@hokb This is a big change and may* break backwards-compatibility, but it's the best long-term approach. HDF.PInvoke hasn't been available for that long yet, so IMO it's better to make this change once sooner rather than later. It will be such an improvement over the existing method of having to copy all of these type aliases into code written against HDF.PInvoke that I think anyone using this library will be understanding of the breakage.
* It depends on what you mean by "backwards-compatibility". It should be possible to implement this in a way that will allow people to more-or-less just recompile their code (perhaps with some very minor tweaks) against a new version of this library with the structs. For anyone using e.g. .NET assembly redirection, they're just out of luck and will have to stay on an older version of the library or recompile their code against a newer version of this library; that scenario seems like it should be fairly rare though.
If there's anything I can do to help out on this, please let me know -- I'm happy to collaborate on this work to get it done.
@gheber Having written both C/C++ and C# code against the HDF5 APIs, I think this proposal would be extremely helpful, both to prevent bugs and to make it a bit easier for newcomers to start using HDF5 (since they'll get some help from the compiler and IDE hints if they make simple mistakes). What do you think? Will you accept this if it's implemented by @JanWosnitza and/or me?