HDF.PInvoke icon indicating copy to clipboard operation
HDF.PInvoke copied to clipboard

Emulate C++'s typedefs with structs

Open JanWosnitza opened this issue 8 years ago • 29 comments

At the current state e.g. the functions H5F.create( .. ) and H5F.get_filesize( .. ) both return Int32 even though create returns an actual id and get_filesize only indicates if there was an error. So the real meanings are lost in C# when using IntelliSense.

image image

I suggest to wrap the values into structs to create marshable wrapper types to preserve the semantics that are carried by aliases (C++ typedefs).

So instead of adding a lot usings on top of every file:

using herr_t = System.Int32;
#if HDF5_VER1_10
using hid_t = System.Int64;
#else
using hid_t = System.Int32;
#endif

I would add these types ONCE into the library:

public struct herr_t
{
    public System.Int32 Value;
    public static implicit operator System.Int32(herr_t value) => value.Value;
}

public struct hid_t
{
#if HDF5_VER1_10
    public System.Int64 Value;
    public static implicit operator System.Int64( hid_t value ) => value.Value;
#else
    public System.Int32 Value;
    public static implicit operator System.Int32(hid_t value) => value.Value;
#endif
}

This allows would lead to much more readable IntelliSense hints and enables the use of ad-hoc polymorphism like:

public static class HdfPInvokeHelpers
{
    public static void ThrowOnError(this herr_t value)
    {
        if (value < 0)
            throw new Exception(H5EHelper.GetErrorStack().ToString());
    }

    public static hid_t ThrowOnError(this hid_t value)
    {
        if (value < 0)
            throw new Exception(H5EHelper.GetErrorStack().ToString());
        return value;
    }
}

class Program
{
    static void Main(string[] args)
    {
        var file = H5F.create("file.h5", H5F.ACC_RDONLY, H5P.DEFAULT, H5P.DEFAULT).ThrowOnError();
        ulong size;
        H5F.get_filesize(file, ref size).ThrowOnError();
    }
}

Update

Missing cross-define for hid_t added.

JanWosnitza avatar Oct 19 '16 16:10 JanWosnitza

I started a branch with the suggested changes. Though it's not done yet.

JanWosnitza avatar Oct 20 '16 14:10 JanWosnitza

@hokb I noticed, that both size_t and ssize_t are mapped to System.IntPtr in every file. Is this on purpose? As far as I know the first s in ssize_tmeans "signed" since size_t is unsinged. So a direct mapping of C++ would be this:

using size_t = System.UIntPtr;
using ssize_t = System.IntPtr;

JanWosnitza avatar Oct 20 '16 14:10 JanWosnitza

I believe the main motivation was that UIntPtr is not CLS-compliant. Do we care?

gheber avatar Oct 20 '16 15:10 gheber

Thanks @JanWosnitza. Great effort and an interesting attempt to ease the handling of our id types. Just some points to consider:

  • Wrapping the id / err values into a struct involves additional (explicit and implicit) calls for storing and for accessing the value. Dunno if this is relevant performance wise though. (probably not. We are dealing with file system access here..)
  • Did you check potential influence on the marshalling? The struct is still blittable but we should make sure that no marshalling is needed during PInvoke (as for the plain system value types we use currently).
  • We should also check if this proposal violates the 'no code' rule of HDF.PInvoke. This would be less of an issue for me personally. But in general this rule is useful in order to keep a clear focus on the project and to seperate our efforts from potential 3rd party additions.
  • It is a rather big change and breaks compatibility with existing code.
  • The advantage in terms of intellisense support might become superseded if a user wants to actually do something with the hid and err values. Built-in system types are easier here. On the structs we need additional code to unwrap the value. This can be simplified again, but ...

hokb avatar Oct 20 '16 15:10 hokb

Writing CLS-Compliant Code The features you use in the definitions of your private classes, in the definitions of private methods on public classes, and in local variables do not have to follow the CLS rules. You can also use any language features you want in the code that implements your class and still have a CLS-compliant component.

So if the struct wraps UIntPtr completely, it wouldn't hurt CLS rules. On the other hand, is there any .Net language/runtime not supporting UIntPtr?

JanWosnitza avatar Oct 20 '16 16:10 JanWosnitza

@hokb

  1. Member Access: Since it's a struct with a field, I'm not sure if this really changes performance.

  2. Marshalling: Valid point. No havn't checked yet. But hope it's blitted directly.

  3. No Code: Yeah, but I don't know how one is able to achieve similar results without wrapping the whole library. I think the typesaftyness is worth it (e.g. did you ever accidentally passed a herr_t into a hid_t parameter? With this setup you will get a compile-time error)

  4. Compatibility: Admitted, it is a big change. But in the UnitTests projects I removed just the usings and fixed the already existing typing errors. (e.g. there were quite a few size_t size = (ssize_t)0; and IntPtr size = (size_t)0; and the like)

  5. IntelliSense vs EaseOfUse: Valid point for the [h][s]size_t types. Though it would be possible to extend the structs to simplify the usage. I think most of the other types should be treated as opaque types in the first place. E.g.

    • changing hid_t values doesn't sound like a good idea (e.g. adding)
    • herr_t is actually a boolean
    • hbool_t could implicitly convert into and from bool (much more expressive)
    • similar with htri_t
    • haddr_t is only used as a parameter in H5O.open_by_addr AND the value SHOULD be retrieved by the library (will forbid a lot of possible errors)

    ADDED: [h][s]size_ts have two states: indicating an error or carrying a positive integer.

In my own wrapper I went even further and also separated the hid_t into different classes:

  • Attribute Ids: H5A
  • PropertyList Ids (can be splitted into the separate PropertyList classes): H5P
  • File Ids: H5F
  • Group Ids: H5G
  • Dataset Ids: H5D
  • Datatype Ids (can be splitted into the separate types: int, string, ..): H5T
  • Object Ids (Group, Dataset, Datatype): H5O
  • Location Ids (File, Group): H5L
  • Dataspace Ids: H5S

Roughly speaking: most of the library modules expose their own hid_t type. Some of them are just classes (e.g. H5O, H5L, H5I).

This adds even more compile-time checks (typesaftyness). E.g. you can't tread an object-id as a dataset-id without an explicit cast. To me every hid_t feels like returning System.Object instead of the real type. It C's way, but not C#'s. I don't say, the library should go down this line, but there are a lot of possibilities which may or may not improve the acceptance of HDF5 in the .Net community :)

PS: Omg, again a really long post.

JanWosnitza avatar Oct 20 '16 16:10 JanWosnitza

Generally I really think the "No Code" rule is good, with the exception of types. The problem is, that .Net's capabilities to extend the behavior of already existing types are quite limited.

JanWosnitza avatar Oct 20 '16 17:10 JanWosnitza

just stumbled upon a new hurdle with usings while preparing a little performance:

If someone tries to build a library that's based on HDF.PInvoke, the library has explicitly to support both versions and offer two verions (like HDF.PInvoke does). With a struct (at least for hid_t) you wouldn't have the problem. If your library only uses functionality that's available both in 1.8 and 1.10, the user of the library can decide which one to use.

JanWosnitza avatar Oct 21 '16 06:10 JanWosnitza

the user of the library can decide which one to use.

Except that one still must decide which HDF.PInvoke to compile, no? So we wouldn't get rid of the HDF_1_10 compile switch in HDF.PInvoke? This will not buy us a PInvoke assembly which is able to work with 1.8. and 1.10 dlls at the same time, right?

hokb avatar Oct 22 '16 17:10 hokb

Except that one still must decide which HDF.PInvoke to compile, no?

Let me clear out my example: There is a library called FancyHDF which (of course) depends on the HDF.PInvoke NuGet package. And you decides to build an app that uses FancyHDF. Inside the FancyHDF NuGet package is just one binary, but you can still decide which version of HDF.PInvoke you like to use. This takes a lot of burden from the library writers.

If you try to prevent such scenarios, I would suggest to make two separate NuGet packages. One for 1.8 and one for 1.10.

JanWosnitza avatar Oct 22 '16 17:10 JanWosnitza

I like your Typology.cs attempt a lot - it is beautyful and elegant. But for me the following concerns remain:

  • I see the home for things like that in a higher level than HDF.PInvoke is targeting. This lib ought to be a thin wrapper around the native dll. While it is clear that there is a lot of potential for increasing the convenience to the user ('stronger typed' hid structs being only one of them) it would add a new level of abstraction to the underlying native hids. They are what they are: integers of varying (with versions) size. So this proposal violates the "no code rule". And it does so on a level which a user is unable to circumvent. Other than for convenience functions and - classes, and -extensions (which can be used or not) a user would have to always follow us on this path if she wants to use HDF.PInvoke at all.
  • The proof that no additional cost is added to marshal the structs [in and out] is outstanding.

I am in a somewhat ambivalent position here, since additions like this could likely make our (ILNumerics) life easier. (ILNumerics is maintaining a higher level HDF5 lib based on HDF.PInvoke.) On the other hand the decision to only add such code to HDF.PInvoke which is really needed to make it work makes a whole lot of sense to me too.

hokb avatar Oct 22 '16 18:10 hokb

Inside the FancyHDF NuGet package is just one binary, but you can still decide which version of HDF.PInvoke you like to use.

I don't get it yet: do you mean that the struct attempt would allow one to target the nuget package but use versions 1.8 and 1.10 of the native dlls interchangably?

hokb avatar Oct 22 '16 18:10 hokb

The struct would be fetched from the other library at runtime. E.g. you can try to do a sizeof(hid_t) in different assembly. The compiler will complain that the size isn't statically resolveable (so I added a hid_t.Size field ;))

JanWosnitza avatar Oct 22 '16 18:10 JanWosnitza

It's really referencing the type, not the memory-layout.

JanWosnitza avatar Oct 22 '16 18:10 JanWosnitza

No luck in getting your point yet :) IMO the nuget package is for one specific HDF version only. Users wanting to use another version need to recompile from code and use the other compile switch. This is valid for both: using your structs attempt or not.

Correct?

hokb avatar Oct 22 '16 18:10 hokb

IMO the nuget package is for one specific HDF version only.

Correct. And what if someone decides to upgrade from 1.8 to 1.10 and still using FancyHDF? Without struct, he also has to update FancyHDF (which also requires the FancyHDF's author to both versions on NuGet). With structs you just have to upgrade HDF.PInvoke and FancyHDF will still work fine with the new version.

JanWosnitza avatar Oct 22 '16 18:10 JanWosnitza

Since the 1.10's file-format isn't backwards compatible to 1.8, I think this is a viable scenario. (e.g. at AHM we're still using 1.8).

JanWosnitza avatar Oct 22 '16 18:10 JanWosnitza

PS: I haven't tested this yet, but suppose it would work. gg

JanWosnitza avatar Oct 22 '16 18:10 JanWosnitza

Regarding your first concern:

  • Not increasing the user's convenience in this layer also means: everybody has to do it themselves -> a lot of overall work (for example: I assume you haven't had the struct idea yourself yet?)
  • You call it adding abstractions, I call it preserving semantics/concepts ;-P Of course, it's both.
  • Actually the "no code"-rule is already partially violated, otherwise there were no changes required to compile for .Net 2.0 #102.
  • After all it's a C-API. They're not known to be fancy and type-safe. (e.g. how much have you wasted already to find the incorrect id-type passed into the wrong function in OpenTK, they're all integers, and every id-type starts at 0)

JanWosnitza avatar Oct 22 '16 18:10 JanWosnitza

But I admit, if we do this, there has to go some more thoughts into the implementation. I wonder if other language bindings added typesaftyness?

JanWosnitza avatar Oct 22 '16 18:10 JanWosnitza

Just to mention it: Having custom types also allows to do stuff like return herr_t.Ok;

JanWosnitza avatar Oct 22 '16 19:10 JanWosnitza

I made a little MarshallingTest Project and as it turns out (at least on my machine): with x86 the structs are little bit slower and with x64 a little bit faster. This is consistent on my machine.

JanWosnitza avatar Oct 22 '16 19:10 JanWosnitza

I tried to improve the test a little bit. Now the struct is obviously slower with x86 and x64. image But I would call this negligible slower: a third second slower after 26 million calls to H5Tcreate and H5Tclose.

JanWosnitza avatar Oct 23 '16 07:10 JanWosnitza

The overhead likely comes from the conversion of the size_t struct input parameter.

hokb avatar Oct 23 '16 09:10 hokb

Ah right. Fixed that :) It's still drifting, but this is more likely to be caused by the test setup.

JanWosnitza avatar Oct 24 '16 08:10 JanWosnitza

In my own wrapper I went even further and also separated the hid_t into different classes:

And here comes an example:

Some types to get the idea

struct hid_t
{
#if HDF5_VER1_10
    public System.Int64 Value;
#else
    public System.Int32 Value;
#endif
}

struct H5I
{
    public readonly hid_t Value;
    public H5I( hid_t value ) { Value = value; }
}

struct H5L
{
    public readonly hid_t Value;
    public H5L( hid_t value ) { Value = value; }

    public H5L( H5I id ) { Value = id.Value; }
    public static implicit operator H5I( H5L id ) { return H5I( id.Value ); }
}

struct H5O
{
    public readonly hid_t Value;
    public H5O( hid_t value ) { Value = value; }

    public H5O( H5I id ) { Value = id.Value; }
    public static implicit operator H5I( H5O id ) { return H5I( id.Value ); }
}

public H5F
{
    public readonly hid_t Value;
    public H5F( hid_t value ) { Value = value; }

    public H5F( H5I id ) { Value = id.Value; }
    public static implicit operator H5I( H5F id ) { return H5I( id.Value ); }

    public H5F( H5L id ) { Value = id.Value; }
    public static implicit operator H5L( H5F id ) { return H5L( id.Value ); }
}

public H5G
{
    public readonly hid_t Value;
    public H5G( hid_t value ) { Value = value; }

    public H5G( H5I id ) { Value = id.Value; }
    public static implicit operator H5I( H5G id ) { return H5I( id.Value ); }

    public H5G( H5L id ) { Value = id.Value; }
    public static implicit operator H5L( H5G id ) { return H5L( id.Value ); }

    public H5G( H5O id ) { Value = id.Value; }
    public static implicit operator H5O( H5G id ) { return H5O( id.Value ); }
}

public H5D
{
    public readonly hid_t Value;
    public H5D( hid_t value ) { Value = value; }

    public H5D( H5I id ) { Value = id.Value; }
    public static implicit operator H5I( H5D id ) { return H5I( id.Value ); }

    public H5D( H5O id ) { Value = id.Value; }
    public static implicit operator H5O( H5D id ) { return H5O( id.Value ); }
}

And some PInvokes

[DllImport(...)]
public extern static H5P H5F_get_access_plist(
    H5F file_id );

[DllImport(...)]
public extern static H5G H5G_open(
    H5L loc_id,
    string name,
    H5P gapl_id = default(H5P) );

[DllImport(...)]
public static extern H5D H5D_create(
    H5L loc_id,
    byte[] name,
    H5T type_id,
    H5S space_id,
    H5P lcpl_id = default(H5P),
    H5P dcpl_id = default(H5P),
    H5P dapl_id = default(H5P) );

What do you think?

JanWosnitza avatar Dec 21 '16 09:12 JanWosnitza

I will not prepare a branch containing these ideas until you really consider doing it, since it's quite some work to do ;)

JanWosnitza avatar Dec 21 '16 09:12 JanWosnitza

@JanWosnitza I've thought of implementing something similar here as well, and agree that it would be a good way to make this API easier to use and more type-safe while adhering to the "no-code" policy. The run-time overhead should be negligible (if there's any at all) compared to using the primitive types themselves, and the type-safety makes it easier to implement a higher-level, more .NET-friendly wrapper on top of HDF.PInvoke.

@hokb This is a big change and may* break backwards-compatibility, but it's the best long-term approach. HDF.PInvoke hasn't been available for that long yet, so IMO it's better to make this change once sooner rather than later. It will be such an improvement over the existing method of having to copy all of these type aliases into code written against HDF.PInvoke that I think anyone using this library will be understanding of the breakage.

  • It depends on what you mean by "backwards-compatibility". It should be possible to implement this in a way that will allow people to more-or-less just recompile their code (perhaps with some very minor tweaks) against a new version of this library with the structs. For anyone using e.g. .NET assembly redirection, they're just out of luck and will have to stay on an older version of the library or recompile their code against a newer version of this library; that scenario seems like it should be fairly rare though.

If there's anything I can do to help out on this, please let me know -- I'm happy to collaborate on this work to get it done.

jack-pappas avatar Sep 10 '17 19:09 jack-pappas

@gheber Having written both C/C++ and C# code against the HDF5 APIs, I think this proposal would be extremely helpful both to help prevent bugs and also makes it a bit easier for newcomers to start using HDF5 (since they'll get some help from the compiler and IDE hints if they make simple mistakes). What do you think? Will you accept this if it's implemented by @JanWosnitza and/or me?

jack-pappas avatar Dec 22 '17 19:12 jack-pappas