Improve String handling with size parameter
Currently string arrays with a size parameter are handled automatically by the marshaller. This is probably not the right way to do it as the marshaller can't know how to handle / free the memory according to the transfer annotations (full / none / container) or can’t differentiate between utf8 / platform strings.
when fixing this care for nullable arrays, too.
This code should not throw an exception in OnCommandLine:
var application = Gtk.Application.New("org.gir.core", Gio.ApplicationFlags.HandlesCommandLine);
application.OnActivate += (sender, args) =>
{
var window = Gtk.ApplicationWindow.New((Gtk.Application) sender);
window.Title = "Gtk4 Window";
window.SetDefaultSize(300, 300);
window.Show();
};
application.OnCommandLine += (sender, signalArgs) =>
{
var a = signalArgs.CommandLine.GetArguments(out _);
return 0;
};
return application.RunWithSynchronizationContext(new [] { "test.pdf" });
- [ ] Verify if https://github.com/gircore/gir.core/blob/main/src/Generation/GirLoader/Output/TypeReferenceFactory.cs#L65 is missing the case that there is an "optional size parameter" which would mean that the array must be zero terminated, too.
- [ ] Solve for utf8-array
- [ ] Solve for platform-array
Yes, of course. This will cause the application to crash:
[DllImport("Gio", EntryPoint = "g_application_command_line_get_arguments")]
[Version("2.28")]
public static extern string[] GetArguments(nint cmdline, out int argc);
I do not have much experience in these matters. But I think it is impossible to marshal an array directly from unmanaged to managed code without knowing the size of the array first.
I think the trick is to set the native method to return the pointer directly. Then, thanks to the argc parameter, we create a managed array containing the pointers to the unmanaged strings. Finally, we marshal these strings into a string[] object.
This seems to work perfectly:
using System.Runtime.InteropServices;
[DllImport("libgio-2.0.so.0", EntryPoint = "g_application_command_line_get_arguments")]
static extern nint GetArguments(nint cmdline, out int argc);
static string[] GetArgs(nint cmdline, out int argc)
{
var pointer = GetArguments(cmdline, out argc);
var argv = new nint[argc]; // Managed IntPtr array
var resultGetArguments = new string[argc];
// Copies from an unmanaged memory pointer to a managed array.
Marshal.Copy(pointer, argv, 0, argc);
for (int i = 0; i < argc; i++)
{
resultGetArguments[i] = Marshal.PtrToStringAnsi(argv[i])!;
Marshal.FreeCoTaskMem(argv[i]);
}
Marshal.FreeCoTaskMem(pointer);
return resultGetArguments;
}
var application = Gtk.Application.New("org.gir.core", Gio.ApplicationFlags.HandlesCommandLine);
application.OnActivate += (sender, args) =>
{
var window = Gtk.ApplicationWindow.New((Gtk.Application) sender);
window.Title = "Gtk4 Window";
window.SetDefaultSize(300, 300);
window.Show();
};
application.OnCommandLine += (sender, signalArgs) =>
{
//var a = signalArgs.CommandLine.GetArguments(out _);
var a = GetArgs(signalArgs.CommandLine.Handle.DangerousGetHandle(), out _);
foreach (string s in a)
Console.WriteLine(s);
return 0;
};
return application.RunWithSynchronizationContext(["Hello Wolrd!", "Hola Mundo!", "Hallo Welt!"]);
After taking a closer look at the project and how bindings with GIR-based libraries are implemented, I believe that changing the approach would be the most practical solution. Argument why:
Currently, when the API considers that the value returned by a method (speaking of return values, not parameters) is a size-based array, it simply generates code like this:
public extern managedType[] NativeMethod(params)
For example, for the g_application_command_line_get_arguments method, the following code is generated inside the API:
// Gio-2.0/Internal/ApplicationCommandLine.Methods.Generated.cs:59:
[DllImport(ImportResolver.Library, EntryPoint = "g_application_command_line_get_arguments")]
public static extern string[] GetArguments(IntPtr cmdline, out int argc);
In reality, this type of code is not very useful because, although it compiles, at runtime we will encounter an error that breaks the application if we try to call it.
Another issue is that GIR definitions are often not entirely correct. If we analyze the method g_application_command_line_get_arguments, we see that the return value definition contains the following:
<array length="0"
zero-terminated="0"
c:type="gchar**">
<type name="filename" />
</array>
In other words, it supposedly indicates that we are dealing with an array that is not NULL-terminated (zero-terminated="false") and that the first parameter (index 0) provides the size of the array. However, in reality, both assumptions are false. If we look at the native declaration:
https://docs.gtk.org/gio/method.ApplicationCommandLine.get_arguments.html
gchar**
g_application_command_line_get_arguments (
GApplicationCommandLine* cmdline,
int* argc
)
// Description
// ...
// The return value is NULL-terminated and should be freed using g_strfreev().
In this ecosystem, arrays used as return values are almost always NULL-terminated. This is not an absolute requirement, but it is the convention. Therefore, in 99% of the cases it works in this way.
For example, examining all the cases where Gir.Core currently generates public static extern string[]:
[DllImport(ImportResolver.Library, EntryPoint = "g_application_command_line_get_arguments")]
public static extern string[] GetArguments(IntPtr cmdline, out int argc);
[DllImport(ImportResolver.Library, EntryPoint = "g_bookmark_file_get_applications")]
public static extern string[] GetApplications(GLib.Internal.BookmarkFileHandle bookmark, GLib.Internal.NonNullableUtf8StringHandle uri, out nuint length, out GLib.Internal.ErrorOwnedHandle error);
[DllImport(ImportResolver.Library, EntryPoint = "g_bookmark_file_get_groups")]
public static extern string[] GetGroups(GLib.Internal.BookmarkFileHandle bookmark, GLib.Internal.NonNullableUtf8StringHandle uri, out nuint length, out GLib.Internal.ErrorOwnedHandle error);
[DllImport(ImportResolver.Library, EntryPoint = "g_bookmark_file_get_uris")]
public static extern string[] GetUris(GLib.Internal.BookmarkFileHandle bookmark, out nuint length);
[DllImport(ImportResolver.Library, EntryPoint = "hb_blob_get_data")]
public static extern string[] BlobGetData(HarfBuzz.Internal.blob_tHandle blob, out uint length);
[DllImport(ImportResolver.Library, EntryPoint = "hb_blob_get_data_writable")]
public static extern string[] BlobGetDataWritable(HarfBuzz.Internal.blob_tHandle blob, out uint length);
[DllImport(ImportResolver.Library, EntryPoint = "g_variant_dup_bytestring_array")]
public static extern string[] DupBytestringArray(GLib.Internal.VariantHandle value, out nuint length);
[DllImport(ImportResolver.Library, EntryPoint = "g_variant_get_bytestring_array")]
public static extern string[] GetBytestringArray(GLib.Internal.VariantHandle value, out nuint length);
If we review documentation or source code for these native methods, we see they are indicated as NULL-terminated arrays.
Therefore, the best approach would be to modify our strategy and assume that these arrays are always NULL-terminated. For example, in the case of PlatformStringArray:
// Generator/Renderer/Internal/ReturnType/Converter/PlatfromStringArray.cs
public RenderableReturnType Convert(GirModel.ReturnType returnType)
{
// var arrayType = returnType.AnyType.AsT1;
// if (arrayType.IsZeroTerminated)
return NullTerminatedArray(returnType);
// if (arrayType.Length is not null)
// return SizeBasedArray();
// throw new Exception("Unknown kind of array");
}
[DllImport(ImportResolver.Library, EntryPoint = "g_application_command_line_get_arguments")]
public static extern GLib.Internal.PlatformStringArrayNullTerminatedOwnedHandle GetArguments(IntPtr cmdline, out int argc);
// Generator/Renderer/Public/ReturnTypeToManagedExpression/Converter/PlatformStringArray.cs
public void Initialize(ReturnTypeToManagedData data, IEnumerable<ParameterToNativeData> _)
{
data.SetExpression(fromVariableName =>
{
var returnType = data.ReturnType;
// var arrayType = returnType.AnyType.AsT1;
// if (arrayType.IsZeroTerminated)
return NullTerminatedArray(returnType, fromVariableName);
// if (arrayType.Length is not null)
// return SizeBasedArray(returnType, fromVariableName);
// throw new Exception("Unknown kind of array");
});
}
public string[] GetArguments(out int argc)
{
var resultGetArguments = Gio.Internal.ApplicationCommandLine.GetArguments(base.Handle.DangerousGetHandle(), out argc);
return resultGetArguments.ConvertToStringArray() ?? throw new System.Exception("Unexpected null value");
}
We can test the original example and see how the ConvertToStringArray function works without problems, converting unmanaged to managed code because the array is NULL-terminated.
We would have to do the same for the other cases as well. I’m not sure whether this should be done directly in the converters or by modifying another part of the API generator, but in general, the idea is to assume that all these returned arrays are NULL-terminated and implement the conversion accordingly.
Thank you for your investigation. I will take a look at it in the coming days / weeks. This topic is quite complex and needs it's time.
This specific code from GirCore for size based returned string arrays is from it's very beginning. This needs to be reworked completely. I would recommend not putting more effort in this as I need to wrap my head around this first. Guiding you through this stuff would probably be pretty complex. So starting with less intensive stuff would be a better start to get you going in case you are interested.
As it sounds like you found your way around the code base, it would be very appreciated if you are interested in fixing some warnings which get emitted if the generated libraries are generated.
A general note: Just from my experience working on this project for several years: Upstream is almost never wrong. It may not look plausible from a C# point of view but there is always some explanation why things make sense. (GObject is under development since 2002.) Ignoring the upstream data may work in some/most cases but just because the API is as vast as it is there are always corner cases which will break. So the only solution is to respect upstream and make things right even if it is hard on C# side.
OK, I get it, no problem.
As for your general note, I think you misunderstood me. It's probably my fault for not making myself clear. I apologize for that.
I didn’t mean to imply that anything is wrong in Upstream, nor in GObject (which, in fact, was born with GTK+ (GTKObject) and dates back to before 2002), or anywhere else. I wouldn’t want some member of the GTK/Gnome team to show up here and kick my ass :scream: :joy:. I was simply referring to the idea that this way of determining the array type in Gir.Core loader doesn’t seem right:
https://github.com/gircore/gir.core/blob/main/src/Generation/GirLoader/Output/TypeReferenceFactory.cs#L65
In those cases where we find the "zero-terminated=0" attribute, it's really not always an indication that the array is not a NULL-terminated array, just as indicated in the documentation. For example:
https://docs.gtk.org/gio/method.ApplicationCommandLine.get_arguments.html
Another question is that we can handle as size based array, since we have an attribute that indicates the length of the array.
But as you say, it seems you still have a lot of work ahead of you. And probably me can’t contribute much.
Really, thank you very much for your efforts on this project :heart:. If I see something where I think I can be helpful, I will try to pitch in my two cents.
Your feedback is always welcome. If you want to get your hands dirty feel free to come into the matrix chat to find a topic you like to work on.
I agree with you comment that "zero-terminated=0" not automatically means that there is no NULL at the end of the array.
Can you explain in which cases the GirLoader would behave wrong / how would you change the code and why?
From my understanding: If some API is marked explicitly as zero terminated it will be zero terminated. If it is not marked as such but in addition there is no other information which allows to determine it's size it is implicitly zero terminated. The comment implies that this is a gobject convention. I can't remember this exactly but from the comment I would think that I talked this topic through with the gnome guys.
In regard to your sample, lets check the gir file (I removed the "doc" tags):
<method name="get_arguments"
c:identifier="g_application_command_line_get_arguments"
version="2.28">
<return-value transfer-ownership="full">
<array length="0" zero-terminated="0" c:type="gchar**">
<type name="filename"/>
</array>
</return-value>
<parameters>
<instance-parameter name="cmdline" transfer-ownership="none">
<type name="ApplicationCommandLine"
c:type="GApplicationCommandLine*"/>
</instance-parameter>
<parameter name="argc"
direction="out"
caller-allocates="0"
transfer-ownership="full"
optional="1"
allow-none="1">
<type name="gint" c:type="int*"/>
</parameter>
</parameters>
</method>
It is clearly marked as a size based array of strings, which in addition is zero terminated which is not indicated in the xml. Why should it be a problem to treat the array as size based IntPtr or a similar low level datatype? Such a string would probably be freed string by string and in the end, the array of pointers would be freed additionally as the ownership is transfered to GirCore. There might even be some GLib function available which does exactly this. To access the data there would be some conversion to bring the native data into the managed dotnet world.
The problem is that the old code uses MarshalAs attributes or relies on dotnet automation which is assuming / doing the wrong thing. If the behavior is implemented on a lower level (like indicated in the previous paragraph) I think things should work like expected.
One thing to consider is if the argc parameter will be optional from a GirCore point of view.
As it is marked as optional this would mean that the array must be zero terminated in the end (which is true in this case). This is probably missing in the fallback condition of the GirLoader and should be added. I added this in the issue description.
Thanks for your input because otherwise I would probably not have thought about this case.
From my understanding: If some API is marked explicitly as zero terminated it will be zero terminated. If it is not marked as such but in addition there is no other information which allows to determine it's size it is implicitly zero terminated. The comment implies that this is a gobject convention. I can't remember this exactly but from the comment I would think that I talked this topic through with the gnome guys.
OK, so it seems that you, me, and the Gnome guys are all saying the same thing.
Why should it be a problem to treat the array as size based IntPtr or a similar low level datatype? Such a string would probably be freed string by string and in the end, the array of pointers would be freed additionally as the ownership is transfered to GirCore. There might even be some GLib function available which does exactly this. To access the data there would be some conversion to bring the native data into the managed dotnet world.
No problem, in fact it is the proof of concept I made in my first post in this thread. But Gir.Core just not work in this way today. That is why, until a better solution comes along, I thought it might be a good idea to have something that works most of the time, instead of having a high failure rate, since we really would not have to change any existing code.
which in addition is zero terminated which is not indicated in the xml.
Yes, but the documentation is very clear that the return value is a NULL-terminated array. Same in the other cases for Gio library:
- g_application_command_line_get_arguments -> The return value is NULL-terminated...
- g_bookmark_file_get_applications -> A newly allocated NULL-terminated array of strings.
- g_bookmark_file_get_groups -> A newly allocated NULL-terminated array of group names.
- g_bookmark_file_get_uris -> A newly allocated NULL-terminated array of strings.
- g_variant_dup_bytestring_array -> If length is non-NULL then the number of elements in the result is stored there. In any case, the resulting array will be NULL-terminated.
- g_variant_get_bytestring_array -> If length is non-NULL then the number of elements in the result is stored there. In any case, the resulting array will be NULL-terminated.
It is clearly marked as a size based array of strings
This comment is just based on my intuition. But maybe we are wrong in this assumption. All these methods really return a pointer to an array (gchar**), not an array directly. Maybe this is why they are marked with "NULL-terminated=0" attribute in XML. The devil is almost always in the details.
The problem is that the old code uses MarshalAs attributes or relies on dotnet automation which is assuming / doing the wrong thing. If the behavior is implemented on a lower level (like indicated in the previous paragraph) I think things should work like expected.
I agree with that. But right now, the generated code for these cases is just completely unusable. Gir.Core actually generates this for g_application_command_line_get_arguments, and the same for other cases:
[DllImport(ImportResolver.Library, EntryPoint = "g_application_command_line_get_arguments")]
public static extern string[] GetArguments(IntPtr cmdline, out int argc);
There is no marshalling or automatic conversion here. This will simply cause the application to break if you try to call it at runtime.
There is no marshalling or automatic conversion here. This will simply cause the application to break if you try to call it at runtime.
This issue is a planned for 0.7.0 which is the next version which will be released. So there is no need for a temporary solution.