.NET binding fails on Windows
The .NET binding fails on Windows but passes on Linux with net8:
```
→ dotnet test .\DotOpenDAL.Tests\
  Determining projects to restore...
  All projects are up-to-date for restore.
  DotOpenDAL -> \opendal\incubator-opendal\bindings\dotnet\DotOpenDAL\bin\Debug\net8.0\DotOpenDAL.dll
  DotOpenDAL.Tests -> \opendal\incubator-opendal\bindings\dotnet\DotOpenDAL.Tests\bin\Debug\net8.0\DotOpenDAL.Tests.dll
Test run for \opendal\incubator-opendal\bindings\dotnet\DotOpenDAL.Tests\bin\Debug\net8.0\DotOpenDAL.Tests.dll (.NETCoreApp,Version=v8.0)
Microsoft (R) Test Execution Command Line Tool Version 17.8.0 (x64)
Copyright (c) Microsoft Corporation. All rights reserved.

Starting test execution, please wait...
A total of 1 test files matched the specified pattern.
[xUnit.net 00:00:00.32]     DotOpenDAL.Tests.BlockingOperatorTest.TestReadWrite [FAIL]
  Failed DotOpenDAL.Tests.BlockingOperatorTest.TestReadWrite [72 ms]
  Error Message:
   Assert.NotEqual() Failure
   Expected: Not 0
   Actual:   0
  Stack Trace:
   at DotOpenDAL.Tests.BlockingOperatorTest.TestReadWrite() in \opendal\incubator-opendal\bindings\dotnet\DotOpenDAL.Tests\BlockingOperatorTest.cs:line 29
   at System.RuntimeMethodHandle.InvokeMethod(Object target, Void** arguments, Signature sig, Boolean isConstructor)
   at System.Reflection.MethodBaseInvoker.InvokeWithNoArgs(Object obj, BindingFlags invokeAttr)

Failed! - Failed: 1, Passed: 0, Skipped: 0, Total: 1, Duration: < 1 ms - DotOpenDAL.Tests.dll (net8.0)
```
When passing a C# string to `c_char`, it's UTF-16 internally on Windows, and then it fails to parse on the Rust side.
Inviting @tisonkun to join the discussion.
Does .NET have native types like `Vec<u8>` or bytes? Maybe we should provide a correct API instead of `string` on the .NET side?
With a workaround, the example test can be fixed with:

```csharp
public BlockingOperator()
{
    var ptr = Marshal.StringToHGlobalAnsi("memory");
    Op = blocking_operator_construct(ptr);
}

[DllImport(
    "opendal_dotnet",
    EntryPoint = "blocking_operator_construct",
    CallingConvention = CallingConvention.Cdecl,
    CharSet = CharSet.Auto)]
private static extern IntPtr blocking_operator_construct(IntPtr scheme);
```
But this works for ANSI strings only; for non-ANSI input (e.g. Simplified Chinese), the original version also fails on Linux.
> But it works for ansi strings only, and for non-ansi (e.g. Simplified Chinese), the original version also fails with linux

Oh, that's unexpected. Our public API should accept any bytes `[u8]` instead of just UTF-8 or ASCII.
Apologies, the Chinese characters do work; I had passed them to the wrong place.
With `Marshal.StringToCoTaskMemUTF8` and `Marshal.PtrToStringUTF8` it works on both Windows and Linux, but I am not sure about the performance.
```csharp
public class BlockingOperator
{
    public IntPtr Op { get; }

    public BlockingOperator()
    {
        var scheme = Marshal.StringToCoTaskMemUTF8("memory");
        Op = blocking_operator_construct(scheme);
    }

    public void Write(string path, string content)
    {
        var pathPtr = Marshal.StringToCoTaskMemUTF8(path);
        var contentPtr = Marshal.StringToCoTaskMemUTF8(content);
        blocking_operator_write(Op, pathPtr, contentPtr);
    }

    public string Read(string path)
    {
        var pathPtr = Marshal.StringToCoTaskMemUTF8(path);
        var ptr = blocking_operator_read(Op, pathPtr);
        return Marshal.PtrToStringUTF8(ptr)!;
    }

    [DllImport(
        "opendal_dotnet",
        EntryPoint = "blocking_operator_construct",
        CallingConvention = CallingConvention.Cdecl,
        CharSet = CharSet.Auto)]
    private static extern IntPtr blocking_operator_construct(IntPtr scheme);

    [DllImport(
        "opendal_dotnet",
        EntryPoint = "blocking_operator_write",
        CallingConvention = CallingConvention.Cdecl,
        CharSet = CharSet.Auto)]
    private static extern void blocking_operator_write(IntPtr op, IntPtr path, IntPtr content);

    [DllImport(
        "opendal_dotnet",
        EntryPoint = "blocking_operator_read",
        CallingConvention = CallingConvention.Cdecl,
        CharSet = CharSet.Auto)]
    private static extern IntPtr blocking_operator_read(IntPtr op, IntPtr path);
}
```
Full code at https://github.com/fallenwood/incubator-opendal/pull/1 as a draft.
`Marshal.StringToCoTaskMemUTF8` would be safer in this circumstance:
> StringToCoTaskMemUTF8 is useful for custom marshaling or for use when mixing managed and unmanaged code. Because this method allocates the unmanaged memory required for a string including a null terminator, always free the memory by calling Marshal.FreeCoTaskMem. This method provides the opposite functionality of Marshal.PtrToStringUTF8. The characters of the string are copied as UTF-8 characters.

> PtrToStringUTF8 is useful for custom marshaling or for use when mixing managed and unmanaged code. Because this method creates a copy of the unmanaged string's contents, you must free the original string as appropriate. This method provides the opposite functionality of the Marshal.StringToCoTaskMemUTF8 methods.
> With Marshal.StringToCoTaskMemUTF8 and Marshal.PtrToStringUTF8 it works on both windows and linux. but I am not sure with performance
Maybe we should use `byte[]` in our .NET API instead of `string`? I don't think it's correct to use a UTF-8 conversion here.
I think UTF-8 here is enough for the .NET binding. Here's the reason:

The core problem is that we don't have a .NET binding SDK here, so we need to handle the DLL calling convention ourselves. We can take the JNI binding SDK as an example:
```rust
impl<'local, 'other_local: 'obj_ref, 'obj_ref> JavaStr<'local, 'other_local, 'obj_ref> {
    /// Get a pointer to the character array beneath a [JString]
    ///
    /// The string will be `NULL` terminated and encoded as
    /// [Modified UTF-8](https://en.wikipedia.org/wiki/UTF-8#Modified_UTF-8) /
    /// [CESU-8](https://en.wikipedia.org/wiki/CESU-8).
    ///
    /// The implementation may either create a copy of the character array for
    /// the given `String` or it may pin it to avoid it being collected by the
    /// garbage collector.
    ///
    /// Returns a tuple with the pointer and the status of whether the implementation
    /// created a copy of the underlying character array.
    ///
    /// # Warning
    ///
    /// The caller must release the array when they are done with it via
    /// [Self::release_string_utf_chars]
    ///
    /// # Safety
    ///
    /// The caller must guarantee that the Object passed in is an instance of `java.lang.String`,
    /// passing in anything else will lead to undefined behaviour (The JNI implementation
    /// is likely to crash or abort the process).
    unsafe fn get_string_utf_chars(
        env: &JNIEnv<'_>,
        obj: &JString<'_>,
    ) -> Result<(*const c_char, bool)> {
        non_null!(obj, "get_string_utf_chars obj argument");
        let mut is_copy: jboolean = 0;
        let ptr: *const c_char = jni_non_null_call!(
            env.get_raw(),
            GetStringUTFChars,
            obj.as_raw(),
            &mut is_copy as *mut _
        );
        let is_copy = is_copy == JNI_TRUE;
        Ok((ptr, is_copy))
    }
}
```
So for me, UTF-8 LGTM.
> I think the UTF8 here is enough for .NET binding.

Can you elaborate on this? We need to support writing data like PNG, PDF, and Parquet. Requiring users to pass valid UTF-8 here doesn't make sense.

Did I misunderstand something here? I also don't understand the relationship with the Java binding or JNI.
For this issue, I think we are just discussing the behavior of strings on the C# / .NET platform. The encoding affects every place where we use a string, such as path and content, which means an error will also be raised if people use Chinese in the path when we don't encode it. That's why I recommended UTF-8 in the previous comment. (Personally, I think the content is not the root cause of this issue; the string handling is. If we don't correct it, the .NET binding will still be unusable on Windows.)

About the future, I agree that we need a byte array for content, but I think it should be a separate issue or PR, which is why I'm not talking about it here; keeping content as a string is enough for this issue. (The start of the issue is that we cannot pass `Op = blocking_operator_construct("memory");`, so it's not about content.)
@Zheaoli, thank you for your explanation.

I mixed up different APIs during the discussion of this issue. Let me make them clearer:
- `scheme` in `blocking_operator_construct`: must be ASCII, like `s3` or `memory`. No other possible values.
- `path` in `blocking_operator_write`: must be a valid UTF-8 string.
- `content` in `blocking_operator_write`: must be bytes. (I'm not sure how it is represented in .NET.)
> `scheme` in `blocking_operator_construct`: must be ascii like `s3`, `memory`. No other possible values. `path` in `blocking_operator_write`: must be valid UTF-8 string.
If we want to keep the same behavior on both Windows and Linux, we need to use `Marshal.StringToCoTaskMemUTF8` manually. It's the officially recommended way.
> `content` in `blocking_operator_write`: must be bytes. (I'm not sure how it represents in .NET)
Agreed. I will open another issue and handle it.
> If we want to keep the same behavior for both windows and linux, we need to use `Marshal.StringToCoTaskMemUTF8` manually. It's the recommended way from official.
Okay, I understand: even if it contains only ASCII characters, we still need to ensure that it's encoded as UTF-8.
> Okay, I understand: Even if it contains only ASCII characters, we still need to ensure that it's encoded in UTF-8.

Yes, encoding in the Windows world is always hard.
Just wondering if `std::ffi::OsStr` would help simplify this; unfortunately, I haven't used it before.
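For what it's worth, a quick stand-alone check (my own sketch, not binding code) suggests `OsStr` would not remove the need to pick an encoding at the FFI boundary, because its raw representation differs per platform:

```rust
use std::ffi::OsStr;

fn main() {
    let s = OsStr::new("memory");

    // On Unix, OsStrExt exposes the raw bytes directly.
    #[cfg(unix)]
    {
        use std::os::unix::ffi::OsStrExt;
        assert_eq!(s.as_bytes(), b"memory");
    }

    // On Windows, OsStrExt instead yields UTF-16 code units, so FFI code
    // would still have to convert before handing a *const c_char to Rust.
    #[cfg(windows)]
    {
        use std::os::windows::ffi::OsStrExt;
        let units: Vec<u16> = s.encode_wide().collect();
        assert_eq!(units, "memory".encode_utf16().collect::<Vec<u16>>());
    }
}
```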