migrate to jdk 22 and fix upcalls
Hi, in this pull request I propose migrating the project to jdk22 support.
I also managed to track down the reason the upcall test failed. After adding tests for other upcalls i noticed, it only failed for jvm functions that return strings.
I eventually inserted a few debug prints into the generated bytecode of upcall-class and downcall-class and saw this:
("downcall-fn called")
hello from downcall-class/invoke
to prim-asm of supposed type [:coffi.ffi/fn [] :coffi.mem/c-string]. actual class of value:
jdk.internal.foreign.NativeMemorySegmentImpl
hello from downcall-class before invokeExact
hello from upcall-class/upcall
hello from upcall-class right before the invoke
hello from upcall-class right after the invoke
to prim-asm of supposed return type :coffi.mem/c-string. actual class of value:
java.lang.Long
java.lang.ClassCastException: class java.lang.Long cannot be cast to class java.lang.foreign.MemorySegment (java.lang.Long and java.lang.foreign.MemorySegment are in module java.base of loader 'bootstrap')
Turns out the downcall function that is wrapped by downcall-class correctly handles the function argument as a MemorySegment (thus not necessitating any to-prim-asm conversion), but the function wrapped by upcall-class returns a Long for the ret-type ::mem/c-string. I'm not entirely sure if that's the case for any value that has primitive-type of ::mem/pointer but I suspect that's the case.
Because of that a simple change to to-prim-asm didn't work since it would have to behave differently for upcall-class and downcall-class.
The solution I've settled on in this pull request is to still let upcall-class/upcall use the to-prim-asm instructions but also for specifically pointer types call MemorySegment.ofAddress. There is a downside to doing that though. Since this is a default static method of an Interface, calling it from bytecode necessitates a higher bytecode version than is default. Version 8 is fine, but it needs to be specified on the generated class. I've also bumped the insn version to 0.5.4 which allows emitting more recent jvm bytecode versions should that be necessary. In any case, with that change we can invokestatic with the extra argument true, which sets ASMs itf flag so we can specify that we're talking about an InterfaceMethodref and successfully call the interface method directly from bytecode.
I'm not exactly sure if the function wrapped by upcall-class shouldn't just return a MemorySegment in the first place instead of a Long, but i wasn't sure how to robustly integrate this.
The reason it does so, is because of how serialize is implemented for c-string. In migrating to the jdk22 API, I kept the old behavior, which simply returns the address of the string as a Long. I'm not sure if other parts of the library depend on that specific behavior, so the additional call to MemorySegment.ofAddress seemed like the pragmatic choice to me.
Any feedback on this?
(this also addresses what was discussed in #8)
in it's current state, the tests pass on JDK22 and the library can be loaded into another project, but when I try to actually call a function defined via defcfn, a ClassCastException is being thrown, trying to convert a Long into a AbstractMemorySegmentImpl.
Hey, this is absolutely fantastic! I really appreciate the in-depth discussion of the issues, the process you went through, and the decision points we have left.
You've caught me right in the last week leading up to moving across the country, so I don't have time to get into the details to help decide exactly what to do about the c-string return type just now, but in about a week and a half I'll have some time between arriving at the new house and having my stuff arrive, and I can go into detail then.
I will say that I don't think just returning a Long is what's intended there, it's supposed to call .getUtf8String on the returned MemoryAddress impl back when that existed, but it looks like when I was just getting the types to be happy when doing the initial migration to the JDK 20 api I didn't fully work that out when they changed how addresses are returned.
I suspect this might be the case for any situation where a pointer is being returned by a coffi type that tries to read it as a memory segment, since I think during that conversion I missed that it returns a Long instead of a MemorySegment in that case.
Ok, I changed the serialization of strings to consistently produce MemorySegments instead of their addresses, so the Long to MemorySegment or reverse problem of that should not occur again.
With that I also reverted the ad-hoc upcall-class Long to MemorySegment conversion if the type is a pointer. to-prim-asm should now, as before, be enough conversion for pointer types. No special case needed for c-string.
I also added 2 more test cases that helped me track down the incompatibilities and checked this against my raylib-clj library which now works using this version of coffi under jdk22.
Oh, one small thing about the build script: I didn't commit this, but clang is behaving in kind of an annoying way for testing this cross platform. I replaced clang in the build script with zig as a drop-in replacement using zig cc to produce a library on windows and macos with the same command. It's a small thing, but I wanted to mention it. I don't want to bloat this PR too much, and could open a new one, if this is a change that is acceptable.
This looks fantastic!
I've gotten pretty well settled at my new home, and so I think I can review this, merge it, and get a new release out fairly shortly.
I was able to do some testing this evening and got my glfw-clj tests working as well with your work, including upcalls. I'm very happy with how this looks. I'll put some time in tomorrow to look at the total diff you have now after the changes, and will be ready to merge this and cut a release soon!
Thanks for the quick reply and it was a pleasure to contribute :)
Now I'm in the mood to do more, tbh, haha. If there is a list of features or bugs you would welcome PRs on, i'd be willing to have a look if i can spare the time.
@rutenkolk I'm getting ready for a 1.0 release since I think coffi is about ready for that. I don't think there's anything more really to be done before then.
That said, if you're willing to get into the nitty-gritty like it seems you have been so far, I would love contributions for creating more defX macros for supported coffi types. Specifically, a defstruct or defenum macro would be nice.
I haven't created them yet because they'll require a decent bit of work. I'm about halfway there because I've defined the struct and enum typespecs, but I don't want these macros to be wrappers around defalias since that already exists.
I want a defstruct style macro to create unrolled calls to the write-X and read-X functions so that coffi's performance marshaling structs can be competitive with the performance for primitives.
If you are able to get through those, or find anything else you'd like to contribute, but are hungry for more, I can put some effort into making a project board or creating some issues for features I'd like to see.
fantastic, I'll think about it!
i don't know where to put this but i want to just say that i'm currently effectively writing a defstruct macro.
it still relies on defalias for the moment. i want to actually create classes to make marshalling fast but for convenience i think having the ability to pass in a vector or a map is something desireable. this would mean it requires some overhead in choosing the right serialization function, but hopefully using a protocol is fast enough. maybe exposing the monomorphized serialization functions themselves might be something appealing to get the last drop of performance if one wants it though. obviously this requires some benchmarking
for now i generate stuff like this:
(coffi.mem/defalias
:raylib/Font
[:coffi.mem/struct
[[:baseSize :coffi.mem/int]
[:glyphCount :coffi.mem/int]
[:glyphPadding :coffi.mem/int]
[:texture :raylib/Texture]
[:recs :coffi.mem/pointer]
[:glyphs :coffi.mem/pointer]]])
(clojure.core/defprotocol
proto-serialize-Font
(serialize-Font [obj segment]))
(clojure.core/extend-protocol
proto-serialize-Font
clojure.lang.IPersistentVector
(serialize-Font
[obj segment]
(do
(coffi.mem/write-int segment 0 (vector-nth obj (clojure.core/int 0)))
(coffi.mem/write-int segment 4 (vector-nth obj (clojure.core/int 1)))
(coffi.mem/write-int segment 8 (vector-nth obj (clojure.core/int 2)))
(serialize-Texture
(vector-nth obj (clojure.core/int 3))
(coffi.mem/slice segment 12 20))
(coffi.mem/write-address segment 32 (vector-nth obj (clojure.core/int 4)))
(coffi.mem/write-address
segment
40
(vector-nth obj (clojure.core/int 5)))))
clojure.lang.IPersistentMap
(serialize-Font
[obj segment]
(do
(coffi.mem/write-int segment 0 (:baseSize obj))
(coffi.mem/write-int segment 4 (:glyphCount obj))
(coffi.mem/write-int segment 8 (:glyphPadding obj))
(serialize-Texture (:texture obj) (coffi.mem/slice segment 12 20))
(coffi.mem/write-address segment 32 (:recs obj))
(coffi.mem/write-address segment 40 (:glyphs obj)))))
(clojure.core/defmethod
coffi.mem/serialize-into
:raylib/Font
[obj _struct segment _session]
(serialize-Font obj segment))