Andrzej Janik
Wrt. performance: if compute capability is not enough information, then ZLUDA could add a CUDA extension to surface whatever llama.cpp needs, with the simplest bit being the underlying HIP device...
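Something along these lines, as a minimal sketch (`zludaDeviceGetHipDevice` is a made-up name, not an existing ZLUDA export): a CUDA-driver-style entry point that surfaces the underlying HIP device ordinal behind a CUDA device ordinal.

```rust
// Hypothetical extension sketch, not an actual ZLUDA API. ZLUDA exposes the
// CUDA driver API as extern "C" exports, so an extension could look like this:
#[no_mangle]
pub unsafe extern "C" fn zludaDeviceGetHipDevice(
    hip_device: *mut i32, // out: underlying HIP device ordinal
    cuda_device: i32,     // in: CUDA device ordinal as seen by the application
) -> u32 {
    if hip_device.is_null() {
        return 1; // CUDA_ERROR_INVALID_VALUE
    }
    // In this sketch, CUDA ordinals are assumed to map 1:1 onto HIP ordinals.
    unsafe { *hip_device = cuda_device };
    0 // CUDA_SUCCESS
}

fn main() {
    let mut hip = -1;
    let err = unsafe { zludaDeviceGetHipDevice(&mut hip, 0) };
    println!("err = {err}, hip device = {hip}");
}
```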
Hmmm, I assumed this: "tile sizes are fixed for a given architecture, llama.cpp compiles several variants for whatever architectures were chosen at compile time, and then during run time llama.cpp...
That's inconvenient, because I was thinking that ZLUDA could pick the appropriate optimal-CC module from the fatbin (setting aside the mechanism for it). For example, I'm looking at `ggml_mul_mat_q4_0_q8_1_cuda`. I can see...
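The selection policy I have in mind would be something like this minimal sketch (the function name is hypothetical and the actual fatbin parsing is set aside): given the compute capabilities embedded in a fatbin, pick the highest one that does not exceed the device's own.

```rust
// Hypothetical sketch of "pick the optimal-CC module from a fatbin".
// Compute capabilities are encoded as e.g. 70 for sm_70, 86 for sm_86.
fn pick_fatbin_entry(device_cc: u32, available_ccs: &[u32]) -> Option<u32> {
    available_ccs
        .iter()
        .copied()
        .filter(|&cc| cc <= device_cc) // never pick code for a newer arch
        .max()                         // prefer the newest compatible variant
}

fn main() {
    // e.g. a fatbin built for sm_61, sm_70 and sm_86, running on an sm_80 device:
    assert_eq!(pick_fatbin_entry(80, &[61, 70, 86]), Some(70));
}
```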
Yes, exactly this. Now I'll see what can be done on the ZLUDA side
Yes, I've run a scan on my machine and it came back clean. Detections with the `!ml` suffix come from Defender's machine learning model. It looks to me like a case...
It's not impossible (as evidenced by an earlier ZLUDA version), but extremely unlikely. You'd need an organization that sponsors the development. It's probably not going to be Intel, because realistically,...
I'm closing this as a duplicate. Feel free to continue the discussion in #85
I did not take notes the last time I poked around there, so this might not be 100% accurate, but:
- Don't use a full game; rather, start with the DLSS 2 sample...
Best to decide depending on what sort of LLVM bitcode we need to emit for it. `ast::CvtDetails` already splits conversion cases into separate variants (IntFromInt, FloatFromFloat, IntFromFloat, FloatFromInt), which...
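For reference, a minimal Rust sketch of the idea (the enum shape and field names are assumptions, not ZLUDA's actual definition, and same-width, saturating, and rounding cases are ignored): each variant maps naturally onto a different family of LLVM instructions.

```rust
// Assumed shape for illustration only; not ZLUDA's real ast::CvtDetails.
enum CvtDetails {
    IntFromInt { wider: bool, signed: bool }, // wider = destination is wider
    FloatFromFloat { wider: bool },
    IntFromFloat { signed: bool },
    FloatFromInt { signed: bool },
}

// Which LLVM instruction each conversion kind lowers to:
fn llvm_opcode(cvt: &CvtDetails) -> &'static str {
    match cvt {
        CvtDetails::IntFromInt { wider: false, .. } => "trunc", // narrowing
        CvtDetails::IntFromInt { wider: true, signed: true } => "sext",
        CvtDetails::IntFromInt { wider: true, signed: false } => "zext",
        CvtDetails::FloatFromFloat { wider: true } => "fpext",
        CvtDetails::FloatFromFloat { wider: false } => "fptrunc",
        CvtDetails::IntFromFloat { signed: true } => "fptosi",
        CvtDetails::IntFromFloat { signed: false } => "fptoui",
        CvtDetails::FloatFromInt { signed: true } => "sitofp",
        CvtDetails::FloatFromInt { signed: false } => "uitofp",
    }
}

fn main() {
    let cvt = CvtDetails::IntFromFloat { signed: true };
    println!("{}", llvm_opcode(&cvt)); // prints "fptosi"
}
```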
If ZLUDA development continues, then that will be on the to-do list. For now, ZLUDA is written to make sure that it's "non-invasive": an application runs ZLUDA only if...