Jan Wassenberg
Hi, please see #5, does that help?
Very nice, cool to see this building on Windows, thanks for sending the pull request! FYI we are currently working on automated Copybara sync so we can merge hopefully soon.
Thanks for rebasing to dev, that's great! We are now ready to (attempt to) merge. I've resolved conflicts in README.
Interesting, thanks for making us aware. I see that the Highway targets used are AVX3_ZEN4 vs AVX2. The likeliest cause that comes to mind is native bf16 in the former,...
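As background for the bf16 hypothesis above, here is a minimal sketch of how the way fp32 is narrowed to bf16 can change the low bits of a value. bf16 keeps only the upper 16 bits of an IEEE fp32 value, so truncation and round-to-nearest-even can give different results. The function names are hypothetical and this is not Highway's implementation; whether any such difference exists between the two targets is an assumption, not something established in this thread.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: fp32 -> bf16 by dropping the lower 16 bits.
uint16_t F32ToBF16Truncate(float f) {
  uint32_t bits;
  std::memcpy(&bits, &f, sizeof(bits));
  return static_cast<uint16_t>(bits >> 16);
}

// Hypothetical helper: fp32 -> bf16 with round-to-nearest-even.
// NaN inputs are not handled specially in this sketch.
uint16_t F32ToBF16RoundNearestEven(float f) {
  uint32_t bits;
  std::memcpy(&bits, &f, sizeof(bits));
  // Add 0x7FFF, plus 1 if the lowest kept bit is odd, so ties go to even.
  const uint32_t rounding_bias = 0x7FFFu + ((bits >> 16) & 1u);
  return static_cast<uint16_t>((bits + rounding_bias) >> 16);
}

// bf16 -> fp32 is exact: the 16 bits become the upper half of the fp32.
float BF16ToF32(uint16_t b) {
  const uint32_t bits = static_cast<uint32_t>(b) << 16;
  float f;
  std::memcpy(&f, &bits, sizeof(f));
  return f;
}
```

For example, 1.005859375f (which is 1 + 3/512) truncates to 1.0 but rounds to 1.0078125, a difference of one bf16 ulp at that magnitude.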
Bummer, thanks for confirming. I also tried with AVX3 (Skylake, so no native bf16) and got the better answer. It's not clear to me at the moment what else it...
Great idea! I very much appreciate you looking into this. To go from T* to float, you can call the following: ``` template HWY_INLINE void Decompress(const CompressedArray& compressed, size_t compressed_ofs,...
An idea: I notice some of the lines in your output file have a low discrepancy, so it's not just a case of accumulating over time. It may be helpful...
An update: even with CoT prompting (append "Think step by step and check your work"), we're currently seeing the incorrect 15 days also with AVX3. I plan to experiment with...
Compensated/cascaded summation turns out not to help because we are already using fp32. We have found, and fixed in #194, a bug that may be related, in that behavior changes...
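For readers unfamiliar with the technique mentioned above, compensated (Kahan) summation can be sketched as follows. This is a generic illustration with hypothetical function names, not the code that was tried in this thread. It assumes the compiler does not reassociate floating-point operations (for example, no `-ffast-math`), since that can optimize the compensation term away.

```cpp
#include <vector>

// Plain left-to-right accumulation in fp32.
float NaiveSum(const std::vector<float>& values) {
  float sum = 0.0f;
  for (float v : values) sum += v;
  return sum;
}

// Kahan (compensated) summation: carries the rounding error of each
// addition in a separate term and feeds it back into the next addition.
float KahanSum(const std::vector<float>& values) {
  float sum = 0.0f;
  float compensation = 0.0f;  // running estimate of the lost low-order bits
  for (float v : values) {
    const float y = v - compensation;  // apply the correction to the input
    const float t = sum + y;           // low-order bits of y may be lost here
    compensation = (t - sum) - y;      // recover what was lost
    sum = t;
  }
  return sum;
}
```

As a usage example, summing 1.0f followed by a thousand copies of 1e-8f leaves the naive sum at exactly 1.0f, because each addend is below half an ulp of 1.0f, whereas the compensated sum ends up close to 1.00001.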
Thanks! Does this also work for clang, or do we require some logic to detect which one to link against? Something like https://github.com/ned14/llfio/issues/52 ?