Romain Dolbeau

22 comments by Romain Dolbeau

Did a quick test (performance only, might be buggy) using (V)PMOVZXDQ. Both the SSE4.1 version (which you can see at https://github.com/rdolbeau/zfs/tree/test_fletcher_sse41) and the AVX version (three-operand, VEX-encoded, otherwise identical) are...
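For context, a minimal scalar sketch of the Fletcher-4 running sums as ZFS computes them: each 32-bit input word is zero-extended to 64 bits before being accumulated, which is the widening step that (V)PMOVZXDQ performs on two words at a time. This is not the code from the linked branch; the names `fletcher4_sums` and `fletcher4_scalar` are hypothetical and chosen only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the four 64-bit running sums. */
typedef struct {
    uint64_t a, b, c, d;
} fletcher4_sums;

/* Scalar reference: zero-extend each 32-bit word to 64 bits, then
 * update the four cascaded sums. A SIMD version would do the same
 * widening on several words per instruction. */
static fletcher4_sums fletcher4_scalar(const uint32_t *words, size_t n)
{
    fletcher4_sums s = {0, 0, 0, 0};
    for (size_t i = 0; i < n; i++) {
        uint64_t w = (uint64_t)words[i]; /* zero-extension, 32 -> 64 bits */
        s.a += w;
        s.b += s.a;
        s.c += s.b;
        s.d += s.c;
    }
    return s;
}
```

The 64-bit accumulators matter: two input words of `0xFFFFFFFF` already push the first sum past 32 bits.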

@InsanePrawn Thanks, Bloomfield is Nehalem and was a potential target, but the numbers are also not good. A bit surprised that sse4_1/byteswap is so low, but overall I didn't expect...

@slavonnet Not sure what benefit you expect from AVX code in the RAID-Z computations over SSSE3; I don't think there are instructions that would help, so the gain would be just...

@bunnie They are huge, unfortunately. From my notes, I have a synthesis at 13263/9099 Slice LUTs/Slice Registers before, latest is 18277/11003 (and the rest of the design should be almost...

Thanks for the answer. I tried integrating the existing code directly, but it requires a lot of support files (with some minor word conflicts) and eventually choked on missing push-package/pop-package...

I think part of the problem is that I was trying to include the OFW code in mine, which I am tokenizing with fcode-utils `toke`. But it predefines a lot...

@MitchBradley Dumb question; is it possible to find the byte-code value(s) for a word inside the PROM monitor? I can use 'see' to see the definition, but it doesn't give...

@subhajit26 The proposed "P" (packed SIMD) extension to RISC-V differs in this regard from most other SIMD instruction sets; it does not use specific registers (such as the XMM...

@subhajit26 TL;DR: yes :-) Instructions like `kmxda` are indeed made for complex arithmetic. But you need to know how the data are organized in your `i16_complex_t` type. Assuming it's a...
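To illustrate why a crossed multiply-add fits complex arithmetic, here is a plain C model of a 16-bit complex multiply. It does not use the P extension or its intrinsics. The layout is an assumption, since the excerpt above is cut off before stating it: the hypothetical `cplx16` type stores the real part first and the imaginary part second. The imaginary part of the product is the sum of the two crossed products, which is the shape of operation the comment associates with `kmxda`.

```c
#include <stdint.h>

/* Hypothetical layout: real part first, imaginary part second. */
typedef struct {
    int16_t re;
    int16_t im;
} cplx16;

/* Clamp a 64-bit intermediate to the signed 32-bit range. */
static int32_t sat32(int64_t v)
{
    if (v > INT32_MAX) return INT32_MAX;
    if (v < INT32_MIN) return INT32_MIN;
    return (int32_t)v;
}

/* Imaginary part of x*y: the two crossed products, added. */
static int32_t cplx_mul_imag(cplx16 x, cplx16 y)
{
    int64_t sum = (int64_t)x.re * y.im + (int64_t)x.im * y.re;
    return sat32(sum);
}

/* Real part of x*y: the two straight products, subtracted. */
static int32_t cplx_mul_real(cplx16 x, cplx16 y)
{
    int64_t diff = (int64_t)x.re * y.re - (int64_t)x.im * y.im;
    return sat32(diff);
}
```

For example, (1 + 2i)(3 + 4i) gives a real part of -5 and an imaginary part of 10. The saturation only triggers in the crossed sum when all four halves are -32768.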

@subhajit26 Can't help with spike. For 32-bit data manipulation, you may want to look at the [B extension (bitmanip)](https://github.com/riscv/riscv-bitmanip) instead of P; it has 32-bit MAX[U]/MIN[U], for instance.