riscv-perf-model
Vector Permutation Design Document
@govardhnn your design doc is missing the critical information that I'm looking for which is "what uops are generated for each type of permutation instruction?". It would be great to include a simple example of uop generation for each type of uop generator that you plan to add.
From today's meeting (2 June 2025), here are my slides with the initial block diagram for the vector slide instructions: Olympia: Vector Permutation Design Proposal https://docs.google.com/presentation/d/1JPNQCGP9xFT4H0yEiLLE2OtRa35D_gz6fdhwKsWmmy0/edit?usp=sharing
The reference uArch for the other instructions, vcompress and vrgather, that I presented today is linked below:
Efficient Implementation of RISC-V Vector Permutation Instructions, arXiv:2505.07112 https://arxiv.org/abs/2505.07112
It will be cited in docs/vector_permutation.adoc.
The vector permutation design document PR#251 will also soon be extended based on today's review feedback.
Thanks, Govardhan
On Mon, May 19, 2025 at 9:47 PM Kathlene Magnus @.***> wrote:
@.**** commented on this pull request.
In arches/isa_json/olympia_uarch_rv64v.json https://github.com/riscv-software-src/riscv-perf-model/pull/251#discussion_r2096087776 :
@@ -104,7 +104,7 @@ { "mnemonic": "vcompress.vm", "pipe": "vpermute",
-"uop_gen": "PERMUTE",
+"uop_gen": "COMPRESS",
This file is generated by gen_uarch_rv64v_json.py, so it shouldn't be updated directly. You can modify the Python script and then run it to generate this file.
-- Sai Govardhan InCore Semiconductors
Hey @govardhnn, where are you with this PR? Specifically, did you address @kathlenemagnus's requests? Also, can you get the regression to pass again?
Hi @klingaard and @kathlenemagnus-mips, I have been on a personal break for a while. I should have informed the team earlier, apologies. Can I get back to this by the end of the month? Or please feel free to take this over if the feature is on the critical path.
Thanks,
Hey @govardhnn, yes, please feel free to take the time you need. We do appreciate the contributions that folks make to the model.
Hi @klingaard, I will have to discontinue this submission since I recently joined a startup in stealth. I will be unable to contribute to open source for some time, and I intend to come back in full force once I have permission to do so.
I will be happy if any other volunteers would like to build on this vector permutation proposal.
Thanks, Govardhan
Understood and we do appreciate the contributions you've made! We can take it from here. Best of luck in your endeavors.
Thanks @klingaard!
@govardhnn This work looks incomplete to me. I would expect this PR to contain modifications to Execute since the uops generated cannot be executed independently of each other.
For example, with your vrgather example in your doc:
UOP 1: vrgather.vv v20, v8, v4 # Process first register group
UOP 2: vrgather.vv v21, v9, v5 # Process second register group
UOP 3: vrgather.vv v22, v10, v6 # Process third register group
UOP 4: vrgather.vv v23, v11, v7 # Process fourth register group
It's possible that the indexes specified in v4 will need to gather elements from multiple vs2 registers (v8-v11) to write the result for v20. UOP 1 on its own will not read the right source registers to be able to write the correct value to v20. The result is that v20 will be marked ready earlier than is functionally possible. I see a similar issue with some of the vslide instructions.
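To make the dependency concrete, here is a small Python sketch of the vrgather.vv case above. It is not Olympia code: the element count per register, the index values, and the helper names are all assumptions chosen for illustration, and masking and vl are ignored. It computes which vs2 registers each destination register actually has to read.

```python
# Illustrative model only; none of these names come from Olympia.
ELEMS_PER_REG = 4  # assumption: e.g. VLEN=128 with SEW=32
LMUL = 4           # register group of four, as in the v8-v11 example
VLMAX = ELEMS_PER_REG * LMUL


def vrgather_vv(vs2_group, vs1_group):
    """vd[i] = vs2[vs1[i]] if vs1[i] < VLMAX, else 0."""
    return [vs2_group[idx] if idx < VLMAX else 0 for idx in vs1_group]


def source_regs_per_dest(vs1_group, vs2_base, vd_base):
    """Map each destination register to the vs2 registers it must read."""
    deps = {}
    for i, idx in enumerate(vs1_group):
        vd = vd_base + i // ELEMS_PER_REG
        deps.setdefault(vd, set())
        if idx < VLMAX:  # out-of-range indices read no source register
            deps[vd].add(vs2_base + idx // ELEMS_PER_REG)
    return deps


# Hypothetical index values held in v4-v7, flattened across the group.
indices = [0, 5, 10, 15,  3, 2, 1, 0,  4, 4, 4, 4,  12, 13, 14, 15]
# Hypothetical element values held in v8-v11, flattened the same way.
source = list(range(100, 116))

result = vrgather_vv(source, indices)
deps = source_regs_per_dest(indices, vs2_base=8, vd_base=20)

for vd in sorted(deps):
    srcs = ", ".join(f"v{r}" for r in sorted(deps[vd]))
    print(f"v{vd} reads {srcs}")
```

With these index values, v20 needs elements from all of v8 through v11, so a uop that reads only v8 and v4 cannot produce it. Any per-register uop split would need Execute (or the uop generator) to account for the full vs2 group as a source.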
Did you have more work planned for this project that you did not get to? If so, please document it so someone else can continue this work.