
Vector Permutation Design Document

Open govardhnn opened this issue 8 months ago • 9 comments

govardhnn avatar Apr 06 '25 07:04 govardhnn

@govardhnn your design doc is missing the critical information that I'm looking for which is "what uops are generated for each type of permutation instruction?". It would be great to include a simple example of uop generation for each type of uop generator that you plan to add.

kathlenemagnus-mips avatar May 19 '25 16:05 kathlenemagnus-mips

Following today's meeting (June 2, 2025), here are my slides with the initial block diagram for the vector slide instructions: Olympia: Vector Permutation Design Proposal https://docs.google.com/presentation/d/1JPNQCGP9xFT4H0yEiLLE2OtRa35D_gz6fdhwKsWmmy0/edit?usp=sharing

The reference uArch I presented today for the other instructions (vcompress and vrgather) is: Efficient Implementation of RISC-V Vector Permutation Instructions, arXiv:2505.07112, https://arxiv.org/abs/2505.07112. It will be cited in docs/vector_permutation.adoc.

The vector permutation design document PR#251 will also soon be extended based on today's review feedback.

Thanks, Govardhan

On Mon, May 19, 2025 at 9:47 PM, Kathlene Magnus commented on this pull request:

In arches/isa_json/olympia_uarch_rv64v.json https://github.com/riscv-software-src/riscv-perf-model/pull/251#discussion_r2096087776 :

@@ -104,7 +104,7 @@
 {
   "mnemonic": "vcompress.vm",
   "pipe": "vpermute",
-  "uop_gen": "PERMUTE",
+  "uop_gen": "COMPRESS",

This file is generated by gen_uarch_rv64v_json.py so it shouldn't be updated directly. You can modify the Python script and then run it to generate this file.


-- Sai Govardhan InCore Semiconductors

govardhnn avatar Jun 02 '25 18:06 govardhnn

Hey @govardhnn, where are you with this PR? Specifically, did you address @kathlenemagnus's requests? Also, can you get the regression to pass again?

klingaard avatar Jul 13 '25 15:07 klingaard

Hi @klingaard and @kathlenemagnus-mips, I have been on a personal break for a while and should have informed the team earlier; my apologies. Can I get back to this by the end of the month? Otherwise, please feel free to take this over if the feature is on the critical path.

Thanks,

govardhnn avatar Jul 19 '25 16:07 govardhnn

Hey @govardhnn, yes, please feel free to take the time you need. We do appreciate the contributions that folks make to the model.

klingaard avatar Jul 27 '25 19:07 klingaard

Hi @klingaard, I will have to discontinue this submission since I have recently joined a startup in stealth. I will be unable to contribute to open source for some time, and I intend to come back in full force once I have permission to do so.

I will be happy if any other volunteers would like to build on this vector permutation proposal.

Thanks, Govardhan

govardhnn avatar Sep 30 '25 17:09 govardhnn

Understood and we do appreciate the contributions you've made! We can take it from here. Best of luck in your endeavors.

klingaard avatar Oct 04 '25 12:10 klingaard

Thanks @klingaard!

govardhnn avatar Oct 05 '25 07:10 govardhnn

@govardhnn This work looks incomplete to me. I would expect this PR to contain modifications to Execute since the uops generated cannot be executed independently of each other.

For example, with your vrgather example in your doc:

UOP 1: vrgather.vv v20, v8, v4   # Process first register group
UOP 2: vrgather.vv v21, v9, v5   # Process second register group
UOP 3: vrgather.vv v22, v10, v6  # Process third register group
UOP 4: vrgather.vv v23, v11, v7  # Process fourth register group

It's possible that the indexes specified in v4 will need to gather elements from multiple vs2 registers (v8-v11) to write the result for v20. UOP 1 on its own will not read the right source registers to be able to write the correct value to v20. The result is that v20 will be marked ready earlier than is functionally possible. I see a similar issue with some of the vslide instructions.
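To make the dependency concrete, here is a minimal Python sketch. It is not Olympia code: the helper names and the parameters (VLEN=128, SEW=32, LMUL=4, vl=VLMAX, no masking) are assumptions chosen for illustration. It computes which vs2 registers each destination register actually reads, for vrgather.vv and for a vslidedown with a fixed offset.

```python
# Illustrative sketch only; all names and parameters are hypothetical.
VLEN = 128                    # assumed vector register width in bits
SEW = 32                      # assumed element width in bits
ELEMS_PER_REG = VLEN // SEW   # 4 elements per vector register


def vrgather_source_regs(indices, vs2_base, vd_base, lmul):
    """Map each destination register to the vs2 registers it reads.

    indices is the flat list of index values held in the vs1 register group.
    Per the RVV spec, vd[i] = 0 if vs1[i] >= VLMAX, else vs2[vs1[i]].
    """
    vlmax = ELEMS_PER_REG * lmul
    deps = {}
    for i, idx in enumerate(indices):
        vd = vd_base + i // ELEMS_PER_REG
        deps.setdefault(vd, set())
        if idx < vlmax:  # out-of-range indices write 0 and read no source
            deps[vd].add(vs2_base + idx // ELEMS_PER_REG)
    return deps


def vslidedown_source_regs(offset, vs2_base, vd_base, lmul):
    """Same mapping for vslidedown: vd[i] = vs2[i + offset] when in range."""
    vlmax = ELEMS_PER_REG * lmul
    deps = {}
    for i in range(vlmax):
        vd = vd_base + i // ELEMS_PER_REG
        deps.setdefault(vd, set())
        src = i + offset
        if src < vlmax:
            deps[vd].add(vs2_base + src // ELEMS_PER_REG)
    return deps


if __name__ == "__main__":
    # v4 holds indices 0, 5, 10, 15; v5-v7 hold identity indices 4..15.
    indices = [0, 5, 10, 15] + list(range(4, 16))
    gather = vrgather_source_regs(indices, vs2_base=8, vd_base=20, lmul=4)
    print({vd: sorted(srcs) for vd, srcs in gather.items()})
    # {20: [8, 9, 10, 11], 21: [9], 22: [10], 23: [11]}

    slide = vslidedown_source_regs(offset=2, vs2_base=8, vd_base=20, lmul=4)
    print({vd: sorted(srcs) for vd, srcs in slide.items()})
    # {20: [8, 9], 21: [9, 10], 22: [10, 11], 23: [11]}
```

With these index values, v20 depends on all of v8-v11, so a UOP 1 that only reads v8 and v4 cannot produce it. The slide case shows the same pattern on a smaller scale: each destination register reads from two adjacent source registers whenever the offset is not a multiple of the elements per register.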

Did you have more work planned for this project that you did not get to? If so, please document it so someone else can continue this work.

kathlenemagnus avatar Oct 13 '25 22:10 kathlenemagnus