riscv-perf-model icon indicating copy to clipboard operation
riscv-perf-model copied to clipboard

Multiple LSU pipeline support

Open MayneMei opened this issue 1 year ago • 7 comments

Hi Arup and Knute, after I talked with Arup, I decided to implement store buffer first to enable data forwarding and then implement multi pipelines. Here is the draft pull. For my implementation, 30% tests passed, 69 tests failed out of 99, when I look those results I feel really lost on how to begin debug. Could you give me some guidance? Thank you!

MayneMei avatar Nov 01 '24 17:11 MayneMei

@MayneMei Please reach out to Knute and see if he is available for a meeting to address your questions about debugging.

arupc avatar Nov 10 '24 10:11 arupc

@MayneMei I fixed the macOS build.

BTW I am going to release my VLSU branch, which may conflict with some of your LSU changes. Let me know if you have any trouble updating your branch.

kathlenemagnus avatar Nov 25 '24 21:11 kathlenemagnus

When I added print message inside the LSU.cpp, I can see the store buffer size was modified. But in the test it's always 0. However, the cycle time met the expectation of data forwarding, so I assume it's correct

MayneMei avatar Dec 06 '24 08:12 MayneMei

@MayneMei my recent PR introduces some merge conflicts for you. If you have any trouble with the merge, please let me know and I can help resolve it.

kathlenemagnus avatar Dec 17 '24 20:12 kathlenemagnus

@kathlenemagnus I resolved conflict. However during the regression test, there is an environment issue related to MacOS, and also test "BranchPred_test_Run" introduced a segfault. Other tests are all passed. Do you mind take a look at it? Thank you!

MayneMei avatar Jan 24 '25 18:01 MayneMei

@MayneMei it looks like you rebased your branch incorrectly. The PR contains all of the fixes that are already in master -- perhaps you rebased all of the updates since you pulled last?

klingaard avatar Mar 23 '25 13:03 klingaard

@klingaard I reseted the commit and rebased it.

MayneMei avatar Mar 23 '25 19:03 MayneMei

Thanks for the suggestion! I should have a following PR in next week.

Knute Lingaard @.***>于2025年5月26日 周一12:37写道:

@.**** approved this pull request.

Thanks for making the proper fix. We can move forward with this, but I do have a suggestion for improving the performance/readability of it. Feel free to merge and make a subsequent PR to change the alg if you want.

In core/lsu/LSU.cpp https://github.com/riscv-software-src/riscv-perf-model/pull/216#discussion_r2107781558 :

  •    std::vector<bool> coverage_mask(load_size, false);
    
  •    uint32_t bytes_covered_count = 0;
    
  •    // Iterate through the store_buffer_ from youngest to oldest.
    
  •    // This ensures that if multiple stores write to the same byte,
    
  •    // the data from the youngest store is effectively used.
    
  •    for (auto it = store_buffer_.rbegin(); it != store_buffer_.rend(); ++it)
    
  •    {
    
  •        const auto& store_info_ptr = *it; // LoadStoreInstInfoPtr
    
  •        const InstPtr& store_inst_ptr = store_info_ptr->getInstPtr();
    
  •        const uint64_t store_addr = store_inst_ptr->getTargetVAddr();
    
  •        const uint32_t store_size = store_inst_ptr->getMemAccessSize();
    
  •        if (store_size == 0) {
    
  •            continue; // Skip stores that don't actually write data.
    

Which ones are those?

In core/lsu/LSU.cpp https://github.com/riscv-software-src/riscv-perf-model/pull/216#discussion_r2107793832 :

  •    store_buffer_.push_back(store_info_ptr);
    
  •    ILOG("Store added to store buffer: " << inst_ptr);
    
  • }
  • bool LSU::tryStoreToLoadForwarding(const InstPtr& load_inst_ptr) const
  • {
  •    const uint64_t load_addr = load_inst_ptr->getTargetVAddr();
    
  •    const uint32_t load_size = load_inst_ptr->getMemAccessSize();
    
  •    // A load must have a non-zero size to access memory.
    
  •    if (load_size == 0) {
    
  •        return false;
    
  •    }
    
  •    std::vector<bool> coverage_mask(load_size, false);
    

This defaults to a specialization of std::vector which is really a dynamic std::bitset. Suggest using a boost::dynamic_bitset for the implementation. See my next points...

Since we don't have a concept of a store buffer class, which could maintain the valid bits per cache line, you can kinda simulate that here.

  1. First, you can make a rule that store to load forwarding cannot cross a cache line. Makes things a little easier.

  2. If the load addr + size is good create a bitmask of the bytes that are needed by the load based on the size of the cache line

  3. Iterate the store buffer like you're doing

  4. Create a bitmask for the store (like you did for the load) and clear the bits in the load bit mask that match:

    load_bytes_needed &= ~store_bytes; if (load_bytes_needed.none()) { return true; // we found all the stores that can forward to this load within the cache line }

— Reply to this email directly, view it on GitHub https://github.com/riscv-software-src/riscv-perf-model/pull/216#pullrequestreview-2869063971, or unsubscribe https://github.com/notifications/unsubscribe-auth/AQSAIYBUXLDB7WPTNGHL3M33ANUPLAVCNFSM6AAAAABRAUMZJKVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZDQNRZGA3DGOJXGE . You are receiving this because you were mentioned.Message ID: @.*** com>

MayneMei avatar May 26 '25 19:05 MayneMei

@MayneMei do you have any updates to push?

knute-mips avatar Jul 07 '25 16:07 knute-mips

We'll go ahead and merge this. There are a few folks comping at the bit to enhance this block and are waiting for this PR.

klingaard avatar Aug 01 '25 13:08 klingaard