spatial-lang icon indicating copy to clipboard operation
spatial-lang copied to clipboard

"Concurrent" Writes from Unrolled Access

Open mattfel1 opened this issue 7 years ago • 6 comments

In SimpleTileLoadStore, I had the following:

Accel{ val b1 = SRAM Foreach(N by 1 par 2) { i => b1(i) = // something } } `

When this unrolled, it gave me two duplicates of b1, and each of the new Foreaches that got unrolled was writing to both of the duplicates. I know the "correct" thing to do is to declare b1 inside the pipe to be unrolled, but I'm not sure what should happen if it is written as I had it originally.

  1. It should create 2 duplicates of b1 and expect both writers to b1 to be active at the same time on both duplicates, since their common parent is a parallel?

or

  1. It should crash and throw error to user

The problem with 1 is that right now, I assume that we will never have two writers to a memory active at the same time. The memory only has one write port per bank per buffer, and I use the write enable of a port as the mux select. The write enable is loosely equal to the parent's enable. I think I can fix this by changing around how my muxes talk to my srams, but I want to make sure such an app is considered valid before I go messing around with these signals, since it will get complicated very quickly

mattfel1 avatar Apr 10 '17 08:04 mattfel1

There's something seriously wrong with banking analysis if this is producing two duplicates of b1.

Is there another loop here that you didn't mention?

dkoeplin avatar Apr 10 '17 08:04 dkoeplin

Accel {
  val b1 = SRAM[T](tileSize)
  Sequential.Foreach(N by tileSize par 2) { i =>
    b1 load srcFPGA(i::i+tileSize par 16)

    val b2 = SRAM[T](tileSize)
    
    Foreach(tileSize by 1) { ii =>
      b2(ii) = b1(ii) * x
    }
    dstFPGA(i::i+tileSize par 16) store b2
  }
}

mattfel1 avatar Apr 10 '17 08:04 mattfel1

swapping the b1 declaration with the Sequential.foreach fixes the issues. Not sure if I made that clear

mattfel1 avatar Apr 10 '17 08:04 mattfel1

Option 2 is the only correct thing to do here, we just need to be able to catch this case

Edit: Actually I need to come back and think about this more. My brain is just about fried from making slides.

dkoeplin avatar Apr 10 '17 08:04 dkoeplin

Ok, so creating two duplicates is only the correct thing to do if we can guarantee that writes from the first iteration aren't visible in the second iteration, and so on. This may actually be a bit difficult to figure out in general.

Option 2 may be the best for now until we have a solution for "killing writes". Putting the SRAM within the outer loop is much more explicit about what the expected behavior is, so it seems like requiring that isn't too bad.

dkoeplin avatar Apr 10 '17 17:04 dkoeplin

Added something to give errors in this case - we can leave it open for now to determine if we ever want to implement option 1.

dkoeplin avatar Jul 10 '17 04:07 dkoeplin