spatial-lang
spatial-lang copied to clipboard
"Concurrent" Writes from Unrolled Access
In SimpleTileLoadStore, I had the following:
Accel{
val b1 = SRAM
Foreach(N by 1 par 2) { i =>
b1(i) = // something
}
}
`
When this unrolled, it gave me two duplicates of b1, and each of the new Foreaches that got unrolled was writing to both of the duplicates. I know the "correct" thing to do is to declare b1 inside the pipe to be unrolled, but I'm not sure what should happen if it is written as I had it originally.
- It should create 2 duplicates of b1 and expect both writers to b1 to be active at the same time on both duplicates, since their common parent is a parallel?
or
- It should crash and throw error to user
The problem with 1 is that right now, I assume that we will never have two writers to a memory active at the same time. The memory only has one write port per bank per buffer, and I use the write enable of a port as the mux select. The write enable is loosely equal to the parent's enable. I think I can fix this by changing around how my muxes talk to my srams, but I want to make sure such an app is considered valid before I go messing around with these signals, since it will get complicated very quickly
There's something seriously wrong with banking analysis if this is producing two duplicates of b1.
Is there another loop here that you didn't mention?
Accel {
val b1 = SRAM[T](tileSize)
Sequential.Foreach(N by tileSize par 2) { i =>
b1 load srcFPGA(i::i+tileSize par 16)
val b2 = SRAM[T](tileSize)
Foreach(tileSize by 1) { ii =>
b2(ii) = b1(ii) * x
}
dstFPGA(i::i+tileSize par 16) store b2
}
}
swapping the b1 declaration with the Sequential.foreach fixes the issues. Not sure if I made that clear
Option 2 is the only correct thing to do here, we just need to be able to catch this case
Edit: Actually I need to come back and think about this more. My brain is just about fried from making slides.
Ok, so creating two duplicates is only the correct thing to do if we can guarantee that writes from the first iteration aren't visible in the second iteration, and so on. This may actually be a bit difficult to figure out in general.
Option 2 may be the best for now until we have a solution for "killing writes". Putting the SRAM within the outer loop is much more explicit about what the expected behavior is, so it seems like requiring that isn't too bad.
Added something to give errors in this case - we can leave it open for now to determine if we ever want to implement option 1.