Halide
Halide copied to clipboard
Fix #5175: Add an extra check for input dependencies of the async producer
As described in the #5175, it's possible to write an illegal schedule which successfully goes through the code generation, but generated code produces incorrect results.
This PR adds an extra check to detect such illegal schedules by verifying that if in realize node A there is a producer node B and producer A is async and is consumer of B then B must be also scheduled as async.
LGTM but @zvookin should probably review
There's something I'm not getting here. Generally the things an async scheduled Func depends on do not need to be async themselves. I'm guessing this has to do with the storage being allocated outside the async invocations and the compute being inside. But then I'm not quite seeing how requiring the dependency to be async always makes it correct. I will sit down and look at it a little more closely, but I think updating the description to say something in terms of which schedules are valid and invalid, rather than describing the IR conditions they result in, might help.
Thank you for reviewing, that's a great question.
Generally the things an async scheduled Func depends on do not need to be async themselves.
Right, but input dependency need to be computed fully beforehand. If input dependency is still being produced, async_producer needs to have a way to know that data it depends on is ready.
If I understand correctly, async producers are lifted into separate "thread" based on their store_level, so if store_level is the same as compute_level then there is sort of implicit synchronization point between input dependency and producer:
for some_dim:
compute input_dependency
--> implicit sync point <--
fork {
compute async_producer (depends on input_dependency)
semaphores to sync between async_producer and consumer
} {
semaphores to sync between async_producer and consumer
compute consumer of async_producer
}
However, if store level is different from compute level then async producer will be lifted out of the loop and this implicit synchronization point is gone:
fork {
for some_dim:
compute async_producer (depends on input_dependency, but can't really know how much of it was produced in another thread)
semaphores to sync between async_producer and consumer
} {
for some_dim:
compute input_dependency
semaphores to sync between async_producer and consumer
compute consumer of async_producer
}
If this logic is correct then solution can be either making a copy of input_producer in both "threads" or inserting explicit synchronization between input_producer and async_producer. The latter looks exactly like what async() already does, so it seemed to me like a more reasonable way to address it.
(Maybe, it's also possible to do forking differently, so it's not based on store level, but I am not sure and it sounds like a really big change).
Monday review ping.
Is this still just awating review?
Is this just waiting on review?
Yes, review is still needed.
Also, I recently realized that #5201 might be just a different type of the same issue, so maybe it's possible/worth trying to generalize check from this PR to detect the case from #5201.
Where does this PR stand?
I can try to generalize it to cover the other case, but it would be great to get a feedback first in case there is a flaw in the logic somewhere.
This PR is pretty old. Is it still active?
This is still a problem which needs fixing, so yes. That being said, I discovered a test which still fails, so this needs to be figured out first. I am planning to return to async in couple weeks for some internal stuff, so hopefully can get this fixed too then.
This is still a problem which needs fixing, so yes. That being said, I discovered a test which still fails, so this needs to be figured out first. I am planning to return to async in couple weeks for some internal stuff, so hopefully can get this fixed too then.
What was the test that still fails? Is it part of this PR?