Philipp Hack
CC @reedwm.
Yes, only the contracting dimensions should have changed.
@frgossen the patterns are FP8 windowed einsums with input dequantizations (i.e. type-conversion to a wider type like FP16 or BF16 and scaling) and all-gathers that have multiple dot users. The...
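For readers unfamiliar with the pattern, here is a minimal single-process NumPy sketch of an all-gather with two dot users and a dequantized input. It is not the XLA implementation: the function names are hypothetical, `float16` stands in for FP8 (NumPy has no FP8 type), the "all-gather" is simulated by concatenating a list of shards, and applying the dequantization before the gather is an assumption made only for illustration.

```python
import numpy as np


def dequantize(x_q, scale):
    # Dequantization as described above: convert to a wider type, then scale.
    # float16 input is a stand-in for FP8 (assumption for this sketch).
    return x_q.astype(np.float32) * scale


def gathered_dots(lhs_shards, lhs_scale, rhs_a, rhs_b):
    # Reference form: "all-gather" the shards along the non-contracting
    # (row) dimension, then feed the gathered operand to two dots.
    gathered = np.concatenate(
        [dequantize(shard, lhs_scale) for shard in lhs_shards], axis=0
    )
    return gathered @ rhs_a, gathered @ rhs_b


def windowed_dots(lhs_shards, lhs_scale, rhs_a, rhs_b):
    # Windowed form: process one shard per step and let both dot users
    # consume the same chunk, so the full gathered operand is never built.
    out_a, out_b = [], []
    for shard in lhs_shards:
        chunk = dequantize(shard, lhs_scale)
        out_a.append(chunk @ rhs_a)
        out_b.append(chunk @ rhs_b)
    return np.concatenate(out_a, axis=0), np.concatenate(out_b, axis=0)
```

Both functions should produce the same results; the windowed form only changes when each chunk is consumed, which is what allows communication to overlap with compute in a real multi-device setting.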
@frgossen can you PTAL?
@frgossen the motivation for this change is the extension of the existing functionality for collective matmuls with multiple all-gather dots in the windowed einsum handler (see [this comment](https://github.com/openxla/xla/blob/e56c9c1c51cef21cda50eddc45f2a115836a9358/xla/service/gpu/gpu_windowed_einsum_handler.cc#L676)) to support...
@frgossen for some reason I can't directly respond to these two comments:

> Do you know why they interfer with the outcome? Is there a fundamental reason or can this...
> If this pass is so tightly coupled with the dot handler, shouldn't there be tests that reflect this? Do you think it makes sense to add test cases that...
> Can you confirm whether I'm correct

That's a generally accurate description.