sanchitintel
There’s only one oneDNN Graph MHA pattern that corresponds to generic MHA (without any permute, reshape & contiguous). But that pattern of `matmul -> scale (optional) -> attention_mask (optional) ->...
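For reference, the generic MHA computation that such a fused pattern targets looks roughly like the following. This is only a minimal PyTorch sketch assuming the usual `matmul -> scale -> attention_mask -> softmax -> matmul` sequence; the tensor shapes and the additive-mask convention are illustrative assumptions, not the exact pattern definition in oneDNN Graph.

```python
import torch

def generic_mha(query, key, value, attention_mask=None, scale=None):
    # matmul: raw attention scores
    scores = torch.matmul(query, key.transpose(-2, -1))
    # scale (optional)
    if scale is not None:
        scores = scores * scale
    # attention_mask (optional), assumed additive here
    if attention_mask is not None:
        scores = scores + attention_mask
    # softmax over the key dimension
    probs = torch.softmax(scores, dim=-1)
    # matmul with value
    return torch.matmul(probs, value)
```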
Will reopen when the next version of oneDNN is integrated with PyTorch, at which point it will be possible to align the implementation with Jason's advice. Thanks!
> BTW: I think horizontal transverse doesn't work well with this cache optimization cc @jgong5 @chunyuan-w

Hi, would the horizontal traverse strategy complement the existing AMX GEMM micro-kernel template (by...
> makes the logic limited to handle 16, 32 and 48

Can we also add a note on how/why a particular set of `[block_m, block_n, block_k]` values was chosen for...
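For context, the role of `[block_m, block_n, block_k]` in a blocked GEMM can be illustrated with the NumPy sketch below. It is only a schematic of cache/register blocking; the actual AMX micro-kernel template operates on hardware tiles, and its restriction to 16, 32 and 48 does not follow from this sketch.

```python
import numpy as np

def blocked_matmul(A, B, block_m=32, block_n=32, block_k=32):
    # C = A @ B computed block by block; block_m/block_n/block_k control how much
    # of A, B, and the C accumulator stays resident in cache/registers at a time.
    M, K = A.shape
    K2, N = B.shape
    assert K == K2
    C = np.zeros((M, N), dtype=A.dtype)
    for m in range(0, M, block_m):
        for n in range(0, N, block_n):
            # Accumulator block for C[m:m+block_m, n:n+block_n]
            acc = np.zeros((min(block_m, M - m), min(block_n, N - n)), dtype=A.dtype)
            for k in range(0, K, block_k):
                acc += A[m:m + block_m, k:k + block_k] @ B[k:k + block_k, n:n + block_n]
            C[m:m + block_m, n:n + block_n] = acc
    return C
```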
@pytorchbot rebase -b main
@pytorchbot merge
#1220 will fix this issue.
Thanks for pointing that out, @matthewdouglas! I've revised the description. @jianan-gu @xia-weiwen, please clarify whether you have added an `AdamW8bit` implementation for CPU to `bitsandbytes`. If not, do you have plans...
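For reference, `AdamW8bit` in `bitsandbytes` is used as in the sketch below; whether an equivalent CPU path exists is exactly the open question above, so the `.cuda()` placement here is an assumption about the currently supported configuration.

```python
import torch
import bitsandbytes as bnb

# Toy model; 8-bit optimizer state is kept by the bitsandbytes optimizer.
model = torch.nn.Linear(1024, 1024).cuda()
optimizer = bnb.optim.AdamW8bit(model.parameters(), lr=1e-3, weight_decay=1e-2)

loss = model(torch.randn(16, 1024, device="cuda")).sum()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```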
The current release is for discrete graphics cards. While it only mentions the `Flex Series 170 GPU`, it also supports the Intel Arc Alchemist series GPUs. `Intel Extension for PyTorch` is...
Thanks for your interest in `Intel Extension for PyTorch`, @tedliosu! We look forward to your response! As @jingxu10 also mentioned, the current `whl`s are for Flex Series 170 GPUs (which...
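For readers landing here, a minimal sketch of running a model on an Intel discrete GPU with `Intel Extension for PyTorch` is shown below, assuming a build with `xpu` device support is installed; the toy model and shapes are illustrative.

```python
import torch
import intel_extension_for_pytorch as ipex

# Toy model; any nn.Module works the same way.
model = torch.nn.Sequential(torch.nn.Linear(128, 256), torch.nn.ReLU()).to("xpu")
model.eval()
# Apply the extension's inference optimizations (default fp32 path).
model = ipex.optimize(model)

with torch.no_grad():
    x = torch.randn(8, 128, device="xpu")
    out = model(x)
```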