sanchitintel
There’s only one oneDNN Graph MHA pattern that corresponds to generic MHA (without any permute, reshape & contiguous). But that pattern of `matmul -> scale (optional) -> attention_mask (optional) ->...
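For reference, the generic MHA computation that such a fused pattern targets looks roughly like the following. This is only a minimal PyTorch sketch assuming the usual `matmul -> scale -> attention_mask -> softmax -> matmul` sequence; the tensor shapes and the additive-mask convention are illustrative assumptions, not the exact pattern definition in oneDNN Graph.

```python
import torch

def generic_mha(query, key, value, attention_mask=None, scale=None):
    # matmul: raw attention scores
    scores = torch.matmul(query, key.transpose(-2, -1))
    # scale (optional)
    if scale is not None:
        scores = scores * scale
    # attention_mask (optional), assumed additive here
    if attention_mask is not None:
        scores = scores + attention_mask
    # softmax over the key dimension
    probs = torch.softmax(scores, dim=-1)
    # matmul with value
    return torch.matmul(probs, value)
```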
Will reopen when the next version of oneDNN is integrated with PyTorch, at which point it will be possible to align the implementation with Jason's advice. Thanks!
> BTW: I think horizontal transverse doesn't work well with this cache optimization cc @jgong5 @chunyuan-w

Hi, would the horizontal traverse strategy complement the existing AMX GEMM micro-kernel template (by...
> makes the logic limited to handle 16, 32 and 48

Can we also add a note on how/why a particular set of `[block_m, block_n, block_k]` values was chosen for...
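For context, the role of `[block_m, block_n, block_k]` in a blocked GEMM can be illustrated with the NumPy sketch below. It is only a schematic of cache/register blocking; the actual AMX micro-kernel template operates on hardware tiles, and its restriction to 16, 32 and 48 does not follow from this sketch.

```python
import numpy as np

def blocked_matmul(A, B, block_m=32, block_n=32, block_k=32):
    # C = A @ B computed block by block; block_m/block_n/block_k control how much
    # of A, B, and the C accumulator stays resident in cache/registers at a time.
    M, K = A.shape
    K2, N = B.shape
    assert K == K2
    C = np.zeros((M, N), dtype=A.dtype)
    for m in range(0, M, block_m):
        for n in range(0, N, block_n):
            # Accumulator block for C[m:m+block_m, n:n+block_n]
            acc = np.zeros((min(block_m, M - m), min(block_n, N - n)), dtype=A.dtype)
            for k in range(0, K, block_k):
                acc += A[m:m + block_m, k:k + block_k] @ B[k:k + block_k, n:n + block_n]
            C[m:m + block_m, n:n + block_n] = acc
    return C
```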
@pytorchbot rebase -b main
@pytorchbot merge
#1220 will fix this issue.
Thanks for pointing that out, @matthewdouglas! I've revised the description. @jianan-gu @xia-weiwen, please clarify whether you have added an `AdamW8bit` implementation for CPU to `bitsandbytes`. If not, do you have plans...
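For reference, `AdamW8bit` in `bitsandbytes` is used as in the sketch below; whether an equivalent CPU path exists is exactly the open question above, so the `.cuda()` placement here is an assumption about the currently supported configuration.

```python
import torch
import bitsandbytes as bnb

# Toy model; 8-bit optimizer state is kept by the bitsandbytes optimizer.
model = torch.nn.Linear(1024, 1024).cuda()
optimizer = bnb.optim.AdamW8bit(model.parameters(), lr=1e-3, weight_decay=1e-2)

loss = model(torch.randn(16, 1024, device="cuda")).sum()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```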
The current release is for discrete graphics cards. While it only mentions the `Flex Series 170 GPU`, it also supports the Intel Arc Alchemist series GPUs. `Intel Extension for PyTorch` is...
Thanks for your interest in `Intel Extension for PyTorch`, @tedliosu! We look forward to your response! As @jingxu10 also mentioned, the current `whl`s are for Flex Series 170 GPUs (which...
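For readers landing here, a minimal sketch of running a model on an Intel discrete GPU with `Intel Extension for PyTorch` is shown below, assuming a build with `xpu` device support is installed; the toy model and shapes are illustrative.

```python
import torch
import intel_extension_for_pytorch as ipex

# Toy model; any nn.Module works the same way.
model = torch.nn.Sequential(torch.nn.Linear(128, 256), torch.nn.ReLU()).to("xpu")
model.eval()
# Apply the extension's inference optimizations (default fp32 path).
model = ipex.optimize(model)

with torch.no_grad():
    x = torch.randn(8, 128, device="xpu")
    out = model(x)
```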