Rob Suderman comments

Results: 47 comments of Rob Suderman

CUDA backend Conv2d wrong result for specific parameter constellations

> Based on the description, I'm going to start with the assumption that this is an issue with the torch to linalg conversions. Asking Rob to weigh in. So the...

[stablehlo] Add a pass to force scatters to legal areas

> Do we know how terrible of code this generates? This is a lot of stuff to do per element and I suspect it may break vectorization and such -...

[stablehlo] Add a pass to force scatters to legal areas

> sounds good - @MaheshRavishankar may be able to speak to the formation questions - I mostly am just curious if we're sending all scatters down a scalar path with...

[stablehlo] Add a pass to force scatters to legal areas

> yeah, that reduction is likely to be an issue - we don't really want to serialize everything like that - is there a reason to reduce? > > (I'd...

bf16 result mismatch for Conv2D op

> @rsuderman , here is the much more concise and optimized way that the PyTorch runtime does it (I think that part was written by Marat and carried over from...

Add github job for checking code formatting

Could you separate adding the formatting job from the actual reformatting of the code? That should help avoid conflating functional changes with purely formatting changes.

Serialize Executables crashing when compiling LLaMa on async-cpu

It appears the issue is in `LLVMCPUVectorTransferLowering`. There is a full unrolling making the dispatch rather unruly.

Serialize Executables crashing when compiling LLaMa on async-cpu

Some additional guilty lines:
```
%12 = vector.transfer_read %10[%c0, %c0], %cst_2 {in_bounds = [true, true]} : tensor, vector
%13 = arith.extf %12 : vector to vector
%14 = vector.transfer_write %13,...
```

Serialize Executables crashing when compiling LLaMa on async-cpu

This issue is blocking another model on the `onnx` front.

Expand `arith.minf` and `arith.maxf` expand ops for non-`f32` types

Validated that this is still failing.