Vivian Zhang comments

Results: 14 comments of Vivian Zhang

[spirv] Add lowering configuration verification logic

Thanks for the details of the constraints. I have some questions. 1. How can I get `max_compute_workgroup_invocations`, `max_compute_workgroup_size` and `spv.target_env`? Is there any documentation or code related to this? 2. Is the...
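For readers unfamiliar with the two limits named in the question, here is a minimal sketch of the kind of check they imply, assuming the usual Vulkan-style meaning: each workgroup dimension must stay within the per-dimension limit, and the product of all dimensions must stay within the invocation limit. The function name `verify_workgroup_size` and the limit values in the usage note are hypothetical and are not taken from the IREE code base.

```python
from math import prod


def verify_workgroup_size(workgroup_size, max_invocations, max_size):
    """Check a workgroup size against two resource limits.

    workgroup_size:  sizes per dimension, e.g. [x, y, z]
    max_invocations: upper bound on the product of all dimensions
    max_size:        upper bound per dimension, same length as workgroup_size

    Returns a list of error strings; an empty list means the check passed.
    """
    errors = []
    if len(workgroup_size) != len(max_size):
        errors.append("workgroup size and per-dimension limits differ in rank")
        return errors

    # Per-dimension check: every size must be positive and within its limit.
    for dim, (size, limit) in enumerate(zip(workgroup_size, max_size)):
        if size < 1:
            errors.append(f"workgroup size in dimension {dim} must be positive")
        elif size > limit:
            errors.append(
                f"workgroup size {size} in dimension {dim} exceeds limit {limit}"
            )

    # Total check: the product of all dimensions must fit the invocation limit.
    total = prod(workgroup_size)
    if total > max_invocations:
        errors.append(f"total invocations {total} exceed limit {max_invocations}")
    return errors
```

With illustrative limits of 1024 invocations and `[1024, 1024, 64]` per dimension, `verify_workgroup_size([32, 4, 1], 1024, [1024, 1024, 64])` returns an empty list, while `[64, 32, 1]` is rejected because its product is 2048.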

[spirv] Add lowering configuration verification logic

@antiagainst Sure, I just started working on it. I have another question: how can I get the shared memory usage for matmul and conv?

[spirv] Add lowering configuration verification logic

> > @antiagainst Sure, just start working on it. I have another question. How to get shared memory usage for matmul and conv?
>
> You can follow the logic...

[spirv] Add lowering configuration verification logic

@antiagainst Okay, got it. Thanks!

[spirv] Reduce the number of tile size levels in lowering configuration

The current lowering configs for SPIR-V have a very similar structure to the CPU configs. Can we keep this format? Or, alternatively, is it possible to unify the...

[spirv] Reduce the number of tile size levels in lowering configuration

@antiagainst It's fine to keep three levels (or to keep a structure similar to the CPU configs if anything changes in the future). We do search regardless of the number of tiling...

[ROCM] Shared memory exceeded for bwd non-unit stride convs

I looked at the IR dump, and the error occurs because the first `fill` and `insert_slice` ops weren't fused into the loop, so it loaded the entire data to the...

[ROCM] Shared memory exceeded for bwd non-unit stride convs

@zjgarvey Mahesh and I just went through the above IR, and it looks like the `tensor.collapse_shape` is hard to handle in codegen. Is there a way to get rid of...

[ROCM] Shared memory exceeded for bwd non-unit stride convs

> No, not really. If we go this padding route, there's not really a good way to modify the indexing maps without adding `floordiv` and `%` - especially in the...

[ROCM] Shared memory exceeded for bwd non-unit stride convs

> #map = affine_map (d0, d1 + d5, d2 + d6, d4)>
> #map1 = affine_map (d4, d5, d6, d3)>
> #map2 = affine_map (d0, d1, d2, d3)>
> #map3...
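As a reading aid, the three visible maps can be interpreted as the indexing of a plain convolution. This is a minimal sketch that assumes a seven-dimension iteration space `d0` to `d6` in which `d4`, `d5` and `d6` are reduction dimensions, with `#map` indexing the input, `#map1` the filter and `#map2` the output. The fourth map is truncated in the quote and is not modeled, and the function name `conv_from_maps` is hypothetical.

```python
from itertools import product


def conv_from_maps(inp, filt):
    """Loop nest implied by the three visible indexing maps.

    inp  is indexed as (d0, d1 + d5, d2 + d6, d4)
    filt is indexed as (d4, d5, d6, d3)
    out  is indexed as (d0, d1, d2, d3)
    Inputs are nested Python lists.
    """
    n = len(inp)                                  # extent of d0
    ic, kh, kw = len(filt), len(filt[0]), len(filt[0][0])  # d4, d5, d6
    oc = len(filt[0][0][0])                       # extent of d3
    oh = len(inp[0]) - kh + 1                     # extent of d1
    ow = len(inp[0][0]) - kw + 1                  # extent of d2

    out = [[[[0] * oc for _ in range(ow)] for _ in range(oh)] for _ in range(n)]

    # Parallel dimensions d0..d3 select one output element.
    for d0, d1, d2, d3 in product(range(n), range(oh), range(ow), range(oc)):
        acc = 0
        # Reduction dimensions d4..d6 walk input channels and the filter window.
        for d4, d5, d6 in product(range(ic), range(kh), range(kw)):
            acc += inp[d0][d1 + d5][d2 + d6][d4] * filt[d4][d5][d6][d3]
        out[d0][d1][d2][d3] = acc
    return out
```

The additions `d1 + d5` and `d2 + d6` are what make the input access a sliding window; a strided variant would scale `d1` and `d2`, which this sketch does not attempt.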