Lakhinder Walia

Results 28 comments of Lakhinder Walia

> > Is there any verify test which compares ref value with the equivalent gpu based output? Thanks.
> >
> > Not needed here as this is an onnx function which...

> @lakhinderwalia Added some of the composite tests in for k dimension and 2d cases: (Batch, class_size, ...) & (Batch, class_size).
>
> May need to limit labels to a literal in...

Testing test_layernorm_large with this change shows very similar numbers (on banff-cyxtera-s81-2). With this change:

```
# bin/test_verify test_layernorm_large
[ RUN ] test_layernorm_large
[ COMPLETE ] test_layernorm_large (12240.9ms)
[==========] 1 tests...
```

> Is this with Shapes? std::vector dims = {1, 32, 8388608}; ?

This is with std::vector dims = {1, 32, 262144};

> > Test of test_layernorm_large with this change...

LayerNorm operator perf report comparison, testing a large FP16 tensor: `half_type, {1, 32, 8388608}`

New code: `gpu::code_object::layernorm_mul_add_kernel: 3.99752ms / 1 = 3.99752ms`
Old code: `gpu::code_object::layernorm_mul_add_kernel: 3.87002ms / 1...`
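For scale, that FP16 tensor holds 1 × 32 × 8388608 = 268,435,456 elements, i.e. 512 MiB of half-precision data. Below is a minimal sketch describing both tensor sizes with the in-tree `migraphx::shape` class, purely as an illustration; the perf numbers above come from the MIGraphX perf report, not from this snippet.

```cpp
#include <iostream>
#include <migraphx/shape.hpp>

int main()
{
    // Large FP16 tensor from the perf comparison above.
    migraphx::shape s_large{migraphx::shape::half_type, {1, 32, 8388608}};
    // Smaller shape mentioned in the earlier test_layernorm_large runs.
    migraphx::shape s_small{migraphx::shape::half_type, {1, 32, 262144}};

    std::cout << s_large.elements() << " elements, " << s_large.bytes() << " bytes\n";
    // prints: 268435456 elements, 536870912 bytes (512 MiB)
    std::cout << s_small.elements() << " elements, " << s_small.bytes() << " bytes\n";
    // prints: 8388608 elements, 16777216 bytes (16 MiB)
}
```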

[https://github.com/ROCm/AMDMIGraphX/issues/3143](https://github.com/ROCm/AMDMIGraphX/issues/3143)

Based on an email thread from @hgaspar, this issue could include the collection of data w.r.t. LDS, occupancy, register spillage, WF/WG sizes, etc. Thanks.

While one of the PRs above might fix the default WG size of the Layernorm kernel, it is important to also enhance our general approach to allow a non-hard-coded size for...
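As an illustration of what a non-hard-coded size could look like, here is a small hedged sketch (not MIGraphX code; `pick_block_size`, `reduce_elements`, and `max_wg_size` are hypothetical names) that derives a power-of-two workgroup size from the per-workgroup reduction length instead of using a fixed constant:

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper: grow the workgroup size in powers of two until it
// covers the reduction length, clamped to the device's workgroup limit.
std::size_t pick_block_size(std::size_t reduce_elements, std::size_t max_wg_size = 1024)
{
    std::size_t block = 64; // wavefront-sized lower bound (assumption)
    while(block < reduce_elements && block < max_wg_size)
        block *= 2;
    return std::min(block, max_wg_size);
}
```

With this approach a layernorm reducing over millions of elements would land at the device limit (1024 here), while a small reduction of, say, 192 elements would get a 256-wide workgroup, rather than both receiving the same hard-coded value.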

> Are there tests for these methods? It seems odd that you can change the range limits and not need to change some test results.

We stay in the same...

> I'm OK with either 1 or 2. Option 3 would take refactoring a lot of tests so unlikely we'll be doing that.

I think Option 2 is good. Thanks.