Lakhinder Walia

Results 28 comments of Lakhinder Walia

> > Is there any verify test which compares ref value with the equivalent gpu based output? Thanks.
> >
> > Not needed here as this is an onnx function which...

> @lakhinderwalia Added some of the composite tests in for k dimension and 2d cases: (Batch, class_size, ...) & (Batch, class_size).
>
> May need to limit labels to a literal in...

Testing test_layernorm_large with this change shows very similar numbers (on banff-cyxtera-s81-2). With this change:

```
# bin/test_verify test_layernorm_large
[ RUN ] test_layernorm_large
[ COMPLETE ] test_layernorm_large (12240.9ms)
[==========] 1 tests...
```

> Is this with Shapes? std::vector dims = {1, 32, 8388608}; ?

This is with std::vector dims = {1, 32, 262144};

> > Test of test_layernorm_large with this change...

LayerNorm operator perf report comparison, testing a large FP16 tensor: `half_type, {1, 32, 8388608}`

New code: `gpu::code_object::layernorm_mul_add_kernel: 3.99752ms / 1 = 3.99752ms`
Old code: `gpu::code_object::layernorm_mul_add_kernel: 3.87002ms / 1...`
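For scale, that FP16 tensor holds 1 × 32 × 8388608 = 268,435,456 elements, i.e. 512 MiB of half-precision data. Below is a minimal sketch describing both tensor sizes with the in-tree `migraphx::shape` class, purely as an illustration; the perf numbers above come from the MIGraphX perf report, not from this snippet.

```cpp
#include <iostream>
#include <migraphx/shape.hpp>

int main()
{
    // Large FP16 tensor from the perf comparison above.
    migraphx::shape s_large{migraphx::shape::half_type, {1, 32, 8388608}};
    // Smaller shape mentioned in the earlier test_layernorm_large runs.
    migraphx::shape s_small{migraphx::shape::half_type, {1, 32, 262144}};

    std::cout << s_large.elements() << " elements, " << s_large.bytes() << " bytes\n";
    // prints: 268435456 elements, 536870912 bytes (512 MiB)
    std::cout << s_small.elements() << " elements, " << s_small.bytes() << " bytes\n";
    // prints: 8388608 elements, 16777216 bytes (16 MiB)
}
```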

[https://github.com/ROCm/AMDMIGraphX/issues/3143](https://github.com/ROCm/AMDMIGraphX/issues/3143)

Based on an email thread from @hgaspar, this issue could include the collection of data w.r.t. LDS, occupancy, register spillage, WF/WG sizes, etc. Thanks.

While one of the PRs above might fix the default WG size of the Layernorm kernel, it is important to also enhance our general approach to allow a non-hard-coded size for...
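As an illustration of what a non-hard-coded size could look like, here is a small hedged sketch (not MIGraphX code; `pick_block_size`, `reduce_elements`, and `max_wg_size` are hypothetical names) that derives a power-of-two workgroup size from the per-workgroup reduction length instead of using a fixed constant:

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper: grow the workgroup size in powers of two until it
// covers the reduction length, clamped to the device's workgroup limit.
std::size_t pick_block_size(std::size_t reduce_elements, std::size_t max_wg_size = 1024)
{
    std::size_t block = 64; // wavefront-sized lower bound (assumption)
    while(block < reduce_elements && block < max_wg_size)
        block *= 2;
    return std::min(block, max_wg_size);
}
```

With this approach a layernorm reducing over millions of elements would land at the device limit (1024 here), while a small reduction of, say, 192 elements would get a 256-wide workgroup, rather than both receiving the same hard-coded value.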

> Are there tests for these methods? It seems odd that you can change the range limits and not need to change some test results.

We stay in the same...

> I'm OK with either 1 or 2. Option 3 would take refactoring a lot of tests so unlikely we'll be doing that.

I think Option 2 is good. Thanks.