Implement Where backward
- Add the Where operation with a contiguous backward kernel.
- Add a driver and gtest for the kernel.
- MIOpen performs better when:
  - the input, other, and condition tensors have the same shape
  - all tensors are contiguous
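For reference, the backward of `out = where(cond, input, other)` routes each output gradient element to exactly one of the two input gradients based on the condition. The sketch below is a minimal host-side illustration of that contiguous case (same shape, flat indexing); the function name and signature are hypothetical and not the actual MIOpen kernel API.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side sketch of the contiguous Where backward:
// for out = where(cond, input, other),
//   inputGrad[i] = cond[i] ? outputGrad[i] : 0
//   otherGrad[i] = cond[i] ? 0 : outputGrad[i]
void where_backward_contiguous(const std::vector<uint8_t>& cond,
                               const std::vector<float>& outputGrad,
                               std::vector<float>& inputGrad,
                               std::vector<float>& otherGrad)
{
    // All tensors share the same shape and are contiguous, so a single
    // flat index covers every element (what the GPU kernel parallelizes).
    for(std::size_t i = 0; i < cond.size(); ++i)
    {
        inputGrad[i] = cond[i] ? outputGrad[i] : 0.0f;
        otherGrad[i] = cond[i] ? 0.0f : outputGrad[i];
    }
}
```

This is why the same-shape, contiguous case is the fast path: no stride arithmetic or broadcasting reduction is needed, only an element-wise select.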
Average improvement over ROCm (speedup factor)
| dtype | bwd |
|---|---|
| float16 | 1.79 |
| float32 | 1.74 |
| bfloat16 | 1.80 |
Detailed benchmarks
float16
| op_name | dtype | size | contiguous | direction | rocm_kernel_avg | rocm_op_avg | MIOpen | Improvement over ROCm |
|---|---|---|---|---|---|---|---|---|
| Where | float16 | [4 8 8] | contiguous | bwd | 11693 | 55810 | 3732 | 3.13 |
| Where | float16 | [16 64 256] | contiguous | bwd | 14428 | 51283 | 6754 | 2.14 |
| Where | float16 | [32 256 1024] | contiguous | bwd | 107766 | 129265 | 74424 | 1.45 |
| Where | float16 | [380 114 60] | contiguous | bwd | 43893 | 61296 | 26236 | 1.67 |
| Where | float16 | [378 482 201] | contiguous | bwd | 429045 | 455280 | 314658 | 1.36 |
| Where | float16 | [24 131 197] | contiguous | bwd | 18092 | 49236 | 9367 | 1.93 |
| Where | float16 | [123 329 190] | contiguous | bwd | 101927 | 126642 | 69199 | 1.47 |
| Where | float16 | [393 183 475] | contiguous | bwd | 403248 | 430666 | 291288 | 1.38 |
| Where | float16 | [46 62 101] | contiguous | bwd | 14350 | 55155 | 7252 | 1.98 |
| Where | float16 | [427 499 454] | contiguous | bwd | 1111200 | 1138201 | 823626 | 1.35 |
float32
| op_name | dtype | size | contiguous | direction | rocm_kernel_avg | rocm_op_avg | MIOpen | Improvement over ROCm |
|---|---|---|---|---|---|---|---|---|
| Where | float32 | [4 8 8] | contiguous | bwd | 11756 | 57617 | 3892 | 3.02 |
| Where | float32 | [16 64 256] | contiguous | bwd | 15035 | 43637 | 7501 | 2.00 |
| Where | float32 | [32 256 1024] | contiguous | bwd | 136830 | 161000 | 94421 | 1.45 |
| Where | float32 | [380 114 60] | contiguous | bwd | 53634 | 69534 | 33417 | 1.60 |
| Where | float32 | [378 482 201] | contiguous | bwd | 541003 | 566405 | 415464 | 1.30 |
| Where | float32 | [24 131 197] | contiguous | bwd | 21931 | 56819 | 11092 | 1.98 |
| Where | float32 | [123 329 190] | contiguous | bwd | 125266 | 149916 | 87490 | 1.43 |
| Where | float32 | [393 183 475] | contiguous | bwd | 504520 | 532226 | 387420 | 1.30 |
| Where | float32 | [46 62 101] | contiguous | bwd | 14988 | 43733 | 7447 | 2.01 |
| Where | float32 | [427 499 454] | contiguous | bwd | 1406867 | 1431501 | 1107410 | 1.27 |
bfloat16
| op_name | dtype | size | contiguous | direction | rocm_kernel_avg | rocm_op_avg | MIOpen | Improvement over ROCm |
|---|---|---|---|---|---|---|---|---|
| Where | bfloat16 | [4 8 8] | contiguous | bwd | 11501 | 53699 | 3697 | 3.11 |
| Where | bfloat16 | [16 64 256] | contiguous | bwd | 14238 | 51059 | 6950 | 2.05 |
| Where | bfloat16 | [32 256 1024] | contiguous | bwd | 108724 | 123713 | 73749 | 1.47 |
| Where | bfloat16 | [380 114 60] | contiguous | bwd | 44805 | 72910 | 26005 | 1.72 |
| Where | bfloat16 | [378 482 201] | contiguous | bwd | 430119 | 458033 | 308848 | 1.39 |
| Where | bfloat16 | [24 131 197] | contiguous | bwd | 18667 | 49172 | 9563 | 1.95 |
| Where | bfloat16 | [123 329 190] | contiguous | bwd | 101143 | 122994 | 68329 | 1.48 |
| Where | bfloat16 | [393 183 475] | contiguous | bwd | 404289 | 433658 | 287344 | 1.41 |
| Where | bfloat16 | [46 62 101] | contiguous | bwd | 14797 | 45157 | 7163 | 2.07 |
| Where | bfloat16 | [427 499 454] | contiguous | bwd | 1112234 | 1139333 | 813237 | 1.37 |
@cognaiger9 : Do we need to add any material to the ROCm docs to cover this?
This operation belongs to the joining operations category according to the PyTorch documentation. MIOpen doesn't currently have this category, so I added new material for it. Should I change it to a more general category, or should I use an existing category from the ROCm documentation?
@cognaiger9: Oh, in terms of where you added it in reference/index.rst, I think it's fine. I was only wondering whether we needed to add extra material to any of the conceptual or how-to documents. (Maybe I'm misunderstanding what you're referring to as the new material?)
@amd-jnovotny I think the current docs are sufficient and do not require extra material.
MIOpen is moving to the new monorepo setup, and all older unmerged PRs are being closed. Please re-open this as part of the new repo if these changes are still needed.