chenyu comments


Bug in View._reshape_mask

it probably needs to explicitly check the end condition for `next_mask`

remove CAST_BEFORE_VIEW

@chaosagent do you still want this? I don't see a perf impact on the benchmark, so I'd prefer to remove this unless there's a big perf gain

remove CAST_BEFORE_VIEW

okay let me run BEAM=2 resnet on master too

remove CAST_BEFORE_VIEW

`HSA=1 DEFAULT_FLOAT=HALF WARMUP_EPOCHS=2 BS=768 GPUS=6 BENCHMARK=10 MODEL=resnet python3 examples/mlperf/model_train.py` uses ~87.4GB on master and this branch so no memory diff.

remove CAST_BEFORE_VIEW

both this pr and master have 430ms step time with default changed to HALF. i think it's safe to delete this.

search: add better default settings for faster search

maybe a version of `BEAM_MIN_PROGRESS` that relies on relative time can mitigate the slowdown issue.
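A minimal sketch of what a relative-time progress check could look like, assuming the search compares the best kernel time from the previous beam iteration with the best time from the current one. The function name `made_progress` and the `min_rel_progress` parameter are hypothetical and are not part of tinygrad.

```python
def made_progress(prev_best_s: float, new_best_s: float,
                  min_rel_progress: float = 0.01) -> bool:
    """Return True if the new best time improves on the previous best
    by at least `min_rel_progress`, expressed as a fraction (0.01 = 1%).

    Hypothetical helper: times are in seconds, and the threshold scales
    with kernel runtime instead of being a fixed absolute amount.
    """
    if prev_best_s <= 0:
        # No meaningful baseline to compare against, so report no progress.
        return False
    relative_gain = (prev_best_s - new_best_s) / prev_best_s
    return relative_gain >= min_rel_progress


# A 100us kernel improving to 98us is a 2% gain, so the search would continue.
print(made_progress(100e-6, 98e-6))
# A 1s kernel improving by 5ms is only a 0.5% gain, so the search would stop.
print(made_progress(1.0, 0.995))
```

With a relative threshold, long-running kernels stop searching once the gains become proportionally small, which is one way the extra search time might be bounded.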

search: add better default settings for faster search

can we first get a version with only the `BEAM_MAX_TASKS_PER_CHILD` change and the uops MAX change? i think these 2 are the least controversial

search: add better default settings for faster search

also the benchmark beam runs took 50%-100% longer

search: add better default settings for faster search

will measure resnet compile time again after this change

Multiple Reduce Kernels

fyi you can add `DEBUG=4` to print the kernel source code, and `DEBUG=5` to print UOps
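As a rough illustration, `DEBUG` is an environment variable, so it is normally set on the command line in front of the script. If a run is launched from Python instead, one possible approach is to build the environment explicitly. The helper `with_debug` below is hypothetical and only prepares the environment dict.

```python
import os


def with_debug(level: int, base_env=None) -> dict:
    """Return a copy of the environment with DEBUG set to `level`.

    Hypothetical helper: it copies the given mapping (or os.environ)
    so the caller's environment is left untouched.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["DEBUG"] = str(level)
    return env


# Prepare an environment for a run that prints kernel source code.
env = with_debug(4, {"PATH": "/usr/bin"})
print(env["DEBUG"])
```

The resulting dict can be passed as `env=` to `subprocess.run` together with the command that runs your script.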