leasunhy issues

Results: 5 issues of leasunhy

What's kernel launch time?

The performance summary shows that my model spends ~50% of its time in the "kernel launch" step. I find the other items easy to understand, but I have no idea what "kernel launch"...

Error generated when compiling a long python script should be improved

For benchmarking purposes (refs #363), I created a Python script that has thousands of lines. The script looks like this (and can be run with CPython): ```python class A: def...

* Use `torch.optim.AdamW` as the fallback Adam implementation.
* Support selecting the fused versions of the optimizers (via `--use-fused-optimizer`).

Speed: custom_fused (only available for Adam) > fused > foreach
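The flag-based selection described above could look roughly like the sketch below. Only `--use-fused-optimizer`, `torch.optim.AdamW`, and the `fused`/`foreach` distinction come from the PR description; the helper names `parse_args` and `optimizer_kwargs` are hypothetical, and the custom_fused path is not modeled because the description does not say how it is selected.

```python
import argparse


def parse_args(argv=None):
    """Parse the one flag named in the PR description."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--use-fused-optimizer", action="store_true")
    return parser.parse_args(argv)


def optimizer_kwargs(use_fused: bool) -> dict:
    """Hypothetical helper: pick the keyword that selects the implementation.

    torch.optim.AdamW accepts both `fused` and `foreach`; this sketch
    requests exactly one of them at a time.
    """
    return {"fused": True} if use_fused else {"foreach": True}


args = parse_args(["--use-fused-optimizer"])
kwargs = optimizer_kwargs(args.use_fused_optimizer)

# With PyTorch installed, the kwargs would be forwarded like this
# (fused=True may require CUDA parameters, depending on the PyTorch version):
#   optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, **kwargs)
```

Keeping the choice in a plain dict means the same selection logic can be reused for any optimizer class that accepts these keywords.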

replace the EMA implementation with pytorch-native AveragedModel

This PR replaces the custom EMA implementation with the one in PyTorch. Note that this PR breaks backward compatibility: it cannot load old-format checkpoints that were generated with EMA enabled.
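For readers unfamiliar with EMA, here is a minimal pure-Python sketch of the update rule an exponential moving average of weights applies. It is not the code from the PR: the function name `ema_update` and the decay value are illustrative assumptions. In PyTorch itself, `torch.optim.swa_utils.AveragedModel` can be configured for this kind of averaging (for example via `get_ema_multi_avg_fn`), though the PR description does not say which configuration it uses.

```python
def ema_update(ema_params, model_params, decay=0.999):
    """One EMA step, elementwise: ema <- decay * ema + (1 - decay) * param."""
    return [decay * e + (1.0 - decay) * p for e, p in zip(ema_params, model_params)]


# Toy run: two "training steps" in which the model weights stay at 1.0.
ema = [0.0, 1.0]
for step_params in ([1.0, 1.0], [1.0, 1.0]):
    ema = ema_update(ema, step_params, decay=0.5)

print(ema)  # [0.75, 1.0]
```

The averaged copy moves toward the current weights by a factor of `1 - decay` per step, so a weight that already matches stays unchanged.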

support EMA for fp32/fp64

Pytorch native optim
