AITemplate
Add log1p elementwise op
Summary:
log1p(x) is more precise than log(1+x) when x is close to 0. We use the CUDA log1pf implementation for fp32. For other precision types, the input is first converted to float, log1pf is computed, and the output is converted back to the original precision.
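As a rough illustration of the two points above (not the code in this diff), the sketch below uses only the Python standard library. `naive_log1p`, `to_half`, and `log1p_half` are hypothetical helper names, and Python's double-precision `math.log1p` stands in for CUDA's `log1pf`, so it only approximates what the kernel would do.

```python
import math
import struct


def naive_log1p(x: float) -> float:
    # 1.0 + x discards the low-order bits of x when |x| is tiny,
    # so the logarithm is taken of an already-rounded value.
    return math.log(1.0 + x)


def to_half(x: float) -> float:
    # Round a Python float to the nearest IEEE fp16 value using the
    # stdlib "e" (half-precision) struct format.
    return struct.unpack("e", struct.pack("e", x))[0]


def log1p_half(x_half: float) -> float:
    # Assumed emulation of the non-fp32 path: widen the input, compute
    # log1p in the wider type, then narrow the result back to fp16.
    return to_half(math.log1p(x_half))


if __name__ == "__main__":
    for x in (1e-10, 1e-17):
        print(f"x={x:g}  log(1+x)={naive_log1p(x):.17g}  log1p(x)={math.log1p(x):.17g}")
    h = to_half(1e-3)
    print(f"fp16 input={h!r}  log1p via widen/narrow={log1p_half(h)!r}")
```

For x = 1e-17 the sum 1.0 + x rounds to exactly 1.0 in double precision, so the naive form returns 0, while log1p keeps a result close to x.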
CUDA single-precision math API reference (log1pf): https://docs.nvidia.com/cuda/cuda-math-api/group__CUDA__MATH__SINGLE.html
Differential Revision: D54176180
This pull request was exported from Phabricator. Differential Revision: D54176180