hpkfft.com
By default, the CUDA compiler sets `-prec-div=true`, `-prec-sqrt=true`, and `-ftz=false` (see https://docs.nvidia.com/cuda/floating-point/index.html#compiler-flags). The CuPy library is compiled with `-ftz=true`, overriding the default for this particular flag. Thanks, @leofang, for this info.
Note that an IEEE square root can never produce a subnormal result. So, a hardware mode that flushes subnormal results to zero is irrelevant for that operation.
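To illustrate the point about results, here is a minimal sketch in plain Python, using double precision on the CPU rather than CUDA. The helper `is_subnormal` is a hypothetical name introduced only for this example. It takes the smallest positive subnormal double and checks that its square root lands well inside the normal range.

```python
import math
import sys

# Smallest positive subnormal double (2**-1074) and smallest positive normal double (2**-1022).
smallest_subnormal = math.ldexp(1.0, -1074)
smallest_normal = sys.float_info.min


def is_subnormal(x: float) -> bool:
    """Hypothetical helper: True if x is nonzero and below the smallest normal magnitude."""
    return x != 0.0 and abs(x) < smallest_normal


# The square root of the smallest positive input is 2**-537, far above the subnormal range.
root = math.sqrt(smallest_subnormal)

print("input is subnormal: ", is_subnormal(smallest_subnormal))
print("result is subnormal:", is_subnormal(root))
print("result equals 2**-537:", root == math.ldexp(1.0, -537))
```

Since square root is monotonic, every larger positive input also gives a normal result. The sketch says nothing about modes that flush subnormal inputs, which is a separate question.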
By the way, I would support a recommendation that array libraries offer directed rounding as an option. One way to try to assess the effects of roundoff upon a floating-point...
Given the significant improvements to the free-threaded mode in Python 3.14, I would suggest prioritizing #1052 over providing a container for free-threaded 3.13.
A quick thought (that might appear simpler to developers) would be `rvp::lookup | rvp::move` instead of `return_existing_or_move`.
Is it useful to have both `copy` and `move`? I haven't had the occasion to use this aspect of nanobind much, so I may be missing something obvious. However, I'm...
Just some quick thoughts: 1. Are you using `cmake -DCMAKE_BUILD_TYPE=Release`? 2. If you are (and you should), I think the default is `-O3`, so you probably don't need `add_compile_options(-O3)` 3....
I'm OK with any decision that is made; I just want my own documentation to be correct and professional. If Python wants to declare that one should use `freethreading` as...
Note that I have installed `libomp-devel`. The only version I see available is `15.0.7-5.amzn2023.0.1`. If it's helpful for testing, here's a sample program, `test.cpp` ``` #include #include int main() {...
Yes, thank you, or use the compiler flag `-I/usr/lib64/clang/15.0.7/include`. I just thought the Amazon Linux packaging team would like to know about this issue.