jinge90
Hi @tfzhu, could you help check whether the pre-CI failure in Jenkins/Precommit is an infrastructure issue? Thanks very much.
> Hello @jinge90 > > ```cuda-c++ > #include > #include > #include > #include > > extern "C" > __device__ > double __nv_rcp64h(double a); > > __device__ > void calculate(const...
> @jinge90 I believe it flushes denormals to zero in source and destination and utilizes a slightly different table. > > 1-bit differences are unavoidable: > > ``` > >...
> @jinge90 rcp64h provides initial value for division algorithm to work. Sometimes such algorithms are implemented as table-lookup. > > ``` > 0x1.08d3e00000000p-1 > > > real math infinite prec....
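To illustrate the point quoted above, that a coarse reciprocal only seeds the division algorithm, here is a minimal sketch of Newton-Raphson refinement. `coarse_rcp_seed` is a hypothetical stand-in that keeps only the upper 32 bits of the exact reciprocal; it is an assumption for illustration and not the actual `__nv_rcp64h` table or its FTZ/DAZ behavior. `refine_rcp` is likewise an illustrative name.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical seed: the exact reciprocal with its low 32 bits cleared,
// which leaves roughly 20 correct mantissa bits. This only mimics a
// coarse estimate; it is NOT the real __nv_rcp64h implementation.
inline double coarse_rcp_seed(double a) {
    double r = 1.0 / a;
    std::uint64_t bits;
    std::memcpy(&bits, &r, sizeof bits);
    bits &= 0xFFFFFFFF00000000ULL;
    std::memcpy(&r, &bits, sizeof r);
    return r;
}

// Newton-Raphson for 1/a: x <- x + x * (1 - a * x).
// Each step roughly doubles the number of correct bits in x.
inline double refine_rcp(double a, double seed, int steps = 2) {
    double x = seed;
    for (int i = 0; i < steps; ++i) {
        double e = std::fma(-a, x, 1.0);  // residual 1 - a*x, single rounding
        x = std::fma(x, e, x);            // corrected estimate
    }
    return x;
}
```

Because the refinement is quadratic, small differences in the seed (for example a slightly different table) mostly vanish after the iterations, although the last bit of the final result can still differ.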
> I suggest to even out with FTZ/DAZ behavior. I wrote this quickly to test new behavior. > > ```c++ > void emulate(const double* a, double* y) { > uint64_t...
Hi, @intel/dpcpp-tools-reviewers, @intel/llvm-reviewers-runtime and @aelovikov-intel, could you help review this patch? Thanks very much.
Hi, @intel/dpcpp-tools-reviewers Could you help review this patch? Thanks very much.
Hi, @intel/dpcpp-tools-reviewers Kind ping~. Thanks very much.
Hi, @intel/dpcpp-tools-reviewers Kind ping~. Thanks very much.
Hi, @steffenlarsen and @JackAKirk. Yes, for "non-cuda" targets we implement these bf16 functions with generic fp32 math functions, so they can run on any device. Thanks very much.
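A rough sketch of the fp32 fallback idea: widen the bf16 inputs to `float`, call the ordinary fp32 math function, and round the result back to bf16. The names `bf16_bits`, `bf16_to_float`, `float_to_bf16`, `bf16_fmax` and `bf16_fma` are hypothetical and used only for illustration; they are not the functions in this patch, and the rounding choice (round to nearest even) is an assumption.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical raw storage for a bf16 value (upper 16 bits of an fp32).
using bf16_bits = std::uint16_t;

// Widening is exact: place the 16 bits in the top half of a float.
inline float bf16_to_float(bf16_bits h) {
    std::uint32_t u = static_cast<std::uint32_t>(h) << 16;
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}

// Narrowing with round-to-nearest-even (assumed rounding mode).
inline bf16_bits float_to_bf16(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    if (std::isnan(f)) {
        // Keep the sign, force a quiet NaN so the payload cannot round to inf.
        return static_cast<bf16_bits>((u >> 16) | 0x0040u);
    }
    u += 0x7FFFu + ((u >> 16) & 1u);
    return static_cast<bf16_bits>(u >> 16);
}

// Generic fallbacks: compute in fp32, then round once to bf16.
inline bf16_bits bf16_fmax(bf16_bits a, bf16_bits b) {
    return float_to_bf16(std::fmax(bf16_to_float(a), bf16_to_float(b)));
}

inline bf16_bits bf16_fma(bf16_bits a, bf16_bits b, bf16_bits c) {
    return float_to_bf16(
        std::fma(bf16_to_float(a), bf16_to_float(b), bf16_to_float(c)));
}
```

Since this path uses only standard fp32 operations, it does not depend on any device-specific bf16 instruction.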