Anton Mitkov
Hello, I am interested in the performance provided by `xilinx::pipeline` and `xilinx::dataflow` in the pipeline test `single_task_vector_add_drt_dataflow_func_local_pipeline`. I have modified it to produce timing data of the...
Modified the tests and benchmarks to use the ComputeCpp runtime FPGA fix in the queue constructor. When compiling with the SYCL_BLAS_FPGA flag, the queue is constructed...
Softmax benchdnn was missing a precision threshold for bf16. This PR adds one for the NVIDIA backend, where its absence caused some tests to fail.
### Name and Version

ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA A100-PCIE-40GB, compute capability 8.0, VMM: yes
register_backend: registered backend CUDA (1 devices)...