cublas topic
computeWorks_examples
Matrix multiplication example performed with OpenMP, OpenACC, BLAS, cuBLAS, and CUDA
nvml_examples
Examples showing how to utilize the NVML library for GPU monitoring
scikit-cuda
Python interface to GPU-powered libraries
deeppipe2
Deep learning library using the GPU (CUDA/cuBLAS)
cuda-swift
Parallel Computing Library for Linux and macOS & NVIDIA CUDA Wrapper
cublasgemm-benchmark
Code for benchmarking GPU performance based on cublasSgemm and cublasHgemm
learn-gpgpu
Algorithms implemented in CUDA + resources about GPGPU
cublasHgemm-P100
Code for testing native float16 matrix multiplication performance on Tesla P100 and V100 GPUs, based on cublasHgemm
cudarc
Safe Rust wrapper around the CUDA toolkit