D. Ko
Could you point me to the places where I have to modify the size limit?
> Also, Julia uses a JIT compiler, so it might be good to run the code once before starting the actual benchmark so that it can be compiled. @jtrakk You...
Using `__cuda_array_interface__`, torch, cupy and jax work with numba's cuda.jit natively; you just pass a torch.cuda or jax.cuda tensor to them. TF didn't have that, but it does have https://www.tensorflow.org/api_docs/python/tf/experimental/dlpack/from_dlpack so...
Dunno what OP wants from cuda.jit, but it's a pretty neat function for getting working Python CUDA code with GPU arrays; you can convert almost any function written to work on...
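A minimal sketch of the interop described above (assumes torch and numba with a working CUDA toolkit; the kernel and launch configuration are purely illustrative):

```python3
# torch CUDA tensors expose __cuda_array_interface__, so numba's cuda.jit
# kernels accept them directly, with no copy.
import torch
from numba import cuda

@cuda.jit
def add_one(x):
    i = cuda.grid(1)
    if i < x.shape[0]:
        x[i] += 1.0

t = torch.zeros(1024, device="cuda")
add_one[4, 256](t)   # 4 blocks x 256 threads covers all 1024 elements
print(t[:4])         # tensor([1., 1., 1., 1.], device='cuda:0')
```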
Would love this feature :)
> Ubuntu 20.04 x86_64 cudnn images have been pushed! Having an issue with arm64 and ppc64le builds though. Will close this once those are released. So could...
Bump, would love to try it on Windows :)
This should work, I'm using it to compress torch tensors in place:

```python3
.output(
    name,
    pix_fmt="yuv420p",   # working on rtx 20xx series+
    vcodec=vcodec,       # ["hevc_nvenc", "h264_nvenc"]
    r=fps,
    **{
        "profile:v": 2,
        "b:v": ...
```
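For context, a fuller sketch of such an ffmpeg-python pipeline is below; the resolution, fps, bitrate and frame source are placeholder assumptions, not values from the comment above.

```python3
import ffmpeg
import numpy as np

height, width, fps = 720, 1280, 30
name, vcodec = "out.mp4", "hevc_nvenc"        # or "h264_nvenc"

process = (
    ffmpeg
    .input("pipe:", format="rawvideo", pix_fmt="rgb24",
           s=f"{width}x{height}", r=fps)
    .output(
        name,
        pix_fmt="yuv420p",                    # working on rtx 20xx series+
        vcodec=vcodec,
        r=fps,
        **{"profile:v": 2, "b:v": "4M"},      # bitrate is a placeholder
    )
    .overwrite_output()
    .run_async(pipe_stdin=True)
)

# e.g. frame = (tensor * 255).byte().cpu().numpy() for a torch tensor in [0, 1]
frame = np.zeros((height, width, 3), dtype=np.uint8)
process.stdin.write(frame.tobytes())
process.stdin.close()
process.wait()
```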
@ae99 Atomics for float work from metal_version >= 3, but before that you could use add_relaxed and reinterpret_cast float to atomic_uint, and it would work fine, also a lot faster...
Padding with a constant in almost every framework uses a scalar value, and it's the same here (I feel like passing an array is some sort of performance optimisation (?) or maybe extra usability...
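As an illustration of the scalar-constant convention (torch and numpy here are just example frameworks, not necessarily the project under discussion):

```python3
import numpy as np
import torch
import torch.nn.functional as F

x = torch.ones(1, 1, 4, 4)
# torch: a single scalar fills every padded element
padded = F.pad(x, (1, 1, 1, 1), mode="constant", value=0.0)

a = np.ones((4, 4))
# numpy also accepts a scalar, plus optional per-edge values per axis
padded_np = np.pad(a, 1, mode="constant", constant_values=0.0)
print(padded.shape, padded_np.shape)   # torch.Size([1, 1, 6, 6]) (6, 6)
```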