cloud11665

59 comments by cloud11665

Superb work! I've been thinking about integrating SDF-based fonts into my own application. My idea was to use https://github.com/Chlumsky/msdfgen and just prompt the user with a "generating fonts" modal...
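For context on what an SDF font atlas stores: msdfgen builds *multi-channel* distance fields from vector glyph outlines, which is considerably more involved. As a much simpler illustration (not how msdfgen works), here is a brute-force *single-channel* signed distance field computed from a binary glyph bitmap. The function name and the `spread` parameter are hypothetical.

```python
import math

def signed_distance_field(bitmap, spread=4.0):
    """Brute-force single-channel SDF of a binary glyph bitmap.

    bitmap: list of rows of 0/1 values (1 = inside the glyph).
    Returns floats in [0, 1]: values above 0.5 are inside, below 0.5 outside.
    Distances are clamped to `spread` pixels before normalising.
    """
    h, w = len(bitmap), len(bitmap[0])
    inside = [(x, y) for y in range(h) for x in range(w) if bitmap[y][x]]
    outside = [(x, y) for y in range(h) for x in range(w) if not bitmap[y][x]]
    field = []
    for y in range(h):
        row = []
        for x in range(w):
            # distance to the nearest pixel of the opposite class (O(n^2), fine for a sketch)
            others = outside if bitmap[y][x] else inside
            d = min((math.hypot(x - ox, y - oy) for ox, oy in others), default=spread)
            d = min(d, spread)
            signed = d if bitmap[y][x] else -d
            # map [-spread, spread] onto [0, 1] so the result fits in an 8-bit texture
            row.append(0.5 + signed / (2 * spread))
        field.append(row)
    return field
```

A real renderer samples this texture with bilinear filtering and thresholds around 0.5, which is what keeps edges sharp when glyphs are scaled.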

Yeah, I agree about the golfing.

Check out [docs/env_vars.md](https://github.com/geohot/tinygrad/blob/master/docs/env_vars.md). The `CPU` env var means that it just runs on the CPU instead of the GPU (the default).
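A minimal sketch of the convention described in docs/env_vars.md, assuming a flag-style variable where `CPU=1` switches away from the default GPU. `default_device` is a hypothetical helper for illustration only; tinygrad's actual device selection lives in its own code and may differ.

```python
import os

def default_device(env=None):
    """Hypothetical helper: CPU=1 forces the CPU backend, otherwise the GPU is used."""
    env = os.environ if env is None else env
    # any non-zero integer value counts as "enabled", mirroring the CPU=1 convention
    return "CPU" if int(env.get("CPU", "0")) else "GPU"
```

In practice the variable is set on the command line, e.g. `CPU=1 python3 your_script.py` (script name is a placeholder).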

![image](https://github.com/geohot/tinygrad/assets/59028866/61457877-2f40-48e1-a7ab-547dea65ba68)

I have compiled a fully static library for executing PTX on the CPU, so there's no need to mess with building gpuocelot.

There is https://github.com/actions/cache, which could be useful: building gpuocelot requires building LLVM, and we wouldn't want to do that on every commit.
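actions/cache restores a saved directory only when its `key` matches, so the key should change exactly when the build inputs change (workflows usually do this with the `hashFiles()` expression). A Python sketch of the same idea, assuming the inputs are files such as the gpuocelot patch; `cache_key` and the `"gpuocelot"` prefix are hypothetical.

```python
import hashlib
from pathlib import Path

def cache_key(prefix, paths):
    """Build a cache key that changes only when one of the input files changes."""
    h = hashlib.sha256()
    # sort so the key doesn't depend on the order the paths were listed in
    for p in sorted(str(p) for p in paths):
        h.update(p.encode())            # include the name, so renames invalidate too
        h.update(Path(p).read_bytes())  # and the contents
    return f"{prefix}-{h.hexdigest()[:16]}"
```

With a key like this, an unchanged patch hits the cache and skips the LLVM build entirely; editing the patch produces a new key and triggers one rebuild.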

kk, a simple Python wrapper is done:

```py
kernel = r"""
.version 7.5
.target sm_35
.address_size 64

	// .globl _Z4E_16Pf
.visible .entry _Z4E_16Pf(
	.param .u64 _Z4E_16Pf_param_0
)
{
	.reg .b32...
```
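The wrapper's code is cut off above, so this is not it. One thing any such wrapper needs is the name of the kernel entry point to launch; a hypothetical sketch that pulls `.entry` names out of PTX source with a regex, assuming entries are declared as `.entry NAME(` as in the snippet above:

```python
import re

# matches ".entry NAME(" declarations (".visible" may precede them)
ENTRY_RE = re.compile(r"\.entry\s+([A-Za-z_$][\w$]*)\s*\(")

def ptx_entry_points(ptx):
    """Return the kernel entry-point names declared in a PTX source string."""
    return ENTRY_RE.findall(ptx)
```

The `// .globl` line is only a comment emitted by nvcc, so it is deliberately not matched.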

I'm experimenting in Docker, and the longest step is downloading the CUDA toolkit. Also, rather than forking gpuocelot, I'll just maintain a patch in the tinygrad repo.

> Patch is fine. How big is CUDA? If it's huge (>200MB), maybe we figure out how to cache it.

All apt packages (CUDA included) are ~8000 MiB. It only takes...
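To put ~8000 MiB next to the ">200MB" threshold in the question (note the mixed binary and decimal units), a quick conversion, assuming 1 MiB = 1024² bytes and 1 MB = 1000² bytes:

```python
MIB = 1024 ** 2  # bytes in a mebibyte
MB = 1000 ** 2   # bytes in a megabyte

apt_bytes = 8000 * MIB       # ~8000 MiB of apt packages (CUDA included)
threshold_bytes = 200 * MB   # the ">200MB" threshold from the quoted question

print(f"{apt_bytes / MB:.0f} MB, {apt_bytes / threshold_bytes:.0f}x the threshold")
```

That works out to roughly 8389 MB, about 42 times the threshold.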

But we do all of that only to produce the libcudacpu.so file, so caching it would save 3-4 minutes.
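Caching just the final artifact amounts to "build only on a miss". A minimal sketch, assuming the cache step has already restored libcudacpu.so (if it was cached) before this runs; `get_or_build` and the `build` callback are hypothetical.

```python
from pathlib import Path

def get_or_build(artifact, build):
    """Return the path to `artifact`, running `build(path)` only if it isn't already present."""
    path = Path(artifact)
    if not path.exists():
        # cache miss: do the slow apt + CUDA + gpuocelot build (hypothetical callback)
        build(path)
        if not path.exists():
            raise RuntimeError(f"build did not produce {path}")
    return path
```

On a cache hit the function returns immediately, which is where the 3-4 minutes would be saved.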

Tested locally and it worked, but there are 2 more things I want to add:
- a `non_stable` mode for newer nvcc versions (compiling for sm_50 and hoping it works...
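The comment is truncated, so the exact rule isn't known. A hypothetical sketch of how such a mode could pick the PTX target, assuming the cutoff is CUDA 12.0 (the release that removed Kepler, i.e. sm_35, support) and that the version is read from `nvcc --version` output such as `Cuda compilation tools, release 12.1, V12.1.105`. Both function names are made up for illustration.

```python
import re

def nvcc_version(version_output):
    """Parse (major, minor) from `nvcc --version` output."""
    m = re.search(r"release (\d+)\.(\d+)", version_output)
    if m is None:
        raise ValueError("could not find a CUDA release number")
    return int(m.group(1)), int(m.group(2))

def pick_target(version, non_stable=False):
    """Hypothetical policy: sm_35 on older toolkits, sm_50 in non_stable mode on CUDA >= 12."""
    if version >= (12, 0):
        if not non_stable:
            # assumption: CUDA 12+ nvcc can no longer emit sm_35 code
            raise RuntimeError("nvcc >= 12 cannot target sm_35; enable non_stable mode")
        return "sm_50"  # newer target, "hoping it works" on the CPU backend
    return "sm_35"
```

Failing loudly unless `non_stable` is set keeps the untested path opt-in.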