tinygrad
init multidevice cuda graph
The graph is now multidevice, but the perf impact is not huge yet (tested on hlb with 2 GPUs). Transfers need to be enqueued in the graph as well to get the speedup.
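A rough sketch of why transfers matter here. This is a toy model, not tinygrad's code: `Op`, `count_launches`, and the op list are hypothetical, and it assumes that consecutive graphable ops are batched into one graph launch while every non-graphable op costs its own host-side launch and splits the batch.

```python
from typing import List, Tuple

# Hypothetical op model: (kind, where). kind is "kernel" or "transfer".
# This is an illustration only and does not use tinygrad's API.
Op = Tuple[str, str]

def count_launches(ops: List[Op], graph_transfers: bool) -> int:
    """Count host-side launches when consecutive graphable ops share one graph."""
    launches = 0
    in_batch = False
    for kind, _where in ops:
        # Assumption: kernels are always graphable, transfers only if enabled.
        graphable = kind == "kernel" or graph_transfers
        if graphable:
            if not in_batch:
                launches += 1  # start a new graph batch
                in_batch = True
        else:
            launches += 1      # op runs outside the graph
            in_batch = False   # and splits the current batch
    return launches

# Made-up 2-GPU step: compute on both devices, exchange, compute again.
ops: List[Op] = [
    ("kernel", "gpu0"), ("kernel", "gpu1"),
    ("transfer", "gpu0->gpu1"),
    ("kernel", "gpu1"),
    ("transfer", "gpu1->gpu0"),
    ("kernel", "gpu0"),
]

print(count_launches(ops, graph_transfers=False))
print(count_launches(ops, graph_transfers=True))
```

In this model, leaving transfers out of the graph splits the step into several small launches, while enqueuing them keeps the whole step in one graph. The op sequence is invented, so the counts say nothing about real speedups.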