Automatic batching
Currently stannum (and Taichi in general) cannot do automatic batching the way PyTorch does. For example, the kernel below can only handle three fixed-shape arrays; given a batch of arrays, we would have to loop over the batch dimension or rewrite the code for a fixed batch size. This issue is somewhat related to issue #5. The ultimate goal is to support automatic batching for tensors of any valid flexible shape.
@ti.kernel
def array_add(self):
    # element-wise add over fields of one fixed shape
    for i in self.array0:
        self.output_array[i] = self.array0[i] + self.array1[i]
For a first step, dynamic looping (i.e., calling the kernel once per batch element, as in the sketch below) is acceptable, and this is a good first issue.
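A minimal sketch of the dynamic-looping approach, using plain Taichi fields and Taichi's from_torch/to_torch helpers; batched_add is a hypothetical wrapper for illustration, not part of stannum:

import taichi as ti
import torch

ti.init(arch=ti.cpu)

n = 8
array0 = ti.field(ti.f32, shape=n)
array1 = ti.field(ti.f32, shape=n)
output_array = ti.field(ti.f32, shape=n)

@ti.kernel
def array_add():
    for i in array0:
        output_array[i] = array0[i] + array1[i]

def batched_add(batch0: torch.Tensor, batch1: torch.Tensor) -> torch.Tensor:
    # hypothetical wrapper: launch the fixed-shape kernel once per batch element
    out = torch.empty_like(batch0)
    for b in range(batch0.shape[0]):
        array0.from_torch(batch0[b])
        array1.from_torch(batch1[b])
        array_add()
        out[b] = output_array.to_torch()
    return out

result = batched_add(torch.rand(4, n), torch.rand(4, n))

The per-element field copies are the obvious cost of this approach, which is why it is only a first step.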
PRs and discussions are always welcome.
This is 70% done with the newly introduced Tube in v0.4.0, but it remains open for Tin, where I think it can also be resolved.
Hi! A question regarding the current state of automatic batching: does it launch the same kernel once for each batch element, or does it include the batch in a clever way in the compiled kernel?
Yeah, once for each batch element, and it's currently only implemented in Tube. I don't know how to do it in a more optimized way yet, because Taichi lacks the compiler functionality to compile a "batched" kernel (as far as I know from my communication with the Taichi developers). So for now, automatic batching is quite memory-bound, I would say, unless the kernel is very computation-intensive.
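For reference, what such a batched compilation would ideally produce can be written by hand today by folding the batch dimension into the field shape, so a single launch covers the whole batch; this is a hypothetical sketch, not stannum or Tube code:

import taichi as ti

ti.init(arch=ti.cpu)

batch, n = 16, 8
array0 = ti.field(ti.f32, shape=(batch, n))
array1 = ti.field(ti.f32, shape=(batch, n))
output_array = ti.field(ti.f32, shape=(batch, n))

@ti.kernel
def batched_array_add():
    # one kernel launch covers every batch element;
    # the outermost struct-for loop is parallelized by Taichi
    for b, i in array0:
        output_array[b, i] = array0[b, i] + array1[b, i]

The catch, as discussed above, is that this requires a fixed batch size at field-allocation time; automatic batching would need the compiler to generate such a kernel for flexible shapes.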