
Automatic batching

Open ifsheldon opened this issue 3 years ago • 3 comments

Currently, stannum (and Taichi in general) cannot do automatic batching the way PyTorch does.

For example, the kernel below can only handle three arrays; given a batch of arrays, we would have to loop over the batch dimension or rewrite the code to support batches of a fixed size. This issue is somewhat related to issue #5. The ultimate goal is to support automatic batching for tensors of any valid, flexible shape.

```python
@ti.kernel
def array_add(self):
    for i in self.array0:
        self.output_array[i] = self.array0[i] + self.array1[i]
```

As a first step, dynamic looping (i.e., calling the kernel once per batch element) is acceptable, and this makes a good first issue.
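A minimal sketch of this dynamic-looping approach, using NumPy as a stand-in for the Taichi kernel (the names `array_add` and `batched_call` are illustrative and not part of stannum's API):

```python
import numpy as np

def array_add(a, b, out):
    # Stand-in for the per-sample Taichi kernel: element-wise add on one sample.
    out[:] = a + b

def batched_call(kernel, a_batch, b_batch):
    # "Dynamic looping": launch the same per-sample kernel once
    # for every element along the leading (batch) dimension.
    out_batch = np.empty_like(a_batch)
    for i in range(a_batch.shape[0]):
        kernel(a_batch[i], b_batch[i], out_batch[i])
    return out_batch
```

With a `(4, 3)` batch, `batched_call` invokes the kernel four times, once per sample, which is exactly the repeated-launch strategy described above.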

PRs and discussions are always welcome.

ifsheldon avatar Dec 30 '21 20:12 ifsheldon

This is 70% done in the newly introduced Tube in v0.4.0, but this remains for Tin, which I think is possible to be resolved.

ifsheldon avatar Jan 14 '22 18:01 ifsheldon

Hi! A question regarding the current state of automatic batching: does it launch the same kernel once for each batch element, or does it incorporate the batch into the compiled kernel in a clever way?

sebastienwood avatar Jul 29 '22 15:07 sebastienwood

> Hi! A question regarding the current state of automatic batching: does it launch the same kernel once for each batch element, or does it incorporate the batch into the compiled kernel in a clever way?

Yeah, once for each batch element, and it's currently only implemented in Tube. I don't know how to do it in a more optimized way yet, because Taichi lacks the compiler functionality to compile a "batched" kernel (as far as I know from my communication with the Taichi developers). So for now, automatic batching is quite memory-bound, I would say, unless the kernel is very computation-intensive.
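The two strategies can be contrasted with a NumPy stand-in (illustrative semantics only, not Taichi or stannum code): launching the per-sample kernel B times versus a single hand-written "batched" kernel whose one launch covers all `(batch, element)` pairs.

```python
import numpy as np

def add_per_sample(a, b):
    # One "launch" handles a single sample.
    return a + b

def add_batched(a_batch, b_batch):
    # A single "launch" whose parallel loop spans the whole batch,
    # amortizing per-launch overhead across all samples.
    return a_batch + b_batch

a = np.arange(6, dtype=np.float32).reshape(2, 3)
b = np.ones_like(a)

# B separate launches, one per batch element:
looped = np.stack([add_per_sample(a[i], b[i]) for i in range(a.shape[0])])
# One fused launch over the (batch, element) index space:
fused = add_batched(a, b)
assert np.array_equal(looped, fused)
```

Both produce the same result; the difference is purely in how many kernel launches (and hence how much launch and memory-traffic overhead) are paid.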

ifsheldon avatar Jul 29 '22 16:07 ifsheldon