Haibin Lin

31 comments by Haibin Lin

When you have a really fast GPU, a single Python thread can be relatively slow at pushing computation to all GPUs when the batch size is small (GPU computation time...

You're right. We might work on a simpler API that doesn't require users to inherit from the interface, so that users only need to change 1 or 2 lines of...

For the bert base uncased model, the vocab mapping is different, so the embedding weights need to be shuffled accordingly. @evah88 did you print the word ids in the batch...

The bos and eos token ids are different:
HF: `[101, 7592, 1010, 2026, 3899, 2003, 10140, 102]`
Gluon: `[2, 7592, 1010, 2026, 3899, 2003, 10140, 3]`
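To illustrate what "shuffling the embedding weights" could look like when two vocabularies assign different ids to the same tokens, here is a minimal sketch in plain Python. The function name `remap_embedding_rows`, the toy vocabularies, and the 2-dimensional vectors are all made up for illustration; this is not the HF or GluonNLP API, and real conversion code would operate on the framework's own weight arrays.

```python
def remap_embedding_rows(src_weights, src_vocab, dst_vocab):
    """Reorder embedding rows so that row dst_vocab[tok] holds the vector
    that the source model stored at row src_vocab[tok].

    Hypothetical helper: tokens missing from the source vocab keep a zero row.
    """
    dim = len(src_weights[0])
    dst_weights = [[0.0] * dim for _ in range(len(dst_vocab))]
    for token, dst_id in dst_vocab.items():
        src_id = src_vocab.get(token)
        if src_id is not None:
            # Copy the row so the two tables do not share storage.
            dst_weights[dst_id] = list(src_weights[src_id])
    return dst_weights


# Toy vocabularies (assumed for this example): same tokens, different ids.
src_vocab = {"[PAD]": 0, "[CLS]": 1, "[SEP]": 2, "hello": 3}
dst_vocab = {"hello": 0, "[PAD]": 1, "[CLS]": 2, "[SEP]": 3}

# One 2-d vector per source id; row i is [i, i] so the shuffle is easy to see.
src_weights = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]

remapped = remap_embedding_rows(src_weights, src_vocab, dst_vocab)

# Each token must map to the same vector under both id schemes.
for token in src_vocab:
    assert remapped[dst_vocab[token]] == src_weights[src_vocab[token]]
```

A quick sanity check along these lines (same token, same vector under both id schemes) is also a cheap way to confirm whether mismatched special-token ids are the cause of diverging outputs.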

I'm also interested in a Python binding. Think of taco as the backend for scipy.sparse on GPU - that would be really cool.

Do you mean you want a different DType for label?

Is this resolved?