Haibin Lin
When you have a really fast GPU, a single Python thread can be relatively slow at pushing computation to all GPUs when the batch size is small (GPU computation time...
You're right. We might work on a simpler API that doesn't require users to inherit from the interface, so that users only need to change 1 or 2 lines of...
@kenjewu any update on this?
@lingss0918 gentle ping. Any update?
For the BERT base uncased model, the vocab mapping is different, so the embedding weights need to be shuffled accordingly. @evah88 did you print the word ids in the batch...
The BOS and EOS token ids are different:
HF: `[101, 7592, 1010, 2026, 3899, 2003, 10140, 102]`
Gluon: `[ 2, 7592, 1010, 2026, 3899, 2003, 10140, 3]`
I'm also interested in a Python binding. Think of taco as the backend for scipy.sparse on GPU - that would be really cool.
Do you mean you want a different DType for label?
Is this resolved?