Batch Tokenization Support
Is your feature request related to a problem? Please describe. Most AI systems use batching for performance reasons, which requires all tokenized sentences to be the same length and an accompanying mask indicating which values are padding. In my project I had to implement this myself. The issues are mostly performance and API compatibility with the ecosystem. With my solution there are megabytes of allocations:
The Int64[] allocations come from the widening that has to be done because the ONNX model needs Tensor<long>.
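For context, the hot path today looks roughly like the following. This is a simplified sketch, not the exact code linked under "alternatives" below; the EncodeToIds overload, the padTokenId handling, and the helper names are illustrative.

using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML.Tokenizers;

static class ManualBatching
{
    // Illustrative only: per-sentence id lists plus widened long[] batch buffers.
    public static (long[] InputIds, long[] AttentionMask) Encode(
        Tokenizer tokenizer, IReadOnlyList<string> texts, int maxTokenCount, int padTokenId)
    {
        // One IReadOnlyList<int> allocation per sentence.
        IReadOnlyList<int>[] encoded = texts.Select(t => tokenizer.EncodeToIds(t)).ToArray();
        int maxLen = Math.Min(maxTokenCount, encoded.Max(e => e.Count));

        // Widened long[] buffers per batch, because the ONNX model expects int64 inputs.
        var inputIds = new long[texts.Count * maxLen];
        var attentionMask = new long[texts.Count * maxLen];

        for (int row = 0; row < encoded.Length; row++)
        {
            IReadOnlyList<int> ids = encoded[row];
            for (int col = 0; col < maxLen; col++)
            {
                bool isPadding = col >= ids.Count;
                inputIds[row * maxLen + col] = isPadding ? padTokenId : ids[col];
                attentionMask[row * maxLen + col] = isPadding ? 0L : 1L;
            }
        }

        return (inputIds, attentionMask);
    }
}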
Describe the solution you'd like Enable a zero-allocation solution via an API like the following:
class Tokenizer
{
    ...
    public abstract void BatchTokenize<T>(ReadOnlySpan<string> texts, int maxTokenCount, Tensor<T> inputIds, Tensor<T> inputMask)
        where T : INumber<T>;

    public abstract void BatchTokenize<T>(ReadOnlySpan<string> texts, int maxTokenCount, Tensor<T> inputIds, Tensor<T> inputMask, Tensor<T> tokenTypeIds)
        where T : INumber<T>;
}
Maybe instead of Tensor<T> you want to use TensorSpan<T>?
With this API, string allocations are avoided when they are not needed, and the other internal allocations can be optimized. It would also let me pool tensors and remove the int-to-long casting for my models.
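For illustration, this is roughly how I would consume the proposed API with pooled buffers. The ITensorPool type and its Rent/Return methods are my own hypothetical pooling abstraction, and Tokenizer.BatchTokenize is the proposed method above; neither exists today.

using System;
using System.Numerics.Tensors;

static class ProposedUsage
{
    public static void Run(Tokenizer tokenizer, ITensorPool pool, ReadOnlySpan<string> texts, int maxTokenCount)
    {
        // Rent reusable int64 tensors sized batch x maxTokenCount from my own pool.
        Tensor<long> inputIds = pool.Rent<long>(texts.Length, maxTokenCount);
        Tensor<long> inputMask = pool.Rent<long>(texts.Length, maxTokenCount);
        try
        {
            // Fills the pooled tensors in place: no long[] per call and no int -> long widening pass.
            tokenizer.BatchTokenize<long>(texts, maxTokenCount, inputIds, inputMask);

            // inputIds / inputMask can then be fed straight to ONNX Runtime as int64 inputs.
        }
        finally
        {
            pool.Return(inputIds);
            pool.Return(inputMask);
        }
    }
}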
Describe alternatives you've considered I have implemented my own batch tokenizer: https://github.com/tjwald/high-perf-ML/blob/develop/ML.Infra/Tokenization/PretrainedTokenizer.cs.
Additional context This continues the tokenization part of this ticket: https://github.com/microsoft/semantic-kernel/issues/9793.
Another implementation, from the AI Dev Gallery: https://github.com/microsoft/ai-dev-gallery/blob/main/AIDevGallery/Samples/SharedCode/TokenizerExtensions.cs
After implementing other scenarios, I think the API should take TensorSpan<T> instead; it is more flexible.
Note that the tokenizers library supports .NET Core and .NET Standard 2.0, and tensors are not supported on .NET Standard. We need to figure out another way to do the batching that makes it easy to bridge the result to a tensor/span.
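To make the bridging idea concrete, one possible netstandard2.0-friendly shape could look like the following. This is purely a sketch, not an agreed design, and the method and parameter names are illustrative.

class Tokenizer
{
    ...
    // Fills caller-provided row-major buffers of size texts.Length * maxTokenCount
    // and reports the real token count per row; no Tensor<T> types are needed.
    public abstract void BatchEncodeToIds(
        ReadOnlySpan<string> texts,
        int maxTokenCount,
        Span<int> inputIds,
        Span<int> attentionMask,
        Span<int> tokensPerText);
}

On .NET 8+ the caller could then wrap or widen those buffers into Tensor<long>/TensorSpan<long> themselves, while .NET Standard 2.0 consumers keep working with plain spans and arrays.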
@luisquintanilla @tarekgh Any updates here?
Not yet. For now, you can continue using your own batching implementation. We’ve had to prioritize other work, which pushed this item down the list. My understanding is that this isn’t blocking you, correct?
No blocker, just losing performance :)