machinelearning icon indicating copy to clipboard operation
machinelearning copied to clipboard

[Tokenizers] Port BERTTokenizers

Open ericstj opened this issue 1 year ago • 1 comments

Porting BERTTokenizers enables several text embedding generation models. Requires https://github.com/dotnet/machinelearning/issues/6988.

https://github.com/huggingface/text-embeddings-inference?tab=readme-ov-file#text-embeddings. https://github.com/huggingface/transformers/blob/v4.37.0/src/transformers/models/bert/tokenization_bert.py#L137

cc @luisquintanilla

We already have some BERT implementation which may be sufficient.

ericstj avatar Feb 08 '24 17:02 ericstj

Has this been implemented? It seems very similar to #7281.

ds5678 avatar Nov 13 '24 21:11 ds5678