
Batch encoding text pairs in HuggingFaceTokenizer.


Description

Currently, HuggingFaceTokenizer.batchEncode only supports batch encoding of arrays/lists of single text inputs, while text pair inputs are only supported by encode(String text, String textPair). To run batch inference on a set of text pairs, their encodings must be padded to the same length.

Currently, the "out-of-the-box" solution is to create a tokenizer with padding set to the .optPadToMaxLength(), as discussed in https://github.com/deepjavalibrary/djl/issues/1996#issuecomment-1243256576 . This will result in too high of inference latency for the input text pair batches where the longest inputs can be tokenized to vectors that are notably shorter than maxLength.

An implementation of batchEncode for text pairs (for example batchEncode(QAInput[] inputs) or batchEncode(String[] texts, String[] textPairs)) would resolve this issue when called on a tokenizer created with padding == PaddingStrategy.LONGEST.
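
A hypothetical usage sketch of the requested overload; the batchEncode(String[], String[]) call below does not exist in DJL at the time of this issue and only illustrates the intended behavior:

```java
String[] questions = {"Who develops DJL?", "What is DJL?"};
String[] contexts = {
    "Deep Java Library (DJL) is developed by AWS.",
    "DJL is an open-source Java library for deep learning."
};

// With padding == PaddingStrategy.LONGEST, the encodings would be padded only
// up to the longest tokenized pair in this batch, not to maxLength.
Encoding[] encodings = tokenizer.batchEncode(questions, contexts);
```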

This would benefit DJL users who want to optimize the latency of a QA system by batching inputs, which is efficient for processing a large list of short Question/Context pairs on GPUs.

References

  • This option is implemented in the Python interface of the HuggingFace PreTrainedTokenizerFast: https://github.com/huggingface/transformers/blob/470799b3a67c6e078b9cb3a38dc0395a70e1a4a2/src/transformers/tokenization_utils.py#L671

demq · Sep 13, 2022