
Inclusion of Offset Mapping and Stride Length

Open TheAthleticCoder opened this issue 1 year ago • 0 comments

While working on #741, one of the key tasks is finding the start and end indices of the answer tokens within the context. When we use the Keras preprocessor with max_length set, the answer tokens are sometimes missing from the context tokens, because the context is truncated to the fixed length. Hugging Face handles this with offset mapping and stride length in its tokenizers, as described in their documentation. We should consider adding the same to the Keras preprocessor.
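To make the request concrete, here is a minimal pure-Python sketch of the two ideas. It is not keras-nlp or Hugging Face code: the whitespace tokenizer and the helper names (`tokenize_with_offsets`, `sliding_windows`, `locate_answer`) are hypothetical, and I am assuming "stride" means the number of tokens shared by consecutive windows, which is how I understand the Hugging Face tokenizers to use the term.

```python
import re


def tokenize_with_offsets(text):
    """Toy whitespace tokenizer (assumption: stands in for a real subword
    tokenizer). Returns tokens plus their (char_start, char_end) offsets."""
    tokens, offsets = [], []
    for match in re.finditer(r"\S+", text):
        tokens.append(match.group())
        offsets.append((match.start(), match.end()))
    return tokens, offsets


def sliding_windows(num_tokens, max_length, stride):
    """Yield (start, end) token ranges of at most `max_length` tokens.
    Consecutive windows overlap by `stride` tokens."""
    if not 0 <= stride < max_length:
        raise ValueError("stride must satisfy 0 <= stride < max_length")
    step = max_length - stride
    start = 0
    while True:
        end = min(start + max_length, num_tokens)
        yield start, end
        if end == num_tokens:
            break
        start += step


def locate_answer(offsets, window, answer_start, answer_end):
    """Map a character span [answer_start, answer_end) to token indices
    relative to `window`. Returns None if the answer is not fully inside."""
    w_start, w_end = window
    start_tok = end_tok = None
    for i in range(w_start, w_end):
        char_start, char_end = offsets[i]
        if start_tok is None and char_start <= answer_start < char_end:
            start_tok = i
        if char_start < answer_end <= char_end:
            end_tok = i
    if start_tok is None or end_tok is None:
        return None
    return start_tok - w_start, end_tok - w_start


context = "The quick brown fox jumps over the lazy dog near the river bank"
answer = "lazy dog"
answer_start = context.index(answer)
answer_end = answer_start + len(answer)

tokens, offsets = tokenize_with_offsets(context)
windows = list(sliding_windows(len(tokens), max_length=6, stride=2))
results = [locate_answer(offsets, w, answer_start, answer_end) for w in windows]

for window, result in zip(windows, results):
    print(window, tokens[window[0]:window[1]], result)
```

With plain truncation to 6 tokens, only the first window would exist and the answer would be lost. With overlapping windows, the second window contains the whole answer and the offsets give its token positions; windows without the full answer return `None`, so they can be labelled as "no answer" during training.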

Temporary fix: for the SQuAD example task, I can implement offset mapping and stride length separately, but we should work towards a proper fix in the library.

If there are other suggestions or approaches for including this in the library, I would like to take it up.

TheAthleticCoder, Mar 13 '23 12:03