Encoding.pad/truncate return () but could return the Encoding to chain calls

Open mandubian opened this issue 5 years ago • 2 comments

Encoding provides two methods, pad and truncate, that modify the current encoding. Both return (), so you can't chain calls on the Encoding structure, for example:

encoding = tokenizer.encode(...)
encoding.truncate(...).pad(...).ids

If pad/truncate returned Result<Encoding>, we could chain calls. Is there a reason for the current design, or is my proposal nonsense (which is possible)?
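To make the request concrete, here is a minimal, self-contained sketch of the chaining pattern. The `Encoding` below is a hypothetical stand-in that tracks only token ids, not the library's real type, and the parameters (`max_len`, `target_len`, `pad_id`) are assumptions for illustration. For simplicity it returns `&mut Self` rather than the `Result<Encoding>` suggested above; either return type would allow chaining.

```rust
/// Hypothetical, simplified stand-in for the library's `Encoding`.
/// Only token ids are tracked; the real type carries more fields.
#[derive(Debug, Clone, PartialEq)]
pub struct Encoding {
    ids: Vec<u32>,
}

impl Encoding {
    pub fn new(ids: Vec<u32>) -> Self {
        Encoding { ids }
    }

    /// Keep at most `max_len` ids, then return the receiver so calls can chain.
    pub fn truncate(&mut self, max_len: usize) -> &mut Self {
        self.ids.truncate(max_len);
        self
    }

    /// Right-pad with `pad_id` up to `target_len`.
    /// Encodings that are already long enough are left unchanged.
    pub fn pad(&mut self, target_len: usize, pad_id: u32) -> &mut Self {
        if self.ids.len() < target_len {
            self.ids.resize(target_len, pad_id);
        }
        self
    }

    /// Read-only view of the token ids.
    pub fn ids(&self) -> &[u32] {
        &self.ids
    }
}
```

With this shape, `encoding.truncate(3).pad(4, 0).ids()` works as a single expression, because each mutating method hands back the same `Encoding` it was called on.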

mandubian avatar Feb 03 '20 22:02 mandubian

I'd say the main reason is that the main path we imagined for truncation and padding is to use with_truncation and with_padding on the Tokenizer directly. That way, the Encoding returned by encode is already truncated and padded, and you don't have to do it for each Encoding individually.

That being said, I don't see any reason that would prevent us from changing this. Feel free to make a PR!
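As a rough illustration of the tokenizer-level path described above, the sketch below configures truncation and padding once and applies both inside encode. `ToyTokenizer`, its fields, and the method signatures are hypothetical; only the names with_truncation, with_padding, and encode come from this thread, and the real library's parameters differ. The "tokenization" is a toy that maps each whitespace-separated word to its byte length.

```rust
/// Hypothetical toy tokenizer; not the library's real `Tokenizer`.
#[derive(Default)]
pub struct ToyTokenizer {
    max_len: Option<usize>,
    padding: Option<(usize, u32)>, // (target length, pad id)
}

impl ToyTokenizer {
    /// Store a maximum length that `encode` will enforce.
    pub fn with_truncation(mut self, max_len: usize) -> Self {
        self.max_len = Some(max_len);
        self
    }

    /// Store a target length and pad id that `encode` will apply.
    pub fn with_padding(mut self, target_len: usize, pad_id: u32) -> Self {
        self.padding = Some((target_len, pad_id));
        self
    }

    /// Toy encoding: one id per word (its byte length),
    /// truncated first and then padded according to the stored settings.
    pub fn encode(&self, text: &str) -> Vec<u32> {
        let mut ids: Vec<u32> = text
            .split_whitespace()
            .map(|word| word.len() as u32)
            .collect();
        if let Some(max_len) = self.max_len {
            ids.truncate(max_len);
        }
        if let Some((target_len, pad_id)) = self.padding {
            if ids.len() < target_len {
                ids.resize(target_len, pad_id);
            }
        }
        ids
    }
}
```

In this arrangement the caller never touches pad or truncate on the result: every value returned by encode already has the configured length.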

n1t0 avatar Feb 10 '20 19:02 n1t0

I see! That wasn't clear when I first looked at the API. I'll see if I can make a PR for that; I need to refresh my Rust tokenizer ;)

mandubian avatar Feb 10 '20 22:02 mandubian

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] avatar Jun 01 '24 01:06 github-actions[bot]