
Update JetStream gRPC proto to support I/O with text and token ids

[Open] JoeZijunZhou opened this issue 9 months ago • 6 comments

  • Update gRPC proto to support:
    • Request: token id or text (one of).
    • Response: token id, text, or both.
    • Currently, a request with either token id or text returns both token id and text.
    • TODO: Add MLPerf mode: when the input is token ids, return only token ids.
  • Refactored detokenization handling and the Tokenizer API to decouple the logic.
  • Added complete support for SentencePiece tokenizer streaming decoding (ensured output text correctness).
  • Added and updated unit tests for token utils, orchestrator, and server.
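
For illustration, the request/response shape described above might look like the following Python sketch. The class and field names here are hypothetical, not the actual JetStream proto definitions; the point is the `oneof`-style input (exactly one of text or token ids) and the flexible output (ids, text, or both):

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DecodeRequest:
    """Input is one of text or token ids, mirroring a proto `oneof`."""
    text: Optional[str] = None
    token_ids: Optional[List[int]] = None

    def __post_init__(self):
        # Exactly one of the two input forms must be provided.
        if (self.text is None) == (self.token_ids is None):
            raise ValueError("Set exactly one of `text` or `token_ids`.")


@dataclass
class DecodeResponse:
    """Output may carry token ids, text, or both."""
    token_ids: List[int] = field(default_factory=list)
    text: str = ""
```

A real proto `oneof` enforces the same mutual exclusivity at the serialization layer; the `__post_init__` check above only emulates it for the sketch.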

JoeZijunZhou avatar May 07 '24 23:05 JoeZijunZhou

Any reason to add text back? I suggested we keep both str and id in the response in #40, and the answer was "don't want to decode it to str (or piece) in jetstream". Would you like to share the reasons for making a completely different decision just two weeks later?

Below is the comment thread from PR #40:

@FanhaiLu1 [2 weeks ago]: Can we still keep a str as an option? The internal version keeps both text and token id.
@JoeZijunZhou [2 weeks ago]: I guess we don't want to decode it to str (or piece) in jetstream, since it could be slightly off in the final result.

This is mainly for a customer's request: they would like server-side detokenization support, and this PR includes the complete support. The earlier "don't want to decode it to str (or piece) in jetstream" was due to the previously incorrect streaming detokenization.
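
As background on why naive per-token decoding could be "off": with SentencePiece-style tokenizers, a multi-byte character can span several token ids, so text is only stable once a boundary completes. A streaming decoder therefore typically re-decodes the prefix and emits only the suffix that can no longer change. This is a toy sketch of that idea, not JetStream's actual Tokenizer API; `toy_decode` stands in for a real tokenizer:

```python
REPLACEMENT_CHAR = "\ufffd"  # produced while a multi-byte sequence is incomplete


def stream_detokenize(token_ids, decode):
    """Yield stable text pieces as token ids arrive.

    `decode` maps a full list of token ids to text. Each step re-decodes
    the whole prefix and emits only the newly stabilized suffix, holding
    back output while the text still ends in an incomplete character.
    """
    emitted = 0
    seen = []
    for tid in token_ids:
        seen.append(tid)
        text = decode(seen)
        if text.endswith(REPLACEMENT_CHAR):
            continue  # character boundary not complete yet; wait for more ids
        piece, emitted = text[emitted:], len(text)
        if piece:
            yield piece


def toy_decode(ids):
    """Toy byte-level 'tokenizer': each id is one UTF-8 byte."""
    return bytes(ids).decode("utf-8", errors="replace")
```

Decoding each id in isolation would emit a replacement character mid-way through "é" (two UTF-8 bytes); buffering until the boundary completes avoids that.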

JoeZijunZhou avatar May 08 '24 03:05 JoeZijunZhou


Thanks, can you create an issue and add the customer's request details? It will be good context for readers.

FanhaiLu1 avatar May 08 '24 16:05 FanhaiLu1

Addressing #79.

JoeZijunZhou avatar May 08 '24 17:05 JoeZijunZhou


Soon, another customer could ask for both token id and text support. I feel we should anticipate customers' requirements and do it all at once. Supporting token id, text, or both is one solution for all customers.

FanhaiLu1 avatar May 08 '24 20:05 FanhaiLu1


This already supports token ids, text, or both in the response. Users won't input both text and tokens in the same request, right? Why would a user send both text and tokens to the JetStream API?

I was thinking of simplifying it to 2 modes:

  1. For MLPerf, return token ids only
  2. Other cases, always return text + token ids + score (to be added to Sample)
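
The two-mode dispatch above could be sketched roughly as follows. Names like `mlperf_mode`, `make_sample`, and the `Sample` fields are illustrative, not the actual JetStream API:

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Sample:
    """One decoded sample; `score` would be the field added per mode 2."""
    token_ids: List[int]
    text: str = ""
    score: Optional[float] = None


def make_sample(
    token_ids: List[int],
    detokenize: Callable[[List[int]], str],
    score: float,
    mlperf_mode: bool = False,
) -> Sample:
    """MLPerf mode returns token ids only; otherwise text + ids + score."""
    if mlperf_mode:
        return Sample(token_ids=token_ids)
    return Sample(token_ids=token_ids, text=detokenize(token_ids), score=score)
```

Keeping the branch at response-assembly time means the decode loop itself stays identical across modes; only the payload shape changes.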

JoeZijunZhou avatar May 08 '24 22:05 JoeZijunZhou


Looks good to me with these settings:

Request: token id or text (one of). Response: token id, text, or both.

FanhaiLu1 avatar May 08 '24 23:05 FanhaiLu1

  • When the input is text, return both text and token ids.

Is it still a streaming mode?

FanhaiLu1 avatar May 14 '24 22:05 FanhaiLu1


Yes, each streamed chunk contains a token_ids list and its text piece under the greedy decode strategy. There is no performance regression.
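
A client consuming such a stream, where every chunk carries both representations, might accumulate the result like this; the `(token_ids, text_piece)` chunk shape is illustrative, not the actual response message:

```python
from typing import Iterable, List, Tuple


def accumulate_stream(
    chunks: Iterable[Tuple[List[int], str]],
) -> Tuple[List[int], str]:
    """Concatenate per-chunk token id lists and text pieces.

    Each chunk is assumed to be a (token_ids, text_piece) pair, as in a
    streamed response where every chunk carries both representations.
    """
    all_ids: List[int] = []
    parts: List[str] = []
    for token_ids, text_piece in chunks:
        all_ids.extend(token_ids)
        parts.append(text_piece)
    return all_ids, "".join(parts)
```

Because each chunk's text piece is already stabilized server-side, the client can simply concatenate pieces without any detokenization logic of its own.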

JoeZijunZhou avatar May 14 '24 23:05 JoeZijunZhou