MLServer
Support JSON serialization of all kinds of Hugging Face pipeline outputs
The huggingface_runtime output JSON serializer does not support basic NumPy datatypes when the data appears as a dict value.
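For context, a minimal reproduction of that failure (the payload shape is illustrative, not taken verbatim from the runtime):

```python
import json

import numpy as np

# A dict value holding a NumPy scalar, as pipeline outputs often do
payload = {"label": "dog", "score": np.float32(0.97)}

try:
    json.dumps(payload)
except TypeError as err:
    print(err)  # Object of type float32 is not JSON serializable
```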
I fixed the lint error and added some tests for the NumpyEncoder, but I can't run the tests successfully in my local environment because of the error `ImportError: cannot import name 'deepspeed_reinit' from 'transformers.deepspeed'`. Can I run the tests on GitHub Actions?
Tests passed on my desktop.
After testing more Hugging Face models, I think the NumpyEncoder should be renamed to HuggingfaceOutputJSONEncoder, because Hugging Face pipeline outputs contain not only NumPy datatypes but also Pillow's Image. Before running more tests, I'm not sure how many Python types exist in pipeline outputs.
A small script passed in my local environment, so it seems OK now, but I won't add it to the tests because I think it's too heavy:
```python
from transformers.pipelines import pipeline
from transformers import Conversation
import json
import numpy as np
from PIL import Image
import io
import base64


class CommonJSONEncoder(json.JSONEncoder):
    """JSON encoder covering the non-JSON types seen in pipeline outputs."""

    def default(self, obj):
        if isinstance(obj, np.ndarray):
            return obj.tolist()
        if isinstance(obj, (np.float_, np.float16, np.float32, np.float64)):
            return float(str(obj))
        if isinstance(obj, (np.int_, np.int8, np.int16, np.int32, np.int64)):
            return int(obj)
        if isinstance(obj, Image.Image):
            # Serialize PIL images as base64-encoded PNG bytes
            buf = io.BytesIO()
            obj.save(buf, format="png")
            return base64.b64encode(buf.getvalue()).decode()
        if isinstance(obj, Conversation):
            return {
                "uuid": str(obj.uuid),
                "past_user_inputs": obj.past_user_inputs,
                "generated_responses": obj.generated_responses,
                "new_user_input": obj.new_user_input,
            }
        return json.JSONEncoder.default(self, obj)


sumarytext = (
    "The tower is 324 metres (1,063 ft) tall, about the same height as an "
    "81-storey building, and the tallest structure in Paris. Its base is "
    "square, measuring 125 metres (410 ft) on each side. During its "
    "construction, the Eiffel Tower surpassed the Washington Monument to "
    "become the tallest man-made structure in the world, a title it held "
    "for 41 years until the Chrysler Building in New York City was finished "
    "in 1930. It was the first structure to reach a height of 300 metres. "
    "Due to the addition of a broadcasting aerial at the top of the tower "
    "in 1957, it is now taller than the Chrysler Building by 5.2 metres "
    "(17 ft). Excluding transmitters, the Eiffel Tower is the second "
    "tallest free-standing structure in France after the Millau Viaduct."
)

ALL_TASKS_TESTS = {
    "audio-classification": [
        {"args": (), "kwargs": {"inputs": "fixtures/audio.mp3"}}
    ],
    "automatic-speech-recognition": [
        {"args": (), "kwargs": {"inputs": "fixtures/audio.mp3"}}
    ],
    "feature-extraction": [
        {"args": (), "kwargs": {"inputs": "this pure text"}}
    ],
    "text-classification": [
        {
            "args": (
                "A greeting brings the care of the heart, a blessing "
                "brings the care of the body, and a short message brings "
                "the love. May you be lucky and happy, and your life is "
                "sweet as honey, and your career is successful and "
                "official!",
            ),
            "kwargs": {},
        }
    ],
    "token-classification": [
        {"args": (), "kwargs": {"inputs": "Hello I'm Omar and I live in Zürich."}}
    ],
    "question-answering": [
        {
            "args": (),
            "kwargs": {
                "question": "what's her job?",
                "context": "Her name is Lily, she is a singer",
            },
        }
    ],
    "table-question-answering": [
        {
            "args": (),
            "kwargs": {
                "table": {
                    "actors": [
                        "brad pitt",
                        "leonardo di caprio",
                        "george clooney",
                    ],
                    "age": ["56", "45", "59"],
                    "number of movies": ["87", "53", "69"],
                    "date of birth": [
                        "7 february 1967",
                        "10 june 1996",
                        "28 november 1967",
                    ],
                },
                "query": "when brad pitt born?",
            },
        }
    ],
    "visual-question-answering": [
        {
            "args": (),
            "kwargs": {
                "image": "fixtures/dogs.jpg",
                "question": "how many dogs here?",
            },
        }
    ],
    "fill-mask": [
        {"args": (), "kwargs": {"inputs": "i am come from <mask>"}}
    ],
    "summarization": [{"args": (sumarytext,), "kwargs": {}}],
    "translation_en_to_de": [
        {"args": ("My name is Sarah and I live in London",), "kwargs": {}}
    ],
    "text2text-generation": [
        {
            "args": (
                "question: What is 42 ? context: 42 is the answer to life, "
                "the universe and everything",
            ),
            "kwargs": {},
        },
        {
            "args": ("translate from English to French: I'm very happy",),
            "kwargs": {},
        },
    ],
    "text-generation": [
        {"args": ("My name is Tylor Swift and I am",), "kwargs": {}}
    ],
    "zero-shot-classification": [
        {
            "args": (),
            "kwargs": {
                "sequences": "Last week I upgraded my iOS version and ever "
                "since then my phone has been overheating whenever I use "
                "your app.",
                "candidate_labels": [
                    "mobile",
                    "website",
                    "billing",
                    "account",
                    "access",
                ],
                "multi_class": False,
            },
        }
    ],
    "zero-shot-image-classification": [
        {
            "args": (),
            "kwargs": {
                "images": "fixtures/dogs.jpg",
                "candidate_labels": ["dogs", "cats", "tigers"],
            },
        }
    ],
    "conversational": [
        {
            "args": (),
            "kwargs": {
                "conversations": [
                    Conversation("Hello"),
                    Conversation("how are you"),
                    Conversation("where are you from"),
                ],
            },
        }
    ],
    "image-classification": [
        {"args": (), "kwargs": {"images": "fixtures/dogs.jpg"}}
    ],
    "image-segmentation": [
        {"args": (), "kwargs": {"inputs": "fixtures/dogs.jpg"}}
    ],
    "object-detection": [
        {"args": (), "kwargs": {"inputs": "fixtures/dogs.jpg"}}
    ],
}

all_types = {}
outs = []


def inspect_types(output):
    # Recursively collect every leaf type seen in a pipeline output
    if isinstance(output, dict):
        for k, v in output.items():
            inspect_types(k)
            inspect_types(v)
    elif isinstance(output, (list, tuple)):
        for el in output:
            inspect_types(el)
    else:
        all_types[type(output)] = 1


for task, argslist in ALL_TASKS_TESTS.items():
    print("-" * 80)
    print(task)
    p = pipeline(task)
    for arg in argslist:
        output = p(*arg["args"], **arg["kwargs"])
        inspect_types(output)
        outs.append(output)

print("all kinds of datatypes:")
print(list(all_types.keys()))
print(json.dumps(outs, cls=CommonJSONEncoder))
```
- timm is a requirement for some vision-related Hugging Face tasks; Pillow is one of its dependencies
- ffmpeg is a system requirement for some audio-related Hugging Face tasks; I found there is no place to install a single apt package for a specific runtime, so I added it to MLServer's base runtime
- torch-scatter is required for the table-question-answering task
WIP
What changes in this PR
I'm using Seldon and MLServer in a project that aims to quickly deploy and try out any Hugging Face model. When I tested some vision-related models, I got a JSON serialization error like this: https://github.com/SeldonIO/MLServer/issues/656. At first, I just wanted to fix the JSON serialization error, but as I got further, I ran into more trouble with the inputs.
Changes:
- add an `is_single` field in parameters to support decoding the request input as a single value; this field now affects `StringCodec`, `NumpyCodec`, and `Base64Codec`
- add `HuggingfaceRuntimeCodec`, which can automatically decode the inputs as pipeline args (see the sketch below)
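As a rough illustration of what that auto-decoding does (the request shape and helper below are hypothetical, simplified from the V2 inference protocol rather than taken from this PR):

```python
from typing import Any, Dict, List

# Hypothetical, simplified stand-in for the inputs of a V2 inference request
request_inputs = [
    {"name": "image", "datatype": "BYTES", "data": ["fixtures/dogs.jpg"]},
    {"name": "question", "datatype": "BYTES", "data": ["how many dogs here?"]},
]


def decode_pipeline_kwargs(inputs: List[Dict[str, Any]]) -> Dict[str, Any]:
    # Map each named request input to a keyword argument for the pipeline call
    return {inp["name"]: inp["data"] for inp in inputs}


print(decode_pipeline_kwargs(request_inputs))
# {'image': ['fixtures/dogs.jpg'], 'question': ['how many dogs here?']}
```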
May not be backwards compatible
Previously, NumpyCodec always converted the inputs into a single np.ndarray; now, if no is_single parameter is provided, NumpyCodec decodes the inputs as a list.
One possible remedy: change the is_single default value to None, and change NumpyCodec's default behavior so that when is_single is None it is treated as True; but that may be confusing.
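As a rough sketch of that remedy (the decode helper and its signature below are hypothetical, not MLServer's actual codec API):

```python
from typing import Any, List, Optional, Union

import numpy as np


def decode_numpy(
    payload: List[Any], is_single: Optional[bool] = None
) -> Union[np.ndarray, List[np.ndarray]]:
    if is_single is None:
        # Hypothetical backwards-compatible default: behave like the old
        # NumpyCodec and collapse the payload into a single np.ndarray.
        is_single = True
    if is_single:
        return np.array(payload)
    # New behavior: keep the batch structure and return one array per element.
    return [np.array(el) for el in payload]
```

The confusing part is that an unset is_single would silently mean True, which is easy to misread.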
@adriangonz, any advice here?
Hey @pepesi,
Thanks a lot for the time you've put into this PR - I know it hasn't been an easy journey!
It's grown quite massively, so it will take us a bit of time to review it, particularly considering that there are breaking changes to some of the existing functionality. We will do our best though!
Firstly, I agree with the opinion that codecs should always operate on lists of things (i.e. batches), and that it is then up to the runtime to deal with these. I didn't realize before that this was a principle to follow, so I think I will remove is_single next.
Next, let me explain why I added is_single in the first place.
The visual-question-answering and table-question-answering task pipelines support args like pipeline(image=Image.Image, question=str), but the codecs would decode the request data into lists like [Image.Image] and [question]. So I added the parameter to let the codecs decode the request data as a single value, e.g. an Image.Image or a str instead of a list.
Due to adding the is_single parameter, the Triton HTTP client request needs to pass it through to the server as well, which is why I modified InferInput._parameters. As @RafalSkolasinski said, it may cause breaking changes.
Got it! Thanks for that explanation @pepesi.
In that case, the best option is probably to handle the list -> single element logic within the HuggingFace runtime itself. Actually, thinking about it, we may need to do that to support adaptive batching anyway (i.e. to handle the case where an incoming request has more than a single data point). That's totally outside the scope of this PR though, so don't worry about that use case!
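For illustration, the list -> single element handling inside the runtime could look something like this (a hypothetical helper, not the actual MLServer implementation):

```python
from typing import Any, Dict, List


def unwrap_single(kwargs: Dict[str, List[Any]]) -> Dict[str, Any]:
    # Codecs keep returning lists (i.e. batches); the runtime unwraps
    # one-element lists into single values so they match pipeline
    # signatures like pipeline(image=..., question=...).
    return {
        key: values[0] if isinstance(values, list) and len(values) == 1 else values
        for key, values in kwargs.items()
    }


print(unwrap_single({"image": ["fixtures/dogs.jpg"], "question": ["how many dogs here?"]}))
# {'image': 'fixtures/dogs.jpg', 'question': 'how many dogs here?'}
```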
Looking forward to the next batch of changes! Keep up the great work! We are getting closer. :rocket:
Thanks a lot for making those changes @pepesi. Once again, massive thanks for all the effort you've put into this one.
PR looks great and tests are green, so this should be good to go! :rocket: