api-inference-community
Prompting warnings in widget response when the inference doesn't work
Hello,
Any warning gets appended in the scikit-learn pipelines, even if the inference is successful. I suggest we check whether the prediction and the response look good, and return warnings only if they don't. Otherwise the warnings get prepended on top of the response and break the widget. (What I observed was a version mismatch, which doesn't happen in production, I know, but I don't think a version mismatch should concern the user if the predictions are returned fine, and the same goes for anything raised at warning level rather than error level.) (This is something I observed for the text classification pipeline because I repurposed code from the tabular pipelines, so let me know if this isn't the case.) Also, feel free to ignore this issue if it doesn't make any sense. I think the code below should be refactored.
```python
for warning in record:
    _warnings.append(f"{warning.category.__name__}({warning.message})")
for warning in self._load_warnings:
    _warnings.append(f"{warning.category.__name__}({warning.message})")
if _warnings:
    for warning in _warnings:
        logger.warning(warning)
    if not exception:
        # we raise an error if there are any warnings, so that routes.py
        # can catch and return a non 200 status code.
        ### THIS IS THE PART I COMPLAIN ON :')
        error = {
            "error": "There were warnings while running the model.",
            "output": res,
        }
        raise ValueError(json.dumps(error))
    else:
        # if there was an exception, we raise it so that routes.py can
        # catch and return a non 200 status code.
        raise exception
return res
```
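To make the suggestion concrete, here is a rough sketch of the kind of refactor I have in mind (a hypothetical standalone helper, not the actual pipeline code; the field names are made up):

```python
import logging
import warnings

logger = logging.getLogger(__name__)


def predict_with_warnings(model, inputs):
    """Run model.predict and collect warnings instead of failing on them."""
    collected = []
    with warnings.catch_warnings(record=True) as record:
        warnings.simplefilter("always")
        res = model.predict(inputs)
    for w in record:
        msg = f"{w.category.__name__}({w.message})"
        collected.append(msg)
        logger.warning(msg)
    # The predictions were produced, so return them; the warnings travel
    # alongside so the widget (or any API client) can decide what to do.
    return {"output": res.tolist(), "warnings": collected}
```

i.e. keep the predictions whenever inference itself succeeded, and let the warnings travel alongside them instead of raising.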
WDYT @adrinjalali @BenjaminBossan
> What I observed was a version mismatch, which doesn't happen in production, I know, but I don't think a version mismatch should concern the user if the predictions are returned fine, and the same goes for anything raised at warning level rather than error level.
I think this was exactly @adrinjalali's intent, because there is no guarantee that the predictions are correct if the versions don't match.
Do predictions change depending on the sklearn version? 😅 Does the implementation change? A better solution would be to still return the predictions and post the warnings below the widget (they could be put in another place in the response that the widget understands and renders below the output, or something).
> Do predictions change depending on the sklearn version? 😅 Does the implementation change?
Without knowing all the details, I think for the vast majority of cases, predictions won't change. However, there is a small chance that they do change, possibly leading to completely nonsensical output. Therefore, it's better to be safe and not return the prediction if correctness cannot be guaranteed.
I think the solution is already there. The only thing not yet implemented is that the widget currently doesn't show the returned warnings, even though the api-inference is returning them. So the fix is on the widget side. This is the corresponding issue on the widget side: https://github.com/huggingface/huggingface.js/issues/318
Regarding changing predictions: sometimes the old model won't even load on the new sklearn, and the other way around, and our tests include such a case. HistGradientBoostingClassifier does exactly that between versions 1.0 and 1.1.
> Regarding changing predictions: sometimes the old model won't even load on the new sklearn, and the other way around, and our tests include such a case. HistGradientBoostingClassifier does exactly that between versions 1.0 and 1.1.
@adrinjalali But if the predictions are returned, why would we still need it?
We return a non-200 status code, with the warnings attached, and the user can decide whether they want to use it or not. Some warnings can be ignored.
cc @mishig25 on this discussion on the widget side
@adrinjalali @mishig25 Can we make that response parse-able by the widget and print the warnings below it? It would help a lot of people who want to debug their widgets (I get a lot of messages, so I'd say it's a common thing).
Have we gotten lots of messages related to warnings? Usually messages with questions are more related to errors, which we do show in the widget most of the time.
I'm not sure what you mean by parse-able here @merveenoyan. They are JSON, so they can easily be parsed.
@osanseviero from the sklearn side we raise quite a few warnings, and it's quite useful for users to see them in the widgets.
I can't click the JSON output here; is it possible we're putting the warnings somewhere they're not supposed to be? That's what I mean by parse-able. @adrinjalali

Ah I see. If you call the API directly (using curl, for example), then you get the full output. `skops.hub_utils.get_model_output` also gives you the full output.
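Roughly like this from Python (a sketch, not verified against the current skops API; check the skops docs for the exact signature, and the repo id and feature frame below are placeholders):

```python
import pandas as pd
from skops import hub_utils

# hypothetical repo id and features; use whatever columns the model was trained on
data = pd.DataFrame({"feature_1": [1.0], "feature_2": [2.0]})
output = hub_utils.get_model_output("your-username/your-sklearn-model", data)
print(output)  # the full response, including any warnings the API returned
```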
@adrinjalali yes, I know that; I'm talking about the widget itself, for that very reason. I feel like (not sure) we're putting the warning in the wrong place, so that it doesn't show up there.
No, we're not; you might want to re-read this one: https://github.com/huggingface/api-inference-community/issues/96#issuecomment-1234269717 😁
@adrinjalali I can fix it after the text widget PR is done (which I'm a bit stuck on because of 503 errors).
cc @mishig25 @beurkinger on internal discussion https://huggingface.slack.com/archives/C0314PXQC3W/p1664296775532499
Currently, most of the tabular model widgets are broken. E.g. for https://huggingface.co/julien-c/wine-quality I see
{"error": "There were warnings while running the model.", "output": [5, 5, 7]}
And upon closer inspection of the Network tab, I see
error: "{\"error\": \"There were warnings while running the model.\", \"output\": [5, 5, 7]}"
warnings: [,…]
0: "UserWarning(Trying to unpickle estimator Pipeline from version 0.24.2 when using version 1.1.2. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:\nhttps://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations)"
1: "UserWarning(Trying to unpickle estimator SVC from version 0.24.2 when using version 1.1.2. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:\nhttps://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations)"
2: "UserWarning(Trying to unpickle estimator StandardScaler from version 0.24.2 when using version 1.1.2. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:\nhttps://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations)"
3: "UserWarning(X has feature names, but StandardScaler was fitted without feature names)"
This is not a great UX as it's confusing for users. Even if we show the text of the warning in the widget, this will not be super useful for users (not model uploaders). Having such a strict requirement and breaking the widget for anyone without the same pinned version will lead to the widget not working for most users in the long run, which is undesirable. I don't think we can expect all people to use the same sklearn version as the one pinned in the API.
Should we consider showing the predictions, even if there is a mismatch, and expose the warnings below the widget?
Related issue shared by @BenjaminBossan https://github.com/huggingface/huggingface.js/issues/318
> Should we consider showing the predictions, even if there is a mismatch, and expose the warnings below the widget?
Just to note, this part of the error message, `"output": [5, 5, 7]`, corresponds to the model predictions. But it's not very obvious.
> This is not a great UX as it's confusing for users. Even if we show the text of the warning in the widget, this will not be super useful for users (not model uploaders). Having such a strict requirement and breaking the widget for anyone without the same pinned version will lead to the widget not working for most users in the long run, which is undesirable. I don't think we can expect all people to use the same sklearn version as the one pinned in the API.
These are some good points. I wonder if we should treat warnings caused by sklearn version mismatch differently, as they are false positives most of the time. Not sure how best to implement this, as we would still want to display the information that something might be wrong, but these warnings are certainly in a different category from, say, warnings about division by zero.
Right now, the response from https://huggingface.co/julien-c/wine-quality is:
{error: '{"error": "There were warnings while running the model.", "output": [5, 5, 7]}', status: 'error'} which gets rendered as error status error is handled that way currently
The question is:
- Option 1: should I handle this response differently (i.e. treat it as a successful response & show the `output` & `warning` despite it having `status: 'error'`)?
- Option 2: should the API produce output in a different shape? (for example: `{output: [3, 4, 5], warning: 'xyz', status: 'success'}`)

Option 1 or Option 2?
I was wondering if we can change the logic here:
https://github.com/huggingface/api-inference-community/blob/778fe8446856a392646d11437cb634d979978be0/docker_images/sklearn/app/pipelines/tabular_classification.py#L70-L80
Right now, if there are any warnings, we just treat it as an error. Perhaps we can make an exception for warnings caused by version mismatch, return a successful response, and add an extra field with the information that there was a version mismatch.
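As a rough sketch (a hypothetical helper, not actual code; the match on the message text is taken from the warnings pasted above and may change between sklearn versions):

```python
def split_version_mismatch_warnings(warning_messages):
    """Separate sklearn version-mismatch warnings from all other warnings."""
    mismatch, other = [], []
    for msg in warning_messages:
        # sklearn's unpickling warning across versions reads
        # "Trying to unpickle estimator ... from version X when using version Y"
        if "Trying to unpickle estimator" in msg:
            mismatch.append(msg)
        else:
            other.append(msg)
    return mismatch, other
```

The pipeline could then return the predictions plus a separate field for the mismatch warnings when nothing else went wrong, and keep the current error behaviour for everything else.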
I think giving a successful response + adding some warning in an extra field, and then having the widget show the successful widget/table, but with a warning below, makes lots of sense to me.
> I think giving a successful response + adding some warning in an extra field, and then having the widget show the successful widget/table, but with a warning below, makes lots of sense to me.
I second that. Treating a response with `status: 'error'` as a success does not seem right.
> This is not a great UX as it's confusing for users. Even if we show the text of the warning in the widget, this will not be super useful for users (not model uploaders). Having such a strict requirement and breaking the widget for anyone without the same pinned version will lead to the widget not working for most users in the long run, which is undesirable. I don't think we can expect all people to use the same sklearn version as the one pinned in the API.
That's not true. All skops examples pin the sklearn version, and therefore the widget would work. The pinned version wouldn't change over time. There are no guarantees for the outputs to be correct when versions change, and this is not a big deal when users have specified the versions in the config.json file, which is easily done by skops tools.
> I think giving a successful response + adding some warning in an extra field, and then having the widget show the successful widget/table, but with a warning below, makes lots of sense to me.
This would lead to users getting wrong results and relying on them, which would be very bad. The output is not a "successful" output in this case.
> This would lead to users getting wrong results and relying on them, which would be very bad. The output is not a "successful" output in this case.
In that case, should we still show the response in the error box (but with the warnings added as well)?

Another question I had is: right now, the response from https://huggingface.co/julien-c/wine-quality is:
{error: '{"error": "There were warnings while running the model.", "output": [5, 5, 7]}', status: 'error'} which gets rendered as error status error is handled that way currently
Why are there no warnings in the response at the moment?
I personally don't have a strong opinion on either of the two options:
- the widget takes the values from `output` if that key exists in the response JSON and puts them in the widget as usual, and shows the errors and warnings if there are any, so the user is warned
- the widget always fails and shows all the warnings and errors as returned by the server, and doesn't take the `output` key from the response to be put in the table.
IIRC @Narsil was very much in favor of the second option.
> Why are there no warnings in the response at the moment?
That's something which is worth fixing.
@mishig25 Personally, I would go for option 1. If we don't, most of the tabular classification widgets users can try on the website will be broken, which is kind of ridiculous. There's no point in punishing people who just want to see how the widget works / what kind of results they can expect.
I don't know how server responses are shaped, but it would be nice to get the type of the error, so we can give a more useful message (or to return a better error message on the server side).
From my side, I am happy to implement the widget for either Option 1 or Option 2.
However, there need to be updates for both options:
- if we go with Option 1, we need to attach the warnings to the response (as asked here)
- if we go with Option 2, we need to change the response shape entirely (as suggested here)
I will submit a PR once one of the options is decided & the necessary api-inference changes are made 👍