
Showing warnings in the widget response when the inference doesn't work

merveenoyan opened this issue 3 years ago · 39 comments

Hello,

In the scikit-learn pipelines, any warning gets appended to the response, even if the inference is successful. I suggest we check whether the prediction and the response look good, and only return warnings if they don't. Otherwise the warnings get prepended on top of the response and break the widget. (What I observed was a version mismatch, which I know doesn't happen in production, but I don't think a version mismatch, or any warning at warning level rather than error level, should concern the user if the predictions are returned fine.) (This is something I observed for the text classification pipeline, because I repurposed code from the tabular pipelines; let me know if this isn't the case.) Feel free to ignore this issue if it doesn't make sense. I think the code below should be refactored:

    # `record` comes from an enclosing `warnings.catch_warnings(record=True)`
    # block around the prediction call
    for warning in record:
        _warnings.append(f"{warning.category.__name__}({warning.message})")

    for warning in self._load_warnings:
        _warnings.append(f"{warning.category.__name__}({warning.message})")

    if _warnings:
        for warning in _warnings:
            logger.warning(warning)

        if not exception:
            # we raise an error if there are any warnings, so that routes.py
            # can catch and return a non-200 status code.
            ### THIS IS THE PART I COMPLAIN ABOUT :')
            error = {
                "error": "There were warnings while running the model.",
                "output": res,
            }
            raise ValueError(json.dumps(error))
        else:
            # if there was an exception, we raise it so that routes.py can
            # catch and return a non-200 status code.
            raise exception

    return res

WDYT @adrinjalali @BenjaminBossan

merveenoyan · Aug 31 '22

> What I observed was a version mismatch, which I know doesn't happen in production, but I don't think a version mismatch, or any warning at warning level rather than error level, should concern the user if the predictions are returned fine.

I think this was exactly @adrinjalali's intent, because there is no guarantee that predictions are correct if versions don't match.

BenjaminBossan · Sep 01 '22

Do predictions change across sklearn versions? 😅 Do the implementations change? A better solution would be to still return the predictions and post the warnings below the widget (they could be put somewhere else in the response that the widget understands and rendered below the widget, or something).

merveenoyan · Sep 01 '22

> Do predictions change across sklearn versions? 😅 Do the implementations change?

Without knowing all the details, I think for the vast majority of cases, predictions won't change. However, there is a small chance that they do change, possibly leading to completely nonsensical output. Therefore, it's better to be safe and not return the prediction if correctness cannot be guaranteed.

BenjaminBossan · Sep 01 '22

I think the solution is already there. The only thing not implemented is that the widget currently doesn't show the returned warnings, even though api-inference returns them. So the fix is on the widget side. This is the corresponding issue on the widget side: https://github.com/huggingface/huggingface.js/issues/318

Regarding changing predictions, sometimes the old model wouldn't even load on the new sklearn, and the other way around; our tests also include such a case. HistGradientBoostingClassifier does exactly that between versions 1.0 and 1.1.
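For illustration, a minimal sketch of how such a mismatch surfaces: pickle an estimator under one sklearn version and load it under another (the two halves below would have to run under different versions; under a single version no warning is raised):

    import pickle
    import warnings

    from sklearn.datasets import load_iris
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)

    # half 1: train and pickle under, say, sklearn 0.24.2
    with open("model.pkl", "wb") as f:
        pickle.dump(SVC().fit(X, y), f)

    # half 2: load under, say, sklearn 1.1.2 -- unpickling then raises the
    # "Trying to unpickle estimator SVC from version 0.24.2 ..." UserWarning
    with warnings.catch_warnings(record=True) as record:
        warnings.simplefilter("always")
        with open("model.pkl", "rb") as f:
            model = pickle.load(f)

    for w in record:
        print(f"{w.category.__name__}({w.message})")

This is the same catch_warnings pattern that feeds `record` in the snippet quoted at the top of this thread.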

adrinjalali · Sep 01 '22

> Regarding changing predictions, sometimes the old model wouldn't even load on the new sklearn, and the other way around; our tests also include such a case. HistGradientBoostingClassifier does exactly that between versions 1.0 and 1.1.

@adrinjalali But if the predictions are returned, why would we still need it?

merveenoyan · Sep 01 '22

We return a non-200 status code with the warnings attached, and the user can decide whether they want to use the output or not. Some warnings can be ignored.

adrinjalali · Sep 01 '22

cc @mishig25 on this discussion on the widget side

osanseviero · Sep 01 '22

@adrinjalali @mishig25 Can we make that response parseable by the widget and print the warnings below it? It would help a lot of people trying to debug their widgets (I get a lot of messages, so it's a common thing, I'd say).

merveenoyan · Sep 01 '22

Have we gotten lots of messages related to warnings? Usually messages with questions are more related to errors, which we do show in the widget most of the time.

osanseviero · Sep 02 '22

I'm not sure what you mean by parseable here, @merveenoyan. They are JSON, so they can easily be parsed.

@osanseviero from the sklearn side we raise quite a few warnings, and it's quite useful for users to see them in the widgets.

adrinjalali · Sep 05 '22

I can't click "JSON Output" here. Is it possible we put the warnings somewhere they're not supposed to be? That's what I mean by parseable. @adrinjalali

[screenshot: Screen Shot 2022-09-05, the widget's JSON Output view]

merveenoyan · Sep 05 '22

Ah I see. If you call the API directly (using curl, for example), you get the full output. skops.hub_utils.get_model_output also gives you the full output.
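For reference, a rough sketch of such a direct call with requests; the tabular payload shape and feature names here are illustrative guesses, not the exact schema:

    import requests

    API_URL = "https://api-inference.huggingface.co/models/julien-c/wine-quality"
    headers = {"Authorization": "Bearer hf_..."}  # your HF API token

    # hypothetical tabular payload; the column names depend on the model
    payload = {"inputs": {"data": {"fixed_acidity": [7.4], "alcohol": [9.4]}}}

    resp = requests.post(API_URL, headers=headers, json=payload)
    print(resp.status_code)  # non-200 when the server raised on warnings
    print(resp.text)         # full body, warnings included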

adrinjalali · Sep 05 '22

@adrinjalali Yes, I know that; I'm talking about the widget itself, for exactly this reason. I feel like (not sure) we're putting the warnings in the wrong place, so they don't show up there.

merveenoyan · Sep 06 '22

No, we're not; you might want to re-read this one: https://github.com/huggingface/api-inference-community/issues/96#issuecomment-1234269717 😁

adrinjalali · Sep 07 '22

@adrinjalali I can fix it after the text widget PR is done (I'm a bit stuck on it because of 503 errors).

merveenoyan · Sep 08 '22

cc @mishig25 @beurkinger on internal discussion https://huggingface.slack.com/archives/C0314PXQC3W/p1664296775532499

Currently, most of the tabular model widgets are broken. E.g. for https://huggingface.co/julien-c/wine-quality I see

{"error": "There were warnings while running the model.", "output": [5, 5, 7]}

And on a closer look at the Network tab, I see

error: "{\"error\": \"There were warnings while running the model.\", \"output\": [5, 5, 7]}"
warnings: [,…]
0: "UserWarning(Trying to unpickle estimator Pipeline from version 0.24.2 when using version 1.1.2. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:\nhttps://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations)"
1: "UserWarning(Trying to unpickle estimator SVC from version 0.24.2 when using version 1.1.2. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:\nhttps://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations)"
2: "UserWarning(Trying to unpickle estimator StandardScaler from version 0.24.2 when using version 1.1.2. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:\nhttps://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations)"
3: "UserWarning(X has feature names, but StandardScaler was fitted without feature names)"

This is not a great UX, as it's confusing for users. Even if we show the text of the warning in the widget, it will not be super useful for users (as opposed to model uploaders). Having such a strict requirement and breaking the widget for anyone without the same pinned version will lead to the widget not working for most users in the long run, which is undesirable. I don't think we can expect everyone to use the same sklearn version as the one pinned in the API.

Should we consider showing the predictions, even if there is a mismatch, and expose the warnings below the widget?

osanseviero · Sep 28 '22

Related issue shared by @BenjaminBossan https://github.com/huggingface/huggingface.js/issues/318

osanseviero · Sep 28 '22

> Should we consider showing the predictions, even if there is a mismatch, and expose the warnings below the widget?

Just to note, this part of the error message, "output": [5, 5, 7], corresponds to the model predictions. But it's not very obvious.

BenjaminBossan · Sep 28 '22

> This is not a great UX, as it's confusing for users. Even if we show the text of the warning in the widget, it will not be super useful for users (as opposed to model uploaders). Having such a strict requirement and breaking the widget for anyone without the same pinned version will lead to the widget not working for most users in the long run, which is undesirable. I don't think we can expect everyone to use the same sklearn version as the one pinned in the API.

These are some good points. I wonder if we should treat warnings caused by sklearn version mismatch differently, as they are false positives most of the time. Not sure how best to implement this, as we would still want to display the information that something might be wrong, but these warnings are certainly in a different category from, say, warnings about division by zero.

BenjaminBossan · Sep 28 '22

Right now, the response from https://huggingface.co/julien-c/wine-quality is:

{error: '{"error": "There were warnings while running the model.", "output": [5, 5, 7]}', status: 'error'} which gets rendered as error status error is handled that way currently

image

The question is: Option 1: should I handle this resposne differently (i.e. treat this response as successfull reponse & show the output & ``warningdespite it havingstatus:error) Option 2: should the API produce output in different shape ? (for example: {output: [3,4,5], warning: 'xyz', status: 'success'}`)

Option1 or Option2 ?

mishig25 · Sep 28 '22

I was wondering if we can change the logic here:

https://github.com/huggingface/api-inference-community/blob/778fe8446856a392646d11437cb634d979978be0/docker_images/sklearn/app/pipelines/tabular_classification.py#L70-L80

Right now, if there are any warnings, we just treat it as an error. Perhaps we can make an exception for warnings caused by version mismatch, return a successful response, and add an extra field with the information that there was a version mismatch.
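Not the actual fix, just a minimal sketch of that idea against the snippet quoted at the top of the thread; the regex and the extra warnings field are assumptions:

    import json
    import re

    # version-mismatch warnings all start with this phrase (see the
    # UserWarnings in the Network tab dump above)
    _VERSION_MISMATCH = re.compile(r"Trying to unpickle estimator .+ from version")

    def build_response(res, warnings_, exception):
        """Let version-mismatch warnings through as a success; fail on the rest."""
        mismatch = [w for w in warnings_ if _VERSION_MISMATCH.search(w)]
        other = [w for w in warnings_ if not _VERSION_MISMATCH.search(w)]
        if exception:
            raise exception
        if other:
            # any other warning still fails the request, as the current code does
            error = {"error": "There were warnings while running the model.", "output": res}
            raise ValueError(json.dumps(error))
        # successful response, with the mismatch info in an extra field
        return {"output": res, "warnings": mismatch}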

BenjaminBossan · Sep 28 '22

I think giving a successful response + adding some warning in an extra field, and then having the widget show the successful widget/table, but with a warning below, makes lots of sense to me.

osanseviero · Sep 28 '22

> I think giving a successful response + adding some warning in an extra field, and then having the widget show the successful widget/table, but with a warning below, makes lots of sense to me.

I second that. Treating a response with status: 'error' as a success does not seem right.

mishig25 · Sep 28 '22

> This is not a great UX, as it's confusing for users. Even if we show the text of the warning in the widget, it will not be super useful for users (as opposed to model uploaders). Having such a strict requirement and breaking the widget for anyone without the same pinned version will lead to the widget not working for most users in the long run, which is undesirable. I don't think we can expect everyone to use the same sklearn version as the one pinned in the API.

That's not true. All skops examples pin the sklearn version, and therefore the widget would work. The pinned version wouldn't change over time. There are no guarantees for the outputs to be correct when versions change, and this is not a big deal when users have specified the versions in the config.json file, which is easily done by skops tools.
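For context, that pinning happens when the repo is initialized with skops, roughly like this (a sketch; check the skops docs for hub_utils.init's exact signature):

    from pathlib import Path

    import pandas as pd
    from skops import hub_utils

    X_sample = pd.DataFrame({"alcohol": [9.4], "pH": [3.5]})  # illustrative columns

    hub_utils.init(
        model=Path("model.pkl"),
        requirements=["scikit-learn==1.0.2"],  # this pin ends up in config.json
        dst=Path("my-model-repo"),
        task="tabular-classification",
        data=X_sample,
    )
    # the inference API builds its environment from config.json, so the widget
    # runs the same sklearn version the model was pickled with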

> I think giving a successful response + adding some warning in an extra field, and then having the widget show the successful widget/table, but with a warning below, makes lots of sense to me.

This would lead to users getting wrong results and relying on them, which would be very bad. The output is not a "successful" output in this case.

adrinjalali · Sep 28 '22

> This would lead to users getting wrong results and relying on them, which would be very bad. The output is not a "successful" output in this case.

In that case, should we still show the response in the error box (but with the added warnings as well)?

[screenshot: widget error box]

mishig25 · Sep 28 '22

Another question I had is: right now, the response from https://huggingface.co/julien-c/wine-quality is:

{error: '{"error": "There were warnings while running the model.", "output": [5, 5, 7]}', status: 'error'} which gets rendered as error status error is handled that way currently

Why there are no warnings in the response at the moment?

mishig25 · Sep 28 '22

I personally don't have a strong opinion between the two options:

  • the widget takes the values from output if that key exists in the response JSON, puts them in the widget as usual, and shows the errors and warnings if there are any, so the user is warned
  • the widget always fails, shows all the warnings and errors as returned by the server, and doesn't take the output key from the response to put in the table.

IIRC @Narsil was very much in favor of the second option.

adrinjalali · Sep 28 '22

> Why are there no warnings in the response at the moment?

That's something worth fixing.

adrinjalali · Sep 28 '22

@mishig25 Personally, I would go for Option 1. If we don't, most of the tabular classification widgets users can try on the website will be broken, which is kind of ridiculous. There's no point in punishing people who just want to see how the widget works / what kind of results they can expect.

I don't know how server responses are shaped, but it would be nice to get the type of the error, so we can give a more useful message (or to return a better error message on the server side).

beurkinger · Sep 28 '22

From my side, I am happy to implement the widget either way, for Option 1 or Option 2.

However, there need to be updates in either case:

  • if we go with Option 1, we need to attach the warnings to the response (as asked here)
  • if we go with Option 2, we need to change the response shape entirely (as suggested here)

I will submit a PR once one of the options is decided and the necessary api-inference changes are made 👍

mishig25 · Sep 28 '22