moto Feature: Add FailureLocation support to SageMaker Runtime

Not sure if that's a feature or a bugfix actually.

SageMaker invoke_endpoint_async is supposed to return docs:

InferenceId
OutputLocation
FailureLocation <-- this one is currently missing in Moto implementation

This makes code hard to test when we assume this field to be returned by AWS endpoint, and making it optional because cannot be properly mocked in tests does not look like a good direction.

Doing so, I added a specific endpoint as well to Moto backend. Idea is that we were not really able to customize the (async) inference output that is stored in S3. Previously, the output schema was forced to something like:

{
    "Body": body,
    "ContentType": _type,
    "InvokedProductionVariant": variant,
    "CustomAttributes": attrs,
}

That's the expected format for the sync inference response, but should not be forced on async endpoint IMHO.

For example, if I expect my endpoint to produce and output like:

{
  "bbox": "..."
  "text": "..."
}

Well we have to hack the test flow to make it work

Jul 30 '24 12:07 gael-ft

Hi @bblommers, thanks for reviewing my proposal !

How would you feel about keeping the existing endpoint, but adding another field to the supplied results to indicate whether a result is a failure or not?

I first thought I could do something similar. But then the problem is the "result" payload:

{
  "Body": "first body",
  "ContentType": "text/xml",
  "InvokedProductionVariant": "prod",
  "CustomAttributes": "my_attr",
},

Follows SageMaker real-time inference response 👍 But with SageMaker async inference, the "result" payload that will be stored in S3 has no specific schema. So users basically push what they want/need.

As described above, I can consider my async inference output to be:

{
  "bbox": "..."
  "text": "..."
}

And I am not expecting (in real world or moto) any Body, ContentType, ... fields.

To be honest, I considered as well doing some kind of condition to say:

if async inference, then only upload Body field, and discard everything else

But then I just realized that I couldn't imagine any current existing test using this endpoint to validate async inference. If they do, they must anyway (here is the PR to avoid it) override the S3 file right after so that it fits what they expect as result from their inference.

Given that statement above, I thought it was cleaner to have a second endpoint, to avoid having to set fields (because we expect them to be present in current Moto code) that will finally be completly discarded. (Talking about ContentType, CustomAttributes, ...)

Aug 24 '24 13:08 gael-ft

Ah, right, I understand - thank you for the explanation @gael-ft.

But then I just realized that I couldn't imagine any current existing test using this endpoint to validate async inference.

Me neither, based on your explanation - but if there's one thing I've learned as a maintainer, is that people will always find a way to ~ab~use Moto in a way that I've never thought about. :slightly_smiling_face:

So, second proposal: we add the async endpoint, but leave the current implementation as a fallback. People can/should use the proper async endpoint as it's supposed to be used, but people who somehow got the sync-endpoint to work can also still use it.

As a very rough draft:

if self.async_results_queue:
    # new implementation
    ...
elif self.results_queue:
    # existing implementation
    ...

Later on we can always phase out the old/existing implementation.

Does that work for you?

Aug 25 '24 10:08 bblommers

Sounds good to me ! I'll try to have a look at it in the coming days.

Aug 25 '24 11:08 gael-ft

Hello @bblommers,

Added fallback to "sync" results queue, as discussed, and tested accordingly.

Sep 06 '24 12:09 gael-ft

Codecov Report

Attention: Patch coverage is 93.10345% with 2 lines in your changes missing coverage. Please review.

Project coverage is 94.45%. Comparing base (9da032b) to head (4efcbb9). Report is 4 commits behind head on master.

Files with missing lines	Patch %	Lines
moto/sagemakerruntime/models.py	84.61%	2 Missing :warning:

Additional details and impacted files

@@           Coverage Diff           @@
##           master    #7905   +/-   ##
=======================================
  Coverage   94.45%   94.45%           
=======================================
  Files        1141     1141           
  Lines       98184    98203   +19     
=======================================
+ Hits        92737    92762   +25     
+ Misses       5447     5441    -6

Flag	Coverage Δ
servertests	`28.95% <6.89%> (-0.01%)`	:arrow_down:
unittests	`94.43% <93.10%> (+<0.01%)`	:arrow_up:

Flags with carried forward coverage won't be shown. Click here to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

Sep 07 '24 10:09 codecov[bot]

This is now part of moto >= 5.0.14.dev66

Sep 07 '24 11:09 github-actions[bot]