fastapi-cache

Endpoints that return a Response should be handled better

Open mjpieters opened this issue 1 year ago • 2 comments

There are several issues that affect endpoints that can (optionally) return a Response object:

  • If they are annotated to return a Response, on a cache hit there is an exception:

    @app.get("/cache_response_obj")
    @cache(namespace="test", expire=5)
    async def cache_response_obj() -> JSONResponse:
        return JSONResponse({"a": 1})
    

    triggers a RuntimeError (no validator found for <class 'starlette.responses.JSONResponse'>, see arbitrary_types_allowed in Config), because Pydantic can't handle response objects.

  • Even when not annotated, only JSONResponse objects are handled, but badly. They are explicitly unwrapped by JsonCoder.encode(), and so on a cache hit their Content-Type: application/json header is lost, as is any status code other than 200 and any other headers added.

  • The PickleCoder.encode method special-cases Jinja2Templates.TemplateResponse objects, unwrapping those and so losing headers and the status code, too. The unwrapping was done because the class has a template and a context attribute, and these can contain unpicklable objects. This could be handled better by replacing the object with a regular response object.

There are two options here:

  • we could disable the cache if any Response object is returned from the decorated endpoint. Not a popular option given that json and template responses are going to be common, at the very least.

  • special-case responses, replacing them with a serialisable wrapper. This wrapper can store the headers, status code and the contained body so that on a cache hit, we can reconstruct it.

For the second option we should cover the other response types too, which basically means storing their status code, headers and the (encoded) body. However, StreamingResponse and FileResponse can't be cached in this project, full stop, because they represent dynamic content. These can be distinguished by their lack of a body attribute. We also can't support caching the background attribute; cached responses won't trigger new background tasks. The attribute should be retained on fresh responses, however.
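To make the "lack of a body attribute" check concrete, here is a minimal sketch. The two classes are hypothetical stand-ins rather than Starlette's actual classes, and the is_cacheable helper is an assumption of mine, not something that exists in the project:

```python
class BufferedResponse:
    """Hypothetical stand-in for a response that holds its whole body in memory."""

    def __init__(self, body: bytes, status_code: int = 200) -> None:
        self.body = body
        self.status_code = status_code


class StreamedResponse:
    """Hypothetical stand-in for a streaming or file response: no body attribute."""

    def __init__(self, chunks, status_code: int = 200) -> None:
        self.body_iterator = iter(chunks)
        self.status_code = status_code


def is_cacheable(resp: object) -> bool:
    # Dynamic responses never materialise a body, so the attribute is missing.
    return isinstance(getattr(resp, "body", None), bytes)


print(is_cacheable(BufferedResponse(b'{"a": 1}')))
print(is_cacheable(StreamedResponse([b"chunk-1", b"chunk-2"])))
```

A response that fails this check would simply bypass the cache and be returned as-is.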

My current thinking is to process responses separately, replacing them with a custom CacheableResponseWrapper dataclass:

from dataclasses import dataclass
from typing import List, Tuple

from starlette.responses import Response
from typing_extensions import Self


@dataclass
class CacheableResponseWrapper:
    # str to facilitate clean JSON encoding support, decoded with UTF-8 + surrogate escapes
    str_body: str
    status_code: int
    # str to facilitate clean JSON encoding support, decoded as Latin-1.
    raw_str_headers: List[Tuple[str, str]]

    @classmethod
    def from_response(cls, resp: Response) -> Self:
        try:
            body = resp.body
        except AttributeError:
            # dynamic responses (StreamingResponse, FileResponse) have no body attribute
            raise TypeError(f"Unsupported dynamic Response type: {type(resp)}") from None
        headers = [(name.decode('latin1'), value.decode('latin1')) for name, value in resp.raw_headers]
        return cls(body.decode('utf8', 'surrogateescape'), resp.status_code, headers)

    @property
    def response(self) -> Response:
        result = Response(self.str_body.encode('utf8', 'surrogateescape'), self.status_code)
        # replace the headers generated by the constructor with the stored ones
        result.raw_headers = [(name.encode('latin1'), value.encode('latin1')) for name, value in self.raw_str_headers]
        return result

  • using a dataclass makes this both JSON-encodable and picklable.
  • decoding the header values as Latin-1 means they can be stored as strings, which is important for the JsonCoder path, which would otherwise treat bytes values as UTF-8. Latin-1 is a codec that always succeeds and is reversible, and it is already the codec used by the Response implementation.
  • I've picked decoding the body as UTF-8 with the surrogateescape error handler because that would be the more efficient choice for the majority of response values, which I would expect to be text (JSON, HTML templates, etc). The surrogateescape handler allows you to 'smuggle' any byte sequence that is not valid UTF-8 into the resulting string as surrogate codepoints. These are codepoints not normally found in UTF-8 text (they are reserved for UTF-16 encodings), and they can be re-encoded to their original bytes by using the same error handler when encoding.
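
To illustrate the two codec choices, here is a small standard-library-only sketch of the JSON path. The CachedPayload dataclass is a hypothetical stand-in for the wrapper (same fields, no Starlette dependency), and it goes through the plain json module rather than the project's JsonCoder, so treat it as a demonstration of the round trip and not of the final implementation:

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Tuple


@dataclass
class CachedPayload:
    """Hypothetical stand-in for the wrapper: only str and int fields."""

    str_body: str
    status_code: int
    raw_str_headers: List[Tuple[str, str]]


# A body mixing valid UTF-8 text with bytes that are not valid UTF-8.
body = "caf\u00e9 ".encode("utf-8") + b"\xff\xfe\x00\x80"
raw_headers = [
    (b"content-type", b"application/octet-stream"),
    (b"x-note", b"\xe9t\xe9"),  # Latin-1 encoded header value
]

payload = CachedPayload(
    # Invalid bytes become lone surrogates in the U+DC80..U+DCFF range.
    str_body=body.decode("utf-8", "surrogateescape"),
    status_code=201,
    # Latin-1 maps every byte to exactly one codepoint, so this never fails.
    raw_str_headers=[(k.decode("latin-1"), v.decode("latin-1")) for k, v in raw_headers],
)

# JSON path: json.dumps escapes the surrogates and json.loads restores them.
restored = json.loads(json.dumps(asdict(payload)))

# Encoding with the same handlers yields the original bytes again.
restored_body = restored["str_body"].encode("utf-8", "surrogateescape")
restored_headers = [
    (k.encode("latin-1"), v.encode("latin-1")) for k, v in restored["raw_str_headers"]
]

print(restored_body == body, restored_headers == raw_headers)
```

Both the body and the headers should come back byte-for-byte identical, including the bytes that are not valid UTF-8.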

mjpieters · May 12 '23 15:05