Connection closed before full header was received

Open yoniker opened this issue 2 years ago • 11 comments

Describe the bug When migrating my app from Flask to Sanic, I encountered an error when redirecting to a presigned S3 link ("Connection closed before full header was received"). The error does not occur with Flask. The client is written in Flutter.

Code snippet

The flask code and sanic code for that endpoint are quite similar:

Flask:

@app.route('/dummy/<path:aws_key>')
def redirect_to_aws_dummy(aws_key):
   url = boto3.client('s3').generate_presigned_url(
      ClientMethod='get_object',
      Params={'Bucket': 'com.bucketter.dummy', 'Key': aws_key},
      ExpiresIn=60)
   return redirect(url, code=302)

Sanic:

executor_aws_presigned = concurrent.futures.ThreadPoolExecutor(max_workers=32)
def generate_dummy_users_presigned_url(aws_key):
    url = boto3.client('s3').generate_presigned_url(
        ClientMethod='get_object',
        Params={'Bucket': 'com.bucketter.dummy', 'Key': aws_key},
        ExpiresIn=60)
    return url

@app.route('/dummy/<aws_key:path>')
async def redirect_to_aws_dummy(request,aws_key):
    url = await request.app.loop.run_in_executor(executor_aws_presigned, generate_dummy_users_presigned_url, aws_key)
    return sanic.response.redirect(url, status=302)

Expected behavior The client shows dozens of images concurrently without errors (same as with Flask)

Environment (please complete the following information):

  • OS: On the client, this happens with Android API 25-30 on emulators as well as on physical devices. On iOS, I've verified that on an iPhone 12 with iOS 15.3.1.
  • Sanic version: Sanic v21.12.1

yoniker avatar Mar 13 '22 10:03 yoniker

Why this: run_in_executor? Mixing threads and asyncio can be painful and is generally not recommended. Does it work fine while running it directly?

ahopkins avatar Mar 13 '22 11:03 ahopkins

Here is a simple example to mock what you are doing. It seems to work fine for me:

import concurrent.futures
import time

from sanic import Request, Sanic, json

executor = concurrent.futures.ThreadPoolExecutor(max_workers=32)
app = Sanic(__name__)


def slow(request: Request, n=3):
    for i in range(n):
        print(f"[{request.id}] {i} of {n}")
        time.sleep(1)
    return {"foo": "bar"}


@app.get("/")
async def handler(request: Request):
    data = await request.app.loop.run_in_executor(executor, slow, request)
    return json(data)

ahopkins avatar Mar 13 '22 11:03 ahopkins

Why this: run_in_executor? Mixing threads and asyncio can be painful and is generally not recommended. Does it work fine while running it directly?

The method generate_dummy_users_presigned_url takes some time (0.1 seconds) and I didn't want the entire sanic worker to be occupied by it (I thought that's the idea when it comes to async), and therefore I used this executor to transfer this into a Future. Is this approach wrong? If so what are the alternatives?
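
For illustration, here is a minimal sketch of the trade-off described above, using only the standard library. `blocking_call` is a hypothetical stand-in for a call like generate_presigned_url that takes about 0.1 seconds; `heartbeat` and `run` are illustrative names too, not part of Sanic or boto3. The sketch counts how often the event loop gets to run while the call is in flight, with and without run_in_executor.

```python
import asyncio
import concurrent.futures
import time

executor = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def blocking_call(key: str) -> str:
    # Hypothetical stand-in for a ~0.1 s blocking call (not a real boto3 call).
    time.sleep(0.1)
    return f"https://example.invalid/{key}"


async def heartbeat(stop: asyncio.Event) -> int:
    # Counts how many times the event loop schedules this task before stop is set.
    ticks = 0
    while not stop.is_set():
        await asyncio.sleep(0.01)
        ticks += 1
    return ticks


async def run(offload: bool) -> int:
    loop = asyncio.get_running_loop()
    stop = asyncio.Event()
    beat = asyncio.create_task(heartbeat(stop))
    await asyncio.sleep(0)  # give the heartbeat task a chance to start
    if offload:
        # The call runs in a worker thread; the loop keeps serving other tasks.
        await loop.run_in_executor(executor, blocking_call, "a.jpg")
    else:
        # The call runs on the loop's own thread; nothing else can be scheduled.
        blocking_call("a.jpg")
    stop.set()
    return await beat
```

Under these assumptions, `asyncio.run(run(True))` should report noticeably more ticks than `asyncio.run(run(False))`, because the loop stays free while the executor thread sleeps.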

This error doesn't happen "all the time", e.g. once every few dozen images. I will post YouTube screen recordings of how the client works with Flask vs. Sanic so you can see the difference.

yoniker avatar Mar 13 '22 11:03 yoniker

Is this approach wrong? If so what are the alternatives?

Not necessarily wrong, it just becomes a bit more tricky. Have you considered using this instead: https://github.com/aio-libs/aiobotocore?

This error doesn't happen "all the time", e.g. once every few dozen images.

Sounds like some sort of a race condition. Have you tried running a simple script to test out boto3 in a thread pool using asyncio?
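
A simple script of that kind might look like the sketch below. It is only an outline under stated assumptions: `fake_presign` is a hypothetical stand-in so the script runs without AWS credentials, and `stress` is an illustrative name. To exercise boto3 itself, pass a function that calls generate_presigned_url as `fn`.

```python
import asyncio
import concurrent.futures
import time

executor = concurrent.futures.ThreadPoolExecutor(max_workers=32)


def fake_presign(aws_key: str) -> str:
    # Hypothetical stand-in for the boto3 call; swap in the real function to test it.
    time.sleep(0.01)
    return f"https://example.invalid/{aws_key}?signed=1"


async def stress(keys, fn=fake_presign):
    loop = asyncio.get_running_loop()
    # Submit every key to the thread pool at once and wait for all of them.
    results = await asyncio.gather(
        *(loop.run_in_executor(executor, fn, key) for key in keys),
        return_exceptions=True,
    )
    # A failure is either a raised exception or a URL that does not contain
    # the key it was requested for (which would hint at cross-talk between threads).
    failures = [
        (key, result)
        for key, result in zip(keys, results)
        if isinstance(result, Exception) or key not in result
    ]
    return results, failures
```

Calling `asyncio.run(stress([f"img/{i}.jpg" for i in range(200)]))` and inspecting `failures` would show whether the thread pool itself misbehaves, independently of Sanic and of the client.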

ahopkins avatar Mar 13 '22 11:03 ahopkins

Is this approach wrong? If so what are the alternatives?

Not necessarily wrong, it just becomes a bit more tricky. Have you considered using this instead: https://github.com/aio-libs/aiobotocore?

This error doesn't happen "all the time", e.g. once every few dozen images.

Sounds like some sort of a race condition. Have you tried running a simple script to test out boto3 in a thread pool using asyncio?

I feel like a synthetic mock example might not replicate the problem I'm facing correctly.

The client code with a Flask backend: https://youtu.be/cDbpYCXVG9E
The same client with Sanic yields those weird errors: https://youtu.be/gelFV_5YaRw

Assuming it is a race condition, how would I verify it? Why would this code cause a race condition? I've tried creating a separate AWS session within each thread/handler, e.g. simply:


def generate_dummy_users_presigned_url(aws_key):
    session = boto3.session.Session()
    s3_client = session.client('s3')
    return s3_client.generate_presigned_url(
        ClientMethod='get_object',
        Params={'Bucket': app.config.dummy_users_bucket_name, 'Key': aws_key},
        ExpiresIn=60)

but the same issue occurs with this code. I'm not sure how a race condition is possible if each thread has its own AWS session.

I could try the aio-libs solution and let you know - just want to minimize dependencies on repos not officially backed by the provider (AWS) and therefore that's not my preferred solution.

yoniker avatar Mar 13 '22 12:03 yoniker

I think the race condition is with the connections coming out of order. Let me see if I can replicate it without the AWS calls.

ahopkins avatar Mar 13 '22 12:03 ahopkins

I think the race condition is with the connections coming out of order. Let me see if I can replicate it without the AWS calls.

Okay. Since it may be tricky to replicate (unfortunately, I can't provide a meaningful minimal example without my AWS credentials), I don't mind if you take control of my development server (via TeamViewer, for example). Or, if I can verify the issue from my end or make things easier for you, let me know.

yoniker avatar Mar 13 '22 12:03 yoniker

It would be helpful to know:

  • What the Sanic logs are when this happens
  • What the actual HTTP response is
  • Whether the client is using a single or multiple HTTP connections
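
One way to capture the actual HTTP response for a single request is a small probe that does not follow the redirect. The sketch below uses Python's standard http.client, which never follows redirects on its own. `fetch_without_redirect` is a hypothetical helper name, and the host, port, and path are whatever the client would normally request.

```python
import http.client


def fetch_without_redirect(host, port, path, use_tls=False, timeout=5.0):
    # Pick a plain or TLS connection; each call opens its own connection.
    conn_cls = http.client.HTTPSConnection if use_tls else http.client.HTTPConnection
    conn = conn_cls(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        body = resp.read()
        return {
            "status": resp.status,
            "reason": resp.reason,
            "http_version": resp.version,  # 10 for HTTP/1.0, 11 for HTTP/1.1
            "headers": dict(resp.getheaders()),
            "body_length": len(body),
        }
    except (http.client.HTTPException, ConnectionError) as exc:
        # A server that closes the socket before sending a full header ends up here.
        return {"error": repr(exc)}
    finally:
        conn.close()
```

For a working redirect endpoint, the returned dictionary should contain a 3xx status and a Location header. An "error" entry would mean the connection was dropped before a complete response arrived.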

ahopkins avatar Mar 13 '22 13:03 ahopkins

@yoniker I cannot do it now, but maybe ping me on the Sanic Discord server and we can walk through the issue there in real time?

ahopkins avatar Mar 13 '22 13:03 ahopkins

Running Sanic in debug mode, the output in the window from which I run the app is the same (except for some "[2022-03-13 16:08:24 +0200] [499349] [DEBUG] KeepAlive Timeout. Closing connection." lines, which seem to be unrelated, e.g. they also appear without that error [although still noteworthy]). Interestingly, there is no output at all for the images for which the error occurs (I've tried

logger.info(msg = f'called with {aws_key}')

but that line was never logged; with no error message either, this is impossible for me to debug without diving deep into Sanic).

I'm not sure what the actual HTTP response is. The client throws the exception here: https://github.com/flutter/flutter/blob/2f73173c36f806c7f30e705a90722082f85a2c8b/packages/flutter/lib/src/painting/_network_image_io.dart#L120 and the exception is HttpException: Connection closed before full header was received, uri = https://dordating.com:8087/dummy/159428700/159428700.6.jpg

The client is using multiple HTTP connections (each URL fetches a different image RESTfully).

yoniker avatar Mar 13 '22 15:03 yoniker

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If this is incorrect, please respond with an update. Thank you for your contributions.

stale[bot] avatar Jun 12 '22 21:06 stale[bot]