scrapy icon indicating copy to clipboard operation
scrapy copied to clipboard

Fix flaky `test_download_with_proxy_https_timeout()`

Open wRAR opened this issue 3 months ago • 24 comments

tests/test_downloader_handler_twisted_http11.py::TestHttp11Proxy::test_download_with_proxy_https_timeout fails with a OpenSSL.SSL.Error (wrapped into ResponseNeverReceived) quite often, including locally. I've spent some time debugging and couldn't find the reason. It's certainly not related to the timeout value, as increasing the timeout doesn't make it fail every time. It's almost certainly related to the comment in tests.mockserver.http_resources.UriResource.render() but that doesn't help understanding what actually happens, and understanding that will likely require extensive study of both Twisted and Scrapy proxy code.

wRAR avatar Sep 28 '25 12:09 wRAR

Hi! I’d like to work on this issue for Hacktoberfest. Could you please assign it to me?” /assign

pratyush2514 avatar Sep 29 '25 15:09 pratyush2514

@pratyush2514 you don't need to be assigned to work on something.

wRAR avatar Sep 29 '25 16:09 wRAR

Hi @wRAR Sir, I noticed my PR was closed with a spam label. I apologize if I did something wrong — I wasn’t trying to spam. My intention was to genuinely fix the flaky test as discussed in issue #7060. Could you please let me know what made it fall under spam so I can better understand and avoid making the same mistake in future contributions?

Thanks for your time.

Aashish-Jha-11 avatar Oct 01 '25 09:10 Aashish-Jha-11

Could you please let me know what made it fall under spam

You asked a coding agent to generate something resembling a fix without investigating and then sent the result as a PR.

wRAR avatar Oct 01 '25 09:10 wRAR

Hi @wRAR Sir, I want to clarify that I did not use any coding agent; the PR was implemented entirely by me after investigating the flaky test. I’ll ensure future contributions clearly document my process to avoid any confusion. Thank you

Aashish-Jha-11 avatar Oct 01 '25 10:10 Aashish-Jha-11

after investigating the flaky test.

Unlikely.

Good luck!

wRAR avatar Oct 01 '25 10:10 wRAR

@wRAR Sir I want to clarify again that this PR was fully implemented by me after investigating the flaky test. Could you please review it and consider removing the spam label? I’m happy to provide details of my approach if needed.

Aashish-Jha-11 avatar Oct 01 '25 10:10 Aashish-Jha-11

I’m happy to provide details of my approach if needed.

Go ahead.

wRAR avatar Oct 01 '25 10:10 wRAR

@wRAR Thank you for giving me the opportunity to explain my approach. What I did was like: first, I ran the test locally several times and saw it randomly failing - sometimes with TimeoutError, sometimes with OpenSSL.SSL.Error wrapped in ResponseNeverReceived. That was weird. I looked at the test code and then found that comment you mentioned in UriResource.render about the "ugly hack for CONNECT request timeout test". That gave me a clue about what's happening. Basically, when the test tries to connect to the fake domain through the proxy, there's a race condition. Sometimes the timeout happens cleanly, sometimes it happens while trying to do the SSL handshake. Both are actually timeout scenarios, just happening at different moments.

My fix was like instead of trying to mess with the proxy/SSL code (which seemed way too complex), I just made the test accept both error types since they're both legitimate timeout failures. Added a simple check so it only validates the domain message for the original TimeoutError case.

Aashish-Jha-11 avatar Oct 01 '25 10:10 Aashish-Jha-11

Basically, when the test tries to connect to the fake domain through the proxy, there's a race condition. Sometimes the timeout happens cleanly, sometimes it happens while trying to do the SSL handshake. Both are actually timeout scenarios, just happening at different moments.

This is not what happens.

My fix was like instead of trying to mess with the proxy/SSL code (which seemed way too complex), I just made the test accept both error types since they're both legitimate timeout failures.

This is a wrong "fix".

That will be all.

wRAR avatar Oct 01 '25 10:10 wRAR

@wRAR Sir I understand now that my analysis was incorrect and the fix was wrong. I clearly misunderstood what was actually causing the flaky behavior. However, I'd like to respectfully ask if the spam label could be reconsidered. While my technical approach was flawed, I did genuinely attempt to investigate and solve a real issue that was reported. I wasn't trying to submit low-effort or malicious content.

I'll make sure to do much deeper investigation before submitting future PRs to avoid wasting maintainers' time.

Thanks for the feedback - it's a good learning experience even though I got it wrong.

Aashish-Jha-11 avatar Oct 01 '25 10:10 Aashish-Jha-11

Hi @wRAR, is this issue still open? I’d like to work on resolving it.

K-Preetham-Reddy avatar Oct 02 '25 04:10 K-Preetham-Reddy

@K-Preetham-Reddy yes

wRAR avatar Oct 02 '25 06:10 wRAR

Thank you for the confirmation. I will update on the solution soon.

K-Preetham-Reddy avatar Oct 02 '25 07:10 K-Preetham-Reddy

Greetings @wRAR, I came up with a solution. Would like to give a brief over it before further consideration

Root Cause

File: scrapy/tests/mockserver/http_resources.py

Problem: The UriResource.render() method's CONNECT handler was returning an empty response without implementing any tunnel functionality

This incomplete implementation meant:

No bidirectional data forwarding between client and destination SSL/TLS handshake packets were never forwarded to the destination server The connection would hang or fail unpredictably depending on timing

The intermittent nature occurred because the failure depended on race conditions between when Scrapy attempted the SSL handshake and when the incomplete proxy response was processed.

Solution Implemented a proper bidirectional HTTPS proxy tunnel using Twisted protocols:

Parses CONNECT destination from the request URI (host:port) Sends proper "200 Connection Established" response to client Establishes TCP connection to the actual destination server Forwards data bidirectionally using TunnelProtocol and TunnelFactory classes Handles connection failures gracefully by allowing Scrapy's timeout mechanism to work (instead of closing immediately) Buffers data when destination connection isn't ready yet

Changes Replaced empty CONNECT handler in tests/mockserver/http_resources.py with complete tunnel implementation Added TunnelProtocol class for bidirectional data forwarding Added TunnelFactory class for managing destination connections Proper cleanup on client/server disconnection

K-Preetham-Reddy avatar Oct 03 '25 05:10 K-Preetham-Reddy

@K-Preetham-Reddy if you plan to ask your coding agent to implement the solution it came up with that you copypasted here, I recommend against it, to not waste our time. Thank you.

wRAR avatar Oct 03 '25 05:10 wRAR

That is not what I did. I didn't copy paste from an AI agent. It took me an entire day to understand the code base and brief functionality of the codes. I did use AI to learn about the packages and technologies but didn't use it entirely to write the code.

K-Preetham-Reddy avatar Oct 03 '25 06:10 K-Preetham-Reddy

I did work a lot on this issue and after many changes I got a proper solution. Can you checkout my PR and provide feedback

K-Preetham-Reddy avatar Oct 03 '25 06:10 K-Preetham-Reddy

(two cases of https://x.com/IsaacKing314/status/1937009538844237829 so far)

wRAR avatar Oct 03 '25 06:10 wRAR

Is this fixed? Cause i can't really reproduce this in my local setup, Or am i doing something wrong.

Exgene avatar Oct 10 '25 11:10 Exgene

Is this fixed?

No.

i can't really reproduce this in my local setup

How many times did you run the test?

wRAR avatar Oct 10 '25 15:10 wRAR

Is this fixed?

No.

i can't really reproduce this in my local setup

How many times did you run the test?

A couple of times ig? Do we need to run it in some script to bulk test it or something?

Exgene avatar Oct 10 '25 16:10 Exgene

A couple of times ig?

🤷‍♂️

Do we need to run it in some script to bulk test it or something?

Whatever works for you. I just use something like while tox -e py -- -s tests/test_downloader_handler_twisted_http11.py::TestHttp11Proxy::test_download_with_proxy_https_timeout ; do :; done

wRAR avatar Oct 11 '25 02:10 wRAR

I can debug and fix this flaky test that intermittently fails with OpenSSL.SSL.Error when testing HTTPS proxy timeout behavior. The issue is likely related to OpenSSL/Twisted interaction timing. I'll investigate the test setup, identify the race condition or timing issue, and implement a more robust test that consistently validates the timeout behavior.

dohrisalim avatar Nov 09 '25 09:11 dohrisalim