Fix flaky `test_download_with_proxy_https_timeout()`
tests/test_downloader_handler_twisted_http11.py::TestHttp11Proxy::test_download_with_proxy_https_timeout fails with a OpenSSL.SSL.Error (wrapped into ResponseNeverReceived) quite often, including locally. I've spent some time debugging and couldn't find the reason. It's certainly not related to the timeout value, as increasing the timeout doesn't make it fail every time. It's almost certainly related to the comment in tests.mockserver.http_resources.UriResource.render() but that doesn't help understanding what actually happens, and understanding that will likely require extensive study of both Twisted and Scrapy proxy code.
Hi! I’d like to work on this issue for Hacktoberfest. Could you please assign it to me?” /assign
@pratyush2514 you don't need to be assigned to work on something.
Hi @wRAR Sir, I noticed my PR was closed with a spam label. I apologize if I did something wrong — I wasn’t trying to spam. My intention was to genuinely fix the flaky test as discussed in issue #7060. Could you please let me know what made it fall under spam so I can better understand and avoid making the same mistake in future contributions?
Thanks for your time.
Could you please let me know what made it fall under spam
You asked a coding agent to generate something resembling a fix without investigating and then sent the result as a PR.
Hi @wRAR Sir, I want to clarify that I did not use any coding agent; the PR was implemented entirely by me after investigating the flaky test. I’ll ensure future contributions clearly document my process to avoid any confusion. Thank you
after investigating the flaky test.
Unlikely.
Good luck!
@wRAR Sir I want to clarify again that this PR was fully implemented by me after investigating the flaky test. Could you please review it and consider removing the spam label? I’m happy to provide details of my approach if needed.
I’m happy to provide details of my approach if needed.
Go ahead.
@wRAR Thank you for giving me the opportunity to explain my approach. What I did was like: first, I ran the test locally several times and saw it randomly failing - sometimes with TimeoutError, sometimes with OpenSSL.SSL.Error wrapped in ResponseNeverReceived. That was weird. I looked at the test code and then found that comment you mentioned in UriResource.render about the "ugly hack for CONNECT request timeout test". That gave me a clue about what's happening. Basically, when the test tries to connect to the fake domain through the proxy, there's a race condition. Sometimes the timeout happens cleanly, sometimes it happens while trying to do the SSL handshake. Both are actually timeout scenarios, just happening at different moments.
My fix was like instead of trying to mess with the proxy/SSL code (which seemed way too complex), I just made the test accept both error types since they're both legitimate timeout failures. Added a simple check so it only validates the domain message for the original TimeoutError case.
Basically, when the test tries to connect to the fake domain through the proxy, there's a race condition. Sometimes the timeout happens cleanly, sometimes it happens while trying to do the SSL handshake. Both are actually timeout scenarios, just happening at different moments.
This is not what happens.
My fix was like instead of trying to mess with the proxy/SSL code (which seemed way too complex), I just made the test accept both error types since they're both legitimate timeout failures.
This is a wrong "fix".
That will be all.
@wRAR Sir I understand now that my analysis was incorrect and the fix was wrong. I clearly misunderstood what was actually causing the flaky behavior. However, I'd like to respectfully ask if the spam label could be reconsidered. While my technical approach was flawed, I did genuinely attempt to investigate and solve a real issue that was reported. I wasn't trying to submit low-effort or malicious content.
I'll make sure to do much deeper investigation before submitting future PRs to avoid wasting maintainers' time.
Thanks for the feedback - it's a good learning experience even though I got it wrong.
Hi @wRAR, is this issue still open? I’d like to work on resolving it.
@K-Preetham-Reddy yes
Thank you for the confirmation. I will update on the solution soon.
Greetings @wRAR, I came up with a solution. Would like to give a brief over it before further consideration
Root Cause
File: scrapy/tests/mockserver/http_resources.py
Problem: The UriResource.render() method's CONNECT handler was returning an empty response without implementing any tunnel functionality
This incomplete implementation meant:
No bidirectional data forwarding between client and destination SSL/TLS handshake packets were never forwarded to the destination server The connection would hang or fail unpredictably depending on timing
The intermittent nature occurred because the failure depended on race conditions between when Scrapy attempted the SSL handshake and when the incomplete proxy response was processed.
Solution Implemented a proper bidirectional HTTPS proxy tunnel using Twisted protocols:
Parses CONNECT destination from the request URI (host:port) Sends proper "200 Connection Established" response to client Establishes TCP connection to the actual destination server Forwards data bidirectionally using TunnelProtocol and TunnelFactory classes Handles connection failures gracefully by allowing Scrapy's timeout mechanism to work (instead of closing immediately) Buffers data when destination connection isn't ready yet
Changes Replaced empty CONNECT handler in tests/mockserver/http_resources.py with complete tunnel implementation Added TunnelProtocol class for bidirectional data forwarding Added TunnelFactory class for managing destination connections Proper cleanup on client/server disconnection
@K-Preetham-Reddy if you plan to ask your coding agent to implement the solution it came up with that you copypasted here, I recommend against it, to not waste our time. Thank you.
That is not what I did. I didn't copy paste from an AI agent. It took me an entire day to understand the code base and brief functionality of the codes. I did use AI to learn about the packages and technologies but didn't use it entirely to write the code.
I did work a lot on this issue and after many changes I got a proper solution. Can you checkout my PR and provide feedback
(two cases of https://x.com/IsaacKing314/status/1937009538844237829 so far)
Is this fixed? Cause i can't really reproduce this in my local setup, Or am i doing something wrong.
Is this fixed?
No.
i can't really reproduce this in my local setup
How many times did you run the test?
Is this fixed?
No.
i can't really reproduce this in my local setup
How many times did you run the test?
A couple of times ig? Do we need to run it in some script to bulk test it or something?
A couple of times ig?
🤷♂️
Do we need to run it in some script to bulk test it or something?
Whatever works for you. I just use something like while tox -e py -- -s tests/test_downloader_handler_twisted_http11.py::TestHttp11Proxy::test_download_with_proxy_https_timeout ; do :; done
I can debug and fix this flaky test that intermittently fails with OpenSSL.SSL.Error when testing HTTPS proxy timeout behavior. The issue is likely related to OpenSSL/Twisted interaction timing. I'll investigate the test setup, identify the race condition or timing issue, and implement a more robust test that consistently validates the timeout behavior.