cariddi
Proxy settings not honored
It seems that proxies are not honored: looking at the traffic in Wireshark, I see some requests that do not go through any proxy.
I think this is related to https://github.com/gocolly/colly/issues/392
We probably need to set
    c.WithTransport(&http.Transport{
        DisableKeepAlives: true,
    })
in the code here
Hi @ocervell, thanks for the issue! Have you tried this solution? In the issue you linked, it seems it worked for just one person. Moreover, disabling keep-alive connections will affect performance, so we should be sure it works before adopting it.
I am sure this is an issue, but I'm not sure about the solution yet. Indeed, disabling keep-alives might decrease performance, so we should not do it in non-proxy modes. IMHO, when you pass a proxy, you want to be sure that no traffic leaks outside the proxy.
Absolutely agree. When I have enough time to run some tests, I'll take a deeper look at this :)
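To make the idea concrete, here is a minimal sketch using only the Go standard library. It disables keep-alives only when a proxy is configured, so non-proxy runs keep their connection reuse. The helper name `newTransport` and the proxy address are hypothetical and not taken from the cariddi code base; whether this actually stops the leak described above is still untested.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
)

// newTransport is a hypothetical helper (not part of cariddi or colly).
// With an empty proxyAddr it returns a plain transport with keep-alives
// left on. With a proxy address it routes every request through that
// proxy and disables keep-alives, the workaround suggested in this thread.
func newTransport(proxyAddr string) (*http.Transport, error) {
	// Note: a zero-value Transport has no Proxy function set, so it does
	// not read proxy settings from the environment either.
	t := &http.Transport{}
	if proxyAddr == "" {
		return t, nil
	}

	u, err := url.Parse(proxyAddr)
	if err != nil {
		return nil, fmt.Errorf("invalid proxy URL %q: %w", proxyAddr, err)
	}

	t.Proxy = http.ProxyURL(u)  // always return this fixed proxy
	t.DisableKeepAlives = true  // only pay the performance cost in proxy mode
	return t, nil
}

func main() {
	// Assumed local proxy address, for illustration only.
	tr, err := newTransport("http://127.0.0.1:8080")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("keep-alives disabled:", tr.DisableKeepAlives)
	fmt.Println("proxy configured:", tr.Proxy != nil)
}
```

Since `*http.Transport` implements `http.RoundTripper`, the returned value could then be handed to the collector with `c.WithTransport(tr)`.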
As always, thanks for your help, really appreciated ❤️
@ocervell a tentative fix is here: https://github.com/edoardottt/cariddi/issues/143#issuecomment-2016560137
Partially fixed in version 1.3.3.
Many targets work fine with the new proxy settings; however, there could be problems with certain types of targets, e.g.:
- https://www.google.com/ works fine with a proxy
- https://edoardoottavianelli.it/ doesn't work with a proxy. I guess the problem lies within GitHub Pages hosting / certificates / DNS resolving.
Before this fix there was a clear problem with proxies: no target was working. After this fix, a lot of targets can be crawled using a proxy.
If anyone has a better solution, I'm all ears. Just open an Issue / Pull Request.