
Proxy settings not honored

Open ocervell opened this issue 1 year ago • 3 comments

It seems that proxy settings are not honored: looking at the traffic in Wireshark, I can see some requests that do not go through the proxy.

I think this is related to https://github.com/gocolly/colly/issues/392

We probably need to set

```go
// c is the colly *Collector; disabling keep-alives forces a new connection per request
c.WithTransport(&http.Transport{
	DisableKeepAlives: true,
})
```

in the code here

ocervell · May 09 '23 10:05

Hi @ocervell, thanks for the issue! Have you tried this solution? In the issue you linked, it seems it worked for only one person. Moreover, disabling keep-alive connections will affect performance, so we should be sure it actually works before applying it.
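To make the performance concern concrete, the sketch below (standard library only; `countConnections` is a hypothetical helper) counts how many TCP connections a local test server accepts for the same number of sequential requests, with keep-alives on and off. With keep-alives enabled the requests should share a single connection; with `DisableKeepAlives: true` each request opens its own, which against real HTTPS targets means an extra TCP and TLS handshake per request.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// countConnections sends n sequential GETs to a local test server and returns
// how many TCP connections the server accepted.
func countConnections(n int, disableKeepAlives bool) (int64, error) {
	var conns int64
	srv := httptest.NewUnstartedServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		io.WriteString(w, "ok")
	}))
	// ConnState is called with StateNew once per accepted connection.
	srv.Config.ConnState = func(c net.Conn, s http.ConnState) {
		if s == http.StateNew {
			atomic.AddInt64(&conns, 1)
		}
	}
	srv.Start()
	defer srv.Close()

	client := &http.Client{Transport: &http.Transport{DisableKeepAlives: disableKeepAlives}}
	defer client.CloseIdleConnections()
	for i := 0; i < n; i++ {
		resp, err := client.Get(srv.URL)
		if err != nil {
			return 0, err
		}
		// Drain and close the body so the connection can return to the idle pool.
		io.Copy(io.Discard, resp.Body)
		resp.Body.Close()
	}
	return atomic.LoadInt64(&conns), nil
}

func main() {
	for _, disable := range []bool{false, true} {
		n, err := countConnections(10, disable)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Printf("DisableKeepAlives=%v: %d connections for 10 requests\n", disable, n)
	}
}
```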

edoardottt · May 09 '23 13:05

I am sure this is an issue, but I'm not sure yet about the solution. Indeed, disabling keep-alives might decrease performance, so we should not do it when no proxy is set. IMHO, when you pass a proxy, you want to be sure that no traffic leaks outside of it.

ocervell · May 10 '23 14:05

Absolutely agree. When I have enough time to run some tests, I'll take a deep look at it :)

As always, thanks for your help, really appreciated ❤️

edoardottt · May 11 '23 09:05

@ocervell, here is a tentative fix: https://github.com/edoardottt/cariddi/issues/143#issuecomment-2016560137

edoardottt · Mar 24 '24 10:03

Partially fixed in version 1.3.3.

Many targets work fine with the new proxy settings; however, there could be problems with certain types of targets, e.g.:

  • https://www.google.com/ works fine with a proxy
  • https://edoardoottavianelli.it/ doesn't work with a proxy. I guess the problem lies in GitHub Pages hosting, certificates, or DNS resolution.

Before this fix there was a clear problem with proxies: no target worked. After this fix, many targets can be crawled through a proxy.

If anyone has a better solution, I'm all ears: just open an issue or pull request.

edoardottt · Apr 01 '24 09:04