codeql icon indicating copy to clipboard operation
codeql copied to clipboard

Unable to resolve CodeQL SSRF warning for a HTTP request function that takes pip package names as input

Open tieneupin opened this issue 1 year ago • 4 comments

I'm working on a function that returns a HTTP response from https://pypi.org/simple/ when Python's pip installer requests it for a package. When pushing my code onto GitHub, the CodeQL checks warn of the risk of server side request forgery (SSRF), and asks me to create validation checks for the "user-defined input" (which is pip, in this case).

I have already made many attempts at validating the URL to satisfy this SSRF warning, but GitHub CodeQL has not accepted any of them so far. How can I rewrite the following to satisfy GitHub CodeQL's requirements for guarding against SSRF?

The relevant block of code:

import requests
from fastapi import APIRouter, Response

pypi = APIRouter(prefix="/pypi", tags=["bootstrap"])

@pypi.get("/{package}/", response_class=Response)
def get_pypi_package_downloads_list(package: str) -> Response:
    """
    Obtain list of all package downloads from PyPI via the simple API (PEP 503).
    """
    url = f"https://pypi.org/simple/{package}"
    full_path_response = requests.get(url)

The following is a non-exhaustive overview of attempts I've tried in order to satisfy that SSRF warning. However, none of them have worked for me.

# Attempt 1
# Check that it's a PyPI URL
url = f"https://pypi.org/simple/{package}"
if "pypi" in url:
    full_path_response = requests.get(url)
else:
    raise ValueError("This is not a valid package")


# Attempt 2
# Validate that package name is alphanumeric (allow _ and -)
if package.replace("_", "").replace("-", "").isalnum():  
    url = f"https://pypi.org/simple/{package}"
    full_path_response = requests.get(url)
else:
    raise ValueError("This is not a valid package")


# Attempt 3
# Check that it's a valid connection
with requests.get("https://pypi.org/simple/{package}") as http_response:
    if http_response.status_code == 200:
        full_path_response = http_response
    else:
        raise ValueError("This is not a valid package")


# Attempt 4
# Tried using RegEx matching to validate package name
if re.match(r"^[a-z0-9\_\-]+$", package):
    full_path_response = requests.get(f"https://pypi.org/simple/{package}")
else:
    raise ValueError("This is not a valid package")


# Attempt 5
# Use urllib.parse.urlparse to parse and validate the url
def validate_url(url: str) -> bool:
    parsed_url = urlparse(url)
    if parsed_url.scheme == "https" and parsed_url.hostname == "pypi.org":
        return True
    else:
        return False

def validate_package(package: str) -> bool:
    if package.replace("_", "").replace("-", "").isalnum():
        return True
    else:
        return False

# Validate package and URL
if validate_package(package) and validate_url(f"https://pypi.org/simple/{package}"):
    full_path_response = requests.get(
        f"https://pypi.org/simple/{package}"
    )  # Get response from PyPI
else:
    raise ValueError("This is not a valid package")


# Attempt 6
# Using a Pydantic model
from pydantic import BaseModel, HttpUrl, ValidationError

class UrlValidator(BaseModel):
    url: HttpUrl

def validate(url: str):
    try:
        UrlValidator(url=url)
    except ValidationError:
        log.error(f"{url} was not a valid URL")
        return False
    else:
        log.info(f"{url} was a valid URL")
        return True

# Attempt at URL validation to satisfy GitHub CodeQL requirements
url = f"https://pypi.org/simple/{package}"
if validate(url):
    full_path_response = requests.get(url)


# Attempt 7
# Encoding string before injection
from urllib.parse import quote_plus

def _validate_package_name(package: str) -> bool:
    # Check that it only contains alphanumerics, "_", or "-", and isn't excessively long
    if re.match(r"^[a-z0-9\-\_]+$", package):
        return True
    else:
        return False

def _get_full_path_response(package: str) -> requests.Response:
    # Sanitise string
    package_clean = quote_plus(package)
    print(f"Cleaned package: {package_clean}")

    # Validation checks
    if _validate_package_name(package_clean):
        url = f"https://pypi.org/simple/{package_clean}"
        print(f"URL: {url}")
        return requests.get(url)
    else:
        raise ValueError(f"{package_clean} is not a valid package name")
full_path_response = _get_full_path_response(package)


# Attempt 8
# The nuclear option of maintaining a list of approved packages
approved_packages: list = [pkg.lower() for pkg in approved_packages]  # List of package names from running `conda env list`

# Validate package and URL
if package.lower() in approved_packages:
    url = f"https://pypi.org/simple/{package}"
    full_path_response = requests.get(url)
else:
    raise ValueError(f"{package} is not a valid package name")

Thanks!

tieneupin avatar May 22 '24 14:05 tieneupin

Hi @tieneupin,

Which query is giving the alert? py/full-ssrf?

jketema avatar May 22 '24 14:05 jketema

Hi @jketema , thanks for getting back to me so quickly!

I'm persistently getting the py/partial-SSRF alert, which, if I understand correctly, is the possible manipulation of a path from a fixed domain. I've tried mimicking the "good" examples provided in the explanations page, to no avail.

Good examples from the alert description:

import requests
from flask import Flask, request

app = Flask(__name__)

@app.route("/partial_ssrf")
def partial_ssrf():
    user_id = request.args["user_id"]

    # BAD: user can fully control the path component of the URL
    resp = requests.get("https://api.example.com/user_info/" + user_id)

    if user_id.isalnum():
        # GOOD: user_id is restricted to be alpha-numeric, and cannot alter path component of URL
        resp = requests.get("https://api.example.com/user_info/" + user_id)

tieneupin avatar May 22 '24 15:05 tieneupin

👋 Hey @tieneupin! I'm going to move this issue to github/codeql, since that is more suited for discussions around potential false positives or false negatives from CodeQL queries.

sidshank avatar May 23 '24 13:05 sidshank

Hey, thanks for moving it over!

I am quite new to software development, so could you advise on whether the attempts I've made are sufficient in actual practice to nullify this threat? If yes, this would mean that the py/partial-ssrf check is malformed, and would need tweaking.

I was asking around on StackExchange since the start of this week, and one of the users commented that "CodeQL doesn't currently seem to recognize URL-encoding as a sanitizer" (link). Could you comment on this?

tieneupin avatar May 23 '24 15:05 tieneupin

@tieneupin Apologies for the delayed response! This is indeed a false positive, and in the process you have identified an opportunity for us to improve what this query flags as a Partial SSRF result. We will look into this. Meanwhile, please consider ignoring this alert by dismissing it.

sidshank avatar May 29 '24 02:05 sidshank

Brilliant, thank you for confirming that it is indeed a false positive. I'll be able to make progress with that section of the code now. :smile:

I'll leave it to your team to close this issue/leave it open as you see fit, seeing as this is something that needs to be updated in CodeQL itself?

tieneupin avatar May 29 '24 08:05 tieneupin

I'll leave this issue open. We're also tracking this internally on another issue. We'll report back here (and close the issue) once we have a fix.

sidshank avatar May 29 '24 14:05 sidshank

@tieneupin The pull request that resolves this issue has been merged. The changes should become available as part of v2.18.0 that is expected to release around July 12th. At that point, validating your input using mechanisms like isalnum (as stated in the query help) or regex matches should help you avoid this alert. I will close this issue, but please reopen it if you adopt the new CodeQL version and still see unexpected behaviour.

sidshank avatar Jun 12 '24 18:06 sidshank

@sidshank , many thanks for informing me about the upcoming fix to this reported issue! Yes, I'll let you know if I stumble across it again in the future. 😄

tieneupin avatar Jun 13 '24 01:06 tieneupin