
⚡️ Speed up function `validate_download_url` by 15% in PR #3867 (`stas/sdk-upload-files`)

Open codeflash-ai[bot] opened this issue 1 month ago • 1 comments

⚡️ This pull request contains optimizations for PR #3867

If you approve this dependent PR, these changes will be merged into the original PR branch stas/sdk-upload-files.

This PR will be automatically closed if the original PR is merged.


📄 15% (0.15x) speedup for validate_download_url in skyvern/forge/sdk/api/files.py

⏱️ Runtime: 78.8 milliseconds → 68.6 milliseconds (best of 42 runs)
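The headline percentage can be reproduced from the two runtimes above. The sketch below assumes the speedup is measured relative to the optimized runtime; that definition is not stated in this report, but it matches the per-test figures further down to within display rounding. The function name `speedup_percent` is hypothetical.

```python
def speedup_percent(original: float, optimized: float) -> float:
    """Speedup relative to the optimized runtime, in percent (assumed definition)."""
    return (original - optimized) / optimized * 100.0


# Overall runtimes reported above, in milliseconds.
overall = speedup_percent(78.8, 68.6)
print(f"{overall:.1f}%")  # 14.9%, reported as 15%
```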

📝 Explanation and details

The optimization achieves a 15% speedup through three key improvements:

1. Pre-computed constants at module level:

  • Moved _ALLOWED_S3_PREFIX, _ALLOWED_FILE_PREFIX, and _LOCAL_ENV to module scope, eliminating repeated f-string formatting and attribute lookups on every function call
  • Line profiler shows S3 prefix check improved from 850.3ns to 472.5ns per hit (44% faster)
  • File prefix check improved from 1281ns to 508.8ns per hit (60% faster)

2. Faster condition checking:

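  • Changed scheme in ("http", "https") to scheme == "http" or scheme == "https", on the expectation that two direct equality comparisons cost less than a tuple membership test; the HTTP/HTTPS timings in the generated tests are unchanged within noise, so the measured effect is small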

3. Eliminated redundant URL parsing:

  • Inlined the parse_uri_to_path logic to reuse the already-parsed parsed_url object instead of calling urlparse again
  • Line profiler shows this eliminated ~68.4ms of redundant parsing time (15.2% of original runtime)
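The three changes can be sketched together as follows. This is not the code from the PR, which is not shown on this page: the function name `validate_download_url_sketch`, the stand-in `settings` object, the `REPO_ROOT_DIR` value, and the way the file path is rebuilt from `netloc` and `path` are all assumptions made for illustration.

```python
from types import SimpleNamespace
from urllib.parse import unquote, urlparse

# Stand-ins for the project's settings and REPO_ROOT_DIR (assumed values).
settings = SimpleNamespace(ENV="local", AWS_S3_BUCKET_UPLOADS="test-bucket")
REPO_ROOT_DIR = "/repo/root"

# 1. Computed once at import time instead of on every call.
_ALLOWED_S3_PREFIX = f"s3://{settings.AWS_S3_BUCKET_UPLOADS}/{settings.ENV}/o_"
_ALLOWED_FILE_PREFIX = f"{REPO_ROOT_DIR}/downloads"
_LOCAL_ENV = settings.ENV == "local"


def validate_download_url_sketch(url: str) -> bool:
    """Return True if the URL uses an allowed scheme and location."""
    try:
        parsed_url = urlparse(url)  # the only parse in this function
    except ValueError:
        return False
    scheme = parsed_url.scheme  # urlparse already lowercases the scheme

    # 2. Direct equality checks in place of `scheme in ("http", "https")`.
    if scheme == "http" or scheme == "https":
        return True

    if scheme == "s3":
        return url.startswith(_ALLOWED_S3_PREFIX)

    if scheme == "file":
        if not _LOCAL_ENV:
            return False
        # 3. Reuse parsed_url rather than re-parsing the string in a helper.
        file_path = unquote(parsed_url.netloc + parsed_url.path)
        return file_path.startswith(_ALLOWED_FILE_PREFIX)

    # Any other scheme (ftp, data, empty, ...) is rejected.
    return False
```

For example, `validate_download_url_sketch("file:///repo/root/downloads/file.txt")` returns `True` under these stand-in settings, while the same path under `/repo/root/otherdir/` returns `False`.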

Performance characteristics by test case:

  • File URL validation shows the biggest gains (25-35% faster) due to eliminated double parsing
  • S3 URL validation improves 4-11% from pre-computed prefix constants
  • HTTP/HTTPS URLs see minimal impact since they hit the fast-path early
  • Large-scale tests mirror the per-URL results: batches of S3 URLs improve about 4-5%, batches of file:// URLs about 34%, and batches of HTTP/HTTPS URLs are unchanged within noise

The optimizations are most effective for workloads with frequent file:// URL validation or high-volume S3 URL checking.
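To check whether a given workload benefits, a best-of-N timing harness along these lines can compare parsing a file:// URL twice against parsing it once. The helper names (`path_parse_twice`, `path_parse_once`, `best_of`) and the sample URLs are hypothetical, and absolute timings depend on the machine and Python version. Note that `urllib.parse` keeps an internal cache of recent parse results, so the gap may be smaller than the figures reported here.

```python
import timeit
from urllib.parse import unquote, urlparse

# Hypothetical sample input; any batch of file:// URLs works.
URLS = [f"file:///repo/root/downloads/file_{i}.txt" for i in range(100)]


def path_parse_twice(url: str) -> str:
    """Check the scheme, then re-parse the same string to build the path."""
    parsed = urlparse(url)
    if parsed.scheme != "file":
        raise ValueError(f"expected file scheme, got {parsed.scheme!r}")
    reparsed = urlparse(url)  # the redundant second parse
    return unquote(reparsed.netloc + reparsed.path)


def path_parse_once(url: str) -> str:
    """Check the scheme and build the path from a single parse result."""
    parsed = urlparse(url)
    if parsed.scheme != "file":
        raise ValueError(f"expected file scheme, got {parsed.scheme!r}")
    return unquote(parsed.netloc + parsed.path)


def best_of(func, repeat: int = 5, number: int = 200) -> float:
    """Best total time in seconds for `number` passes over URLS."""

    def run_batch() -> None:
        for url in URLS:
            func(url)

    return min(timeit.repeat(run_batch, repeat=repeat, number=number))


if __name__ == "__main__":
    twice = best_of(path_parse_twice)
    once = best_of(path_parse_once)
    print(f"parse twice: {twice * 1e3:.2f} ms, parse once: {once * 1e3:.2f} ms")
```

Taking the minimum over several repeats, as the "best of 42 runs" figure above does, reduces the influence of scheduler noise on the comparison.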

Correctness verification report:

Test Status
⏪ Replay Tests 🔘 None Found
⚙️ Existing Unit Tests 🔘 None Found
🔎 Concolic Coverage Tests 🔘 None Found
🌀 Generated Regression Tests 15064 Passed
📊 Tests Coverage 71.4%
🌀 Generated Regression Tests and Runtime
import sys
# Patch settings for tests
import types
# Function under test (copied from user)
from urllib.parse import unquote, urlparse

# imports
import pytest
from skyvern.forge.sdk.api.files import validate_download_url


# Mocks for settings and constants
class SettingsMock:
    def __init__(self, env="local", bucket="test-bucket"):
        self.ENV = env
        self.AWS_S3_BUCKET_UPLOADS = bucket

REPO_ROOT_DIR = "/repo/root"

settings = SettingsMock()

# ----------- UNIT TESTS ------------

# Basic Test Cases

def test_http_url_valid():
    # Should accept basic HTTP URL
    codeflash_output = validate_download_url("http://example.com/file.txt") # 11.6μs -> 11.7μs (0.513% slower)

def test_https_url_valid():
    # Should accept basic HTTPS URL
    codeflash_output = validate_download_url("https://example.com/file.txt") # 11.5μs -> 11.6μs (0.517% slower)

def test_google_drive_https_url():
    # Should accept Google Drive HTTPS URL
    codeflash_output = validate_download_url("https://drive.google.com/uc?id=abc123") # 12.2μs -> 12.1μs (1.17% faster)

def test_s3_url_valid_uploads_bucket():
    # Should accept valid S3 URL for uploads bucket
    settings.ENV = "local"
    settings.AWS_S3_BUCKET_UPLOADS = "test-bucket"
    url = "s3://test-bucket/local/o_12345"
    codeflash_output = validate_download_url(url) # 12.9μs -> 11.7μs (10.6% faster)

def test_file_url_valid_local_env():
    # Should accept file:// URL in local env within allowed directory
    settings.ENV = "local"
    url = "file:///repo/root/downloads/file.txt"
    codeflash_output = validate_download_url(url) # 16.0μs -> 12.5μs (28.2% faster)

# Edge Test Cases

def test_file_url_not_local_env():
    # Should reject file:// URL in non-local env
    settings.ENV = "prod"
    url = "file:///repo/root/downloads/file.txt"
    codeflash_output = validate_download_url(url) # 8.81μs -> 5.62μs (56.7% faster)

def test_file_url_outside_allowed_directory():
    # Should reject file:// URL outside allowed directory in local env
    settings.ENV = "local"
    url = "file:///repo/root/otherdir/file.txt"
    codeflash_output = validate_download_url(url) # 16.1μs -> 12.6μs (27.5% faster)

def test_file_url_with_encoded_path():
    # Should accept file:// URL with encoded path inside allowed dir
    settings.ENV = "local"
    url = "file:///repo/root/downloads/file%20with%20spaces.txt"
    codeflash_output = validate_download_url(url) # 22.7μs -> 19.2μs (18.2% faster)

def test_file_url_with_encoded_path_outside_allowed():
    # Should reject file:// URL with encoded path outside allowed dir
    settings.ENV = "local"
    url = "file:///repo/root/other%20dir/file.txt"
    codeflash_output = validate_download_url(url) # 21.7μs -> 18.3μs (18.5% faster)

def test_s3_url_wrong_bucket():
    # Should reject S3 URL with wrong bucket
    settings.ENV = "local"
    settings.AWS_S3_BUCKET_UPLOADS = "test-bucket"
    url = "s3://wrong-bucket/local/o_12345"
    codeflash_output = validate_download_url(url) # 12.8μs -> 11.3μs (13.4% faster)

def test_s3_url_wrong_env():
    # Should reject S3 URL with wrong env
    settings.ENV = "prod"
    settings.AWS_S3_BUCKET_UPLOADS = "test-bucket"
    url = "s3://test-bucket/local/o_12345"
    codeflash_output = validate_download_url(url) # 5.66μs -> 4.64μs (22.0% faster)

def test_s3_url_missing_o_prefix():
    # Should reject S3 URL not starting with o_ after env
    settings.ENV = "local"
    settings.AWS_S3_BUCKET_UPLOADS = "test-bucket"
    url = "s3://test-bucket/local/x_12345"
    codeflash_output = validate_download_url(url) # 12.6μs -> 11.8μs (6.80% faster)

def test_unsupported_scheme_ftp():
    # Should reject unsupported scheme (ftp)
    codeflash_output = validate_download_url("ftp://example.com/file.txt") # 11.3μs -> 11.1μs (1.45% faster)

def test_unsupported_scheme_gopher():
    # Should reject unsupported scheme (gopher)
    codeflash_output = validate_download_url("gopher://example.com/file.txt") # 10.9μs -> 11.3μs (2.76% slower)

def test_empty_url():
    # Should reject empty URL
    codeflash_output = validate_download_url("") # 7.70μs -> 7.49μs (2.80% faster)

def test_malformed_url():
    # Should reject malformed URL
    codeflash_output = validate_download_url("not a url") # 7.71μs -> 7.79μs (0.912% slower)

def test_file_url_with_netloc():
    # Should accept file:// with netloc if path is correct
    settings.ENV = "local"
    url = "file://repo/root/downloads/file.txt"
    codeflash_output = validate_download_url(url) # 16.2μs -> 12.4μs (30.7% faster)

def test_file_url_with_netloc_outside_allowed():
    # Should reject file:// with netloc outside allowed dir
    settings.ENV = "local"
    url = "file://repo/root/otherdir/file.txt"
    codeflash_output = validate_download_url(url) # 15.8μs -> 12.2μs (30.0% faster)

def test_file_url_with_double_slash():
    # Should accept file:// with double slash if path is correct
    settings.ENV = "local"
    url = "file:///repo/root/downloads//file.txt"
    codeflash_output = validate_download_url(url) # 15.6μs -> 12.1μs (28.6% faster)

def test_file_url_with_tricky_path_traversal():
    # Should reject file:// with path traversal outside allowed dir
    settings.ENV = "local"
    url = "file:///repo/root/downloads/../secrets/file.txt"
    codeflash_output = validate_download_url(url) # 15.4μs -> 12.1μs (27.4% faster)

def test_file_url_with_relative_path():
    # Should reject file:// with relative path
    settings.ENV = "local"
    url = "file://downloads/file.txt"
    codeflash_output = validate_download_url(url) # 15.4μs -> 12.3μs (25.1% faster)

def test_file_url_with_windows_path():
    # Should reject Windows-style file:// path
    settings.ENV = "local"
    url = "file:///C:/repo/root/downloads/file.txt"
    codeflash_output = validate_download_url(url) # 15.3μs -> 11.9μs (28.5% faster)

def test_file_url_with_invalid_scheme():
    # Should reject file:// with invalid scheme
    settings.ENV = "local"
    url = "files:///repo/root/downloads/file.txt"
    codeflash_output = validate_download_url(url) # 11.1μs -> 11.1μs (0.271% faster)

# Large Scale Test Cases

def test_many_http_urls():
    # Should accept a large number of HTTP URLs
    for i in range(1000):
        codeflash_output = validate_download_url(f"http://example.com/file_{i}.txt") # 4.29ms -> 4.28ms (0.251% faster)

def test_many_https_urls():
    # Should accept a large number of HTTPS URLs
    for i in range(1000):
        codeflash_output = validate_download_url(f"https://example.com/file_{i}.txt") # 4.32ms -> 4.31ms (0.242% faster)

def test_many_s3_urls_valid():
    # Should accept a large number of valid S3 URLs
    settings.ENV = "local"
    settings.AWS_S3_BUCKET_UPLOADS = "test-bucket"
    for i in range(1000):
        url = f"s3://test-bucket/local/o_{i:05d}"
        codeflash_output = validate_download_url(url) # 4.60ms -> 4.42ms (4.01% faster)

def test_many_s3_urls_invalid():
    # Should reject a large number of invalid S3 URLs
    settings.ENV = "local"
    settings.AWS_S3_BUCKET_UPLOADS = "test-bucket"
    for i in range(1000):
        url = f"s3://wrong-bucket/local/o_{i:05d}"
        codeflash_output = validate_download_url(url) # 4.62ms -> 4.42ms (4.54% faster)

def test_many_file_urls_valid():
    # Should accept a large number of valid file:// URLs in local env
    settings.ENV = "local"
    for i in range(1000):
        url = f"file:///repo/root/downloads/file_{i}.txt"
        codeflash_output = validate_download_url(url) # 6.40ms -> 4.77ms (34.0% faster)

def test_many_file_urls_invalid():
    # Should reject a large number of invalid file:// URLs outside allowed dir
    settings.ENV = "local"
    for i in range(1000):
        url = f"file:///repo/root/otherdir/file_{i}.txt"
        codeflash_output = validate_download_url(url) # 6.38ms -> 4.75ms (34.4% faster)

def test_many_file_urls_nonlocal_env():
    # Should reject all file:// URLs in non-local env
    settings.ENV = "prod"
    for i in range(1000):
        url = f"file:///repo/root/downloads/file_{i}.txt"
        codeflash_output = validate_download_url(url) # 6.40ms -> 4.78ms (34.0% faster)

def test_large_mixed_urls():
    # Should correctly handle a large mixed batch of URLs
    settings.ENV = "local"
    settings.AWS_S3_BUCKET_UPLOADS = "test-bucket"
    for i in range(500):
        codeflash_output = validate_download_url(f"http://example.com/file_{i}.txt") # 2.31ms -> 2.30ms (0.214% faster)
        codeflash_output = validate_download_url(f"https://example.com/file_{i}.txt")
        codeflash_output = validate_download_url(f"s3://test-bucket/local/o_{i:05d}") # 2.28ms -> 2.28ms (0.249% faster)
        codeflash_output = validate_download_url(f"s3://wrong-bucket/local/o_{i:05d}")
        codeflash_output = validate_download_url(f"file:///repo/root/downloads/file_{i}.txt") # 2.48ms -> 2.34ms (5.71% faster)
        codeflash_output = validate_download_url(f"file:///repo/root/otherdir/file_{i}.txt")
        codeflash_output = validate_download_url(f"ftp://example.com/file_{i}.txt") # 2.43ms -> 2.30ms (5.72% faster)
        codeflash_output = validate_download_url("")
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
import sys
# Function to test (copied from above)
from urllib.parse import unquote, urlparse

# imports
import pytest
from skyvern.forge.sdk.api.files import validate_download_url


# Mocks for settings and constants
class Settings:
    ENV = "local"
    AWS_S3_BUCKET_UPLOADS = "mybucket"

settings = Settings()
REPO_ROOT_DIR = "/repo/root"

# --- UNIT TESTS ---

# BASIC TEST CASES

def test_valid_http_url():
    # Should accept standard http URL
    codeflash_output = validate_download_url("http://example.com/file.txt") # 12.1μs -> 12.1μs (0.580% faster)

def test_valid_https_url():
    # Should accept standard https URL
    codeflash_output = validate_download_url("https://example.com/file.txt") # 11.7μs -> 11.6μs (0.518% faster)

def test_valid_google_drive_https_url():
    # Should accept Google Drive https URL
    codeflash_output = validate_download_url("https://drive.google.com/uc?id=abc123") # 12.4μs -> 12.3μs (0.081% faster)

def test_valid_s3_url():
    # Should accept valid s3 URL for the uploads bucket
    url = f"s3://{settings.AWS_S3_BUCKET_UPLOADS}/{settings.ENV}/o_12345"
    codeflash_output = validate_download_url(url) # 12.9μs -> 11.6μs (10.9% faster)

def test_invalid_s3_bucket_url():
    # Should reject s3 URL for a different bucket
    url = f"s3://otherbucket/{settings.ENV}/o_12345"
    codeflash_output = validate_download_url(url) # 12.5μs -> 11.5μs (8.55% faster)

def test_invalid_s3_env_url():
    # Should reject s3 URL for a different env
    url = f"s3://{settings.AWS_S3_BUCKET_UPLOADS}/prod/o_12345"
    codeflash_output = validate_download_url(url) # 12.4μs -> 11.4μs (8.85% faster)

def test_valid_file_url_local_env():
    # Should accept valid file URL in local env
    url = f"file://{REPO_ROOT_DIR}/downloads/myfile.txt"
    codeflash_output = validate_download_url(url) # 16.1μs -> 12.3μs (30.2% faster)

def test_invalid_file_url_non_local_env(monkeypatch):
    # Should reject file URL if ENV is not local
    monkeypatch.setattr(settings, "ENV", "prod")
    url = f"file://{REPO_ROOT_DIR}/downloads/myfile.txt"
    codeflash_output = validate_download_url(url) # 8.99μs -> 5.84μs (53.8% faster)
    # No manual restore needed: monkeypatch undoes setattr when the test ends.

def test_invalid_file_url_outside_downloads():
    # Should reject file URL outside allowed downloads directory
    url = f"file://{REPO_ROOT_DIR}/notdownloads/myfile.txt"
    codeflash_output = validate_download_url(url) # 16.2μs -> 12.8μs (26.5% faster)

def test_invalid_scheme_ftp():
    # Should reject unsupported ftp scheme
    codeflash_output = validate_download_url("ftp://example.com/file.txt") # 11.0μs -> 11.2μs (1.52% slower)

def test_invalid_scheme_blank():
    # Should reject blank scheme (relative URL)
    codeflash_output = validate_download_url("example.com/file.txt") # 7.92μs -> 7.96μs (0.502% slower)

def test_invalid_scheme_data():
    # Should reject dangerous data scheme
    codeflash_output = validate_download_url("data:text/plain;base64,SGVsbG8sIFdvcmxkIQ==") # 9.44μs -> 9.21μs (2.51% faster)

# EDGE TEST CASES

def test_file_url_with_encoded_path():
    # Should accept file URL with percent-encoded path inside downloads
    url = f"file://{REPO_ROOT_DIR}/downloads/my%20file.txt"
    codeflash_output = validate_download_url(url) # 22.2μs -> 18.7μs (18.7% faster)

def test_file_url_with_encoded_path_outside():
    # Should reject file URL with percent-encoded path outside allowed dir
    url = f"file://{REPO_ROOT_DIR}/notdownloads/my%20file.txt"
    codeflash_output = validate_download_url(url) # 21.6μs -> 18.3μs (18.4% faster)

def test_file_url_with_tricky_path_traversal():
    # Should reject file URL with path traversal outside allowed dir
    url = f"file://{REPO_ROOT_DIR}/downloads/../secrets.txt"
    codeflash_output = validate_download_url(url) # 15.4μs -> 12.3μs (24.6% faster)

def test_file_url_with_empty_path():
    # Should reject file URL with empty path
    url = "file://"
    codeflash_output = validate_download_url(url) # 14.8μs -> 11.7μs (26.0% faster)

def test_file_url_with_only_slash():
    # Should reject file URL with only slash
    url = "file:///"
    codeflash_output = validate_download_url(url) # 15.4μs -> 12.2μs (26.2% faster)

def test_file_url_with_double_slash():
    # Should reject file URL with double slash
    url = "file:////etc/passwd"
    codeflash_output = validate_download_url(url) # 15.7μs -> 12.3μs (27.9% faster)

def test_file_url_with_netloc():
    # Should accept file URL with netloc and path inside downloads
    url = f"file://{REPO_ROOT_DIR}/downloads/abc.txt"
    codeflash_output = validate_download_url(url) # 15.5μs -> 12.1μs (28.3% faster)

def test_file_url_with_netloc_outside():
    # Should reject file URL with netloc and path outside downloads
    url = f"file://{REPO_ROOT_DIR}/../secrets.txt"
    codeflash_output = validate_download_url(url) # 15.6μs -> 12.0μs (29.5% faster)

def test_file_url_with_invalid_uri(monkeypatch):
    # Should reject file URL with invalid URI
    url = "file://%ZZ%ZZ"
    codeflash_output = validate_download_url(url) # 23.1μs -> 19.2μs (20.4% faster)

def test_s3_url_with_missing_prefix():
    # Should reject s3 URL missing the o_ prefix
    url = f"s3://{settings.AWS_S3_BUCKET_UPLOADS}/{settings.ENV}/file.txt"
    codeflash_output = validate_download_url(url) # 12.8μs -> 11.7μs (9.36% faster)

def test_http_url_with_uppercase_scheme():
    # Should accept http URL with uppercase scheme
    codeflash_output = validate_download_url("HTTP://example.com/file.txt") # 11.1μs -> 11.1μs (0.542% faster)

def test_s3_url_with_uppercase_scheme():
    # Should accept s3 URL with uppercase scheme if valid
    url = f"S3://{settings.AWS_S3_BUCKET_UPLOADS}/{settings.ENV}/o_123"
    codeflash_output = validate_download_url(url) # 12.6μs -> 11.4μs (9.81% faster)

def test_file_url_with_uppercase_scheme():
    # Should accept file URL with uppercase scheme if valid
    url = f"FILE://{REPO_ROOT_DIR}/downloads/myfile.txt"
    codeflash_output = validate_download_url(url) # 16.0μs -> 12.4μs (28.7% faster)

def test_file_url_with_trailing_slash():
    # Should accept file URL with trailing slash inside downloads
    url = f"file://{REPO_ROOT_DIR}/downloads/"
    codeflash_output = validate_download_url(url) # 15.4μs -> 12.2μs (26.3% faster)

def test_file_url_with_dot_slash():
    # Should accept file URL with ./ inside downloads
    url = f"file://{REPO_ROOT_DIR}/downloads/./myfile.txt"
    codeflash_output = validate_download_url(url) # 15.5μs -> 12.3μs (25.9% faster)

def test_file_url_with_multiple_slashes():
    # Should accept file URL with multiple slashes inside downloads
    url = f"file://{REPO_ROOT_DIR}/downloads///myfile.txt"
    codeflash_output = validate_download_url(url) # 15.3μs -> 11.9μs (29.0% faster)

def test_file_url_with_windows_path():
    # Should reject Windows-style file URL
    url = "file://C:/repo/root/downloads/myfile.txt"
    codeflash_output = validate_download_url(url) # 15.4μs -> 12.5μs (22.8% faster)

def test_http_url_with_query_and_fragment():
    # Should accept http URL with query and fragment
    url = "http://example.com/file.txt?foo=bar#section"
    codeflash_output = validate_download_url(url) # 11.7μs -> 12.4μs (5.02% slower)

def test_file_url_with_fragment():
    # Should accept file URL with fragment inside downloads
    url = f"file://{REPO_ROOT_DIR}/downloads/myfile.txt#frag"
    codeflash_output = validate_download_url(url) # 16.5μs -> 12.8μs (28.7% faster)

def test_file_url_with_query():
    # Should accept file URL with query inside downloads
    url = f"file://{REPO_ROOT_DIR}/downloads/myfile.txt?foo=bar"
    codeflash_output = validate_download_url(url) # 16.2μs -> 12.7μs (27.7% faster)

def test_s3_url_with_query():
    # Should accept valid s3 URL with query string
    url = f"s3://{settings.AWS_S3_BUCKET_UPLOADS}/{settings.ENV}/o_12345?foo=bar"
    codeflash_output = validate_download_url(url) # 13.3μs -> 12.2μs (9.04% faster)

def test_url_with_spaces():
    # Should reject URL with spaces in scheme
    url = "ht tp://example.com/file.txt"
    codeflash_output = validate_download_url(url) # 8.67μs -> 8.72μs (0.574% slower)

def test_url_with_special_chars_in_path():
    # Should accept http URL with special chars in path
    url = "http://example.com/file@!{report_table}.txt"
    codeflash_output = validate_download_url(url) # 11.1μs -> 11.1μs (0.362% faster)

def test_url_with_unicode_chars():
    # Should accept http URL with unicode chars in path
    url = "http://example.com/файл.txt"
    codeflash_output = validate_download_url(url) # 12.2μs -> 12.0μs (1.84% faster)

def test_file_url_with_unicode_chars():
    # Should accept file URL with unicode chars in path inside downloads
    url = f"file://{REPO_ROOT_DIR}/downloads/файл.txt"
    codeflash_output = validate_download_url(url) # 17.0μs -> 13.0μs (30.5% faster)

# LARGE SCALE TEST CASES

def test_many_valid_http_urls():
    # Test a large number of valid http URLs
    for i in range(500):
        url = f"http://example.com/file_{i}.txt"
        codeflash_output = validate_download_url(url) # 2.16ms -> 2.17ms (0.267% slower)

def test_many_valid_s3_urls():
    # Test a large number of valid s3 URLs
    for i in range(500):
        url = f"s3://{settings.AWS_S3_BUCKET_UPLOADS}/{settings.ENV}/o_{i:05d}"
        codeflash_output = validate_download_url(url) # 2.34ms -> 2.24ms (4.23% faster)

def test_many_invalid_s3_urls():
    # Test a large number of invalid s3 URLs (wrong bucket)
    for i in range(500):
        url = f"s3://wrongbucket/{settings.ENV}/o_{i:05d}"
        codeflash_output = validate_download_url(url) # 2.34ms -> 2.24ms (4.87% faster)

def test_many_valid_file_urls():
    # Test a large number of valid file URLs inside downloads
    for i in range(500):
        url = f"file://{REPO_ROOT_DIR}/downloads/file_{i}.txt"
        codeflash_output = validate_download_url(url) # 3.24ms -> 2.42ms (33.9% faster)

def test_many_invalid_file_urls():
    # Test a large number of invalid file URLs outside downloads
    for i in range(500):
        url = f"file://{REPO_ROOT_DIR}/other/file_{i}.txt"
        codeflash_output = validate_download_url(url) # 3.23ms -> 2.41ms (34.4% faster)

def test_large_http_url():
    # Test a very long http URL
    url = "http://example.com/" + "a"*900 + ".txt"
    codeflash_output = validate_download_url(url) # 13.0μs -> 12.9μs (1.02% faster)

def test_large_file_url_inside_downloads():
    # Test a very long file URL inside downloads
    url = f"file://{REPO_ROOT_DIR}/downloads/" + "a"*900 + ".txt"
    codeflash_output = validate_download_url(url) # 17.3μs -> 13.7μs (26.4% faster)

def test_large_file_url_outside_downloads():
    # Test a very long file URL outside downloads
    url = f"file://{REPO_ROOT_DIR}/other/" + "a"*900 + ".txt"
    codeflash_output = validate_download_url(url) # 17.0μs -> 13.7μs (24.0% faster)

def test_large_s3_url():
    # Test a very long s3 URL
    url = f"s3://{settings.AWS_S3_BUCKET_UPLOADS}/{settings.ENV}/o_" + "x"*900
    codeflash_output = validate_download_url(url) # 14.0μs -> 12.6μs (11.0% faster)

def test_large_number_of_mixed_urls():
    # Test a mix of valid and invalid URLs
    for i in range(250):
        # Valid http
        codeflash_output = validate_download_url(f"http://example.com/{i}.txt") # 1.13ms -> 1.15ms (1.67% slower)
        # Invalid ftp
        codeflash_output = validate_download_url(f"ftp://example.com/{i}.txt")
        # Valid s3
        codeflash_output = validate_download_url(f"s3://{settings.AWS_S3_BUCKET_UPLOADS}/{settings.ENV}/o_{i}") # 1.11ms -> 1.13ms (1.67% slower)
        # Invalid s3
        codeflash_output = validate_download_url(f"s3://wrongbucket/{settings.ENV}/o_{i}")
        # Valid file
        codeflash_output = validate_download_url(f"file://{REPO_ROOT_DIR}/downloads/{i}.txt") # 1.21ms -> 1.17ms (3.25% faster)
        # Invalid file
        codeflash_output = validate_download_url(f"file://{REPO_ROOT_DIR}/other/{i}.txt")
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
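
One consequence of pre-computing values such as `_LOCAL_ENV` is worth keeping in mind when reading tests that switch `settings.ENV`: a module-level constant is evaluated once at import time and does not follow later changes to the settings object. The sketch below demonstrates this with a hypothetical stand-in `settings`; the two helper functions are illustrative and are not part of the PR.

```python
from types import SimpleNamespace

# Hypothetical stand-in for the project's settings object.
settings = SimpleNamespace(ENV="local")

# Evaluated once, when this module is first executed.
_LOCAL_ENV = settings.ENV == "local"


def file_urls_allowed_precomputed() -> bool:
    """Reads the value captured at import time."""
    return _LOCAL_ENV


def file_urls_allowed_dynamic() -> bool:
    """Re-reads the setting on every call."""
    return settings.ENV == "local"


# Changing the setting afterwards affects only the dynamic check.
settings.ENV = "prod"
dynamic_result = file_urls_allowed_dynamic()          # False
precomputed_result = file_urls_allowed_precomputed()  # True (stale)
```

Under that assumption, a test that needs a non-local environment would have to patch the module-level constant itself, for example with pytest's `monkeypatch.setattr` on the module attribute, rather than only changing `settings.ENV`.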

To edit these changes git checkout codeflash/optimize-pr3867-2025-11-04T03.49.27 and push.



⚡️ This PR optimizes the validate_download_url function by 15% through pre-computed constants, faster condition checking, and elimination of redundant URL parsing. The changes move expensive string formatting operations to module scope and inline path parsing logic to avoid duplicate urlparse calls.

🔍 Detailed Analysis

Key Changes

  • Module-level constants: Pre-computed _ALLOWED_S3_PREFIX, _ALLOWED_FILE_PREFIX, and _LOCAL_ENV to eliminate repeated f-string formatting and attribute lookups on every function call
  • Optimized condition checking: Changed scheme in ("http", "https") to direct equality comparisons scheme == "http" or scheme == "https" for better performance
  • Inlined path parsing: Eliminated redundant urlparse calls by directly using the already-parsed parsed_url object instead of calling parse_uri_to_path

Technical Implementation

flowchart TD
    A[validate_download_url called] --> B[Parse URL once]
    B --> C{Check scheme}
    C -->|http/https| D[Return True - Fast path]
    C -->|s3| E[Check against pre-computed S3 prefix]
    C -->|file| F[Check pre-computed LOCAL_ENV flag]
    F -->|False| G[Return False]
    F -->|True| H[Inline path extraction using parsed_url]
    H --> I[Check against pre-computed file prefix]
    E --> J[Return validation result]
    I --> J
    
    style D fill:#90EE90
    style G fill:#FFB6C1
    style J fill:#87CEEB

Impact

  • Performance improvement: 15% overall speedup with file URL validation showing 25-35% gains due to eliminated double parsing
  • Per-call overhead: Pre-computed constants remove repeated f-string formatting and attribute lookups from each call
  • Scalability benefits: Bulk operations show consistent 4-35% improvements, with the low end coming from S3 batches (pre-computed prefix) and the high end from file:// batches (single parse)

Created with Palmier


[!IMPORTANT] Optimizes validate_download_url in files.py for a 15% speedup by pre-computing constants, improving condition checks, and eliminating redundant URL parsing.

  • Performance Optimization:
    • validate_download_url in files.py is optimized for a 15% speedup.
    • Pre-computed _ALLOWED_S3_PREFIX, _ALLOWED_FILE_PREFIX, and _LOCAL_ENV at module level.
    • Changed scheme in ("http", "https") to scheme == "http" or scheme == "https" for faster condition checking.
    • Inlined parse_uri_to_path logic to reuse parsed_url object, eliminating redundant urlparse calls.
  • Performance Impact:
    • File URL validation is 25-35% faster.
    • S3 URL validation is 4-11% faster.
    • HTTP/HTTPS URLs see minimal impact.
    • Large-scale tests show 4-35% improvements.

This description was created by Ellipsis for e91500b53af8122b5c42297eb6ecf1851cf6c0cc. You can customize this summary. It will automatically update as commits are pushed.

codeflash-ai[bot] • Nov 04 '25 03:11

[!IMPORTANT]

Review skipped

Bot user detected.

To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Comment @coderabbitai help to get the list of available commands and usage tips.

coderabbitai[bot] • Nov 04 '25 03:11

This PR has been automatically closed because the original PR #3867 by stanislaw89 was closed.

codeflash-ai[bot] • Nov 07 '25 22:11