skyvern
⚡️ Speed up function `validate_download_url` by 15% in PR #3867 (`stas/sdk-upload-files`)
⚡️ This pull request contains optimizations for PR #3867
If you approve this dependent PR, these changes will be merged into the original PR branch stas/sdk-upload-files.
This PR will be automatically closed if the original PR is merged.
📄 15% (0.15x) speedup for validate_download_url in skyvern/forge/sdk/api/files.py
⏱️ Runtime : 78.8 milliseconds → 68.6 milliseconds (best of 42 runs)
📝 Explanation and details
The optimization achieves a 15% speedup through three key improvements:
1. Pre-computed constants at module level:
   - Moved `_ALLOWED_S3_PREFIX`, `_ALLOWED_FILE_PREFIX`, and `_LOCAL_ENV` to module scope, eliminating repeated f-string formatting and attribute lookups on every function call
   - Line profiler shows S3 prefix check improved from 850.3ns to 472.5ns per hit (44% faster)
   - File prefix check improved from 1281ns to 508.8ns per hit (60% faster)
2. Faster condition checking:
   - Changed `scheme in ("http", "https")` to `scheme == "http" or scheme == "https"`; direct equality comparisons are faster than tuple membership tests for small sets
3. Eliminated redundant URL parsing:
   - Inlined the `parse_uri_to_path` logic to reuse the already-parsed `parsed_url` object instead of calling `urlparse` again
   - Line profiler shows this eliminated ~68.4ms of redundant parsing time (15.2% of original runtime)
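The three changes above can be sketched together as follows. This is a minimal illustration, not the code in `skyvern/forge/sdk/api/files.py`: the function name `validate_download_url_sketch`, the plain module variables standing in for `settings`, and the prefix formats (inferred from the generated tests) are all assumptions.

```python
from urllib.parse import unquote, urlparse

# Hypothetical stand-ins for the project's settings and REPO_ROOT_DIR.
ENV = "local"
AWS_S3_BUCKET_UPLOADS = "test-bucket"
REPO_ROOT_DIR = "/repo/root"

# (1) Computed once at import time instead of on every call.
# The prefix formats are assumptions inferred from the generated tests.
_ALLOWED_S3_PREFIX = f"s3://{AWS_S3_BUCKET_UPLOADS}/{ENV}/o_"
_ALLOWED_FILE_PREFIX = f"{REPO_ROOT_DIR}/downloads"
_LOCAL_ENV = ENV == "local"


def validate_download_url_sketch(url: str) -> bool:
    """Illustrative validator that parses the URL exactly once."""
    try:
        parsed_url = urlparse(url)
    except ValueError:
        return False

    scheme = parsed_url.scheme.lower()

    # (2) Direct equality comparisons instead of tuple membership.
    if scheme == "http" or scheme == "https":
        return True

    if scheme == "s3":
        return url.startswith(_ALLOWED_S3_PREFIX)

    if scheme == "file":
        if not _LOCAL_ENV:
            return False
        # (3) Reuse parsed_url rather than calling urlparse a second time.
        file_path = unquote(parsed_url.netloc + parsed_url.path)
        return file_path.startswith(_ALLOWED_FILE_PREFIX)

    return False
```

One trade-off of module-level constants: the values are captured at import time, so a later change to the underlying settings is not seen by the function unless the module is reloaded.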
Performance characteristics by test case:
- File URL validation shows the biggest gains (25-35% faster) due to eliminated double parsing
- S3 URL validation improves 4-11% from pre-computed prefix constants
- HTTP/HTTPS URLs see minimal impact since they hit the fast-path early
- Large-scale tests benefit most from the constant pre-computation, showing consistent 4-35% improvements across bulk operations
The optimizations are most effective for workloads with frequent file:// URL validation or high-volume S3 URL checking.
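To get a rough feel for the double-parsing cost on a given machine, a small `timeit` harness along these lines may help. It is only a sketch: `file_path_reparsed`, `file_path_reused`, and `best_of` are hypothetical helpers written for this illustration, and the absolute numbers will vary with hardware and Python version (the standard library also caches parse results internally, which affects the gap).

```python
import timeit
from urllib.parse import unquote, urlparse


def file_path_reparsed(url: str) -> str:
    """Parse once for the scheme check, then parse again to extract the path."""
    parsed_url = urlparse(url)
    if parsed_url.scheme != "file":
        raise ValueError("not a file URI")
    reparsed = urlparse(url)  # the redundant second parse
    return unquote(reparsed.netloc + reparsed.path)


def file_path_reused(url: str) -> str:
    """Parse once and reuse the same result for the path."""
    parsed_url = urlparse(url)
    if parsed_url.scheme != "file":
        raise ValueError("not a file URI")
    return unquote(parsed_url.netloc + parsed_url.path)


def best_of(func, url: str, repeat: int = 5, number: int = 10_000) -> float:
    """Return the best observed time per call, in seconds."""
    timings = timeit.repeat(lambda: func(url), repeat=repeat, number=number)
    return min(timings) / number


if __name__ == "__main__":
    sample = "file:///repo/root/downloads/file.txt"
    for func in (file_path_reparsed, file_path_reused):
        print(f"{func.__name__}: {best_of(func, sample) * 1e9:.0f} ns per call")
```

Taking the minimum over several repeats mirrors the "best of N runs" reporting used in this PR and reduces noise from other processes.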
✅ Correctness verification report:
| Test | Status |
|---|---|
| ⏪ Replay Tests | 🔘 None Found |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | ✅ 15064 Passed |
| 📊 Tests Coverage | 71.4% |
🌀 Generated Regression Tests and Runtime
import sys
# Patch settings for tests
import types
# Function under test (copied from user)
from urllib.parse import unquote, urlparse

# imports
import pytest
from skyvern.forge.sdk.api.files import validate_download_url


# Mocks for settings and constants
class SettingsMock:
    def __init__(self, env="local", bucket="test-bucket"):
        self.ENV = env
        self.AWS_S3_BUCKET_UPLOADS = bucket


REPO_ROOT_DIR = "/repo/root"
settings = SettingsMock()
# ----------- UNIT TESTS ------------

# Basic Test Cases

def test_http_url_valid():
    # Should accept basic HTTP URL
    codeflash_output = validate_download_url("http://example.com/file.txt") # 11.6μs -> 11.7μs (0.513% slower)

def test_https_url_valid():
    # Should accept basic HTTPS URL
    codeflash_output = validate_download_url("https://example.com/file.txt") # 11.5μs -> 11.6μs (0.517% slower)

def test_google_drive_https_url():
    # Should accept Google Drive HTTPS URL
    codeflash_output = validate_download_url("https://drive.google.com/uc?id=abc123") # 12.2μs -> 12.1μs (1.17% faster)

def test_s3_url_valid_uploads_bucket():
    # Should accept valid S3 URL for uploads bucket
    settings.ENV = "local"
    settings.AWS_S3_BUCKET_UPLOADS = "test-bucket"
    url = "s3://test-bucket/local/o_12345"
    codeflash_output = validate_download_url(url) # 12.9μs -> 11.7μs (10.6% faster)

def test_file_url_valid_local_env():
    # Should accept file:// URL in local env within allowed directory
    settings.ENV = "local"
    url = "file:///repo/root/downloads/file.txt"
    codeflash_output = validate_download_url(url) # 16.0μs -> 12.5μs (28.2% faster)

# Edge Test Cases

def test_file_url_not_local_env():
    # Should reject file:// URL in non-local env
    settings.ENV = "prod"
    url = "file:///repo/root/downloads/file.txt"
    codeflash_output = validate_download_url(url) # 8.81μs -> 5.62μs (56.7% faster)

def test_file_url_outside_allowed_directory():
    # Should reject file:// URL outside allowed directory in local env
    settings.ENV = "local"
    url = "file:///repo/root/otherdir/file.txt"
    codeflash_output = validate_download_url(url) # 16.1μs -> 12.6μs (27.5% faster)

def test_file_url_with_encoded_path():
    # Should accept file:// URL with encoded path inside allowed dir
    settings.ENV = "local"
    url = "file:///repo/root/downloads/file%20with%20spaces.txt"
    codeflash_output = validate_download_url(url) # 22.7μs -> 19.2μs (18.2% faster)

def test_file_url_with_encoded_path_outside_allowed():
    # Should reject file:// URL with encoded path outside allowed dir
    settings.ENV = "local"
    url = "file:///repo/root/other%20dir/file.txt"
    codeflash_output = validate_download_url(url) # 21.7μs -> 18.3μs (18.5% faster)

def test_s3_url_wrong_bucket():
    # Should reject S3 URL with wrong bucket
    settings.ENV = "local"
    settings.AWS_S3_BUCKET_UPLOADS = "test-bucket"
    url = "s3://wrong-bucket/local/o_12345"
    codeflash_output = validate_download_url(url) # 12.8μs -> 11.3μs (13.4% faster)

def test_s3_url_wrong_env():
    # Should reject S3 URL with wrong env
    settings.ENV = "prod"
    settings.AWS_S3_BUCKET_UPLOADS = "test-bucket"
    url = "s3://test-bucket/local/o_12345"
    codeflash_output = validate_download_url(url) # 5.66μs -> 4.64μs (22.0% faster)

def test_s3_url_missing_o_prefix():
    # Should reject S3 URL not starting with o_ after env
    settings.ENV = "local"
    settings.AWS_S3_BUCKET_UPLOADS = "test-bucket"
    url = "s3://test-bucket/local/x_12345"
    codeflash_output = validate_download_url(url) # 12.6μs -> 11.8μs (6.80% faster)

def test_unsupported_scheme_ftp():
    # Should reject unsupported scheme (ftp)
    codeflash_output = validate_download_url("ftp://example.com/file.txt") # 11.3μs -> 11.1μs (1.45% faster)

def test_unsupported_scheme_gopher():
    # Should reject unsupported scheme (gopher)
    codeflash_output = validate_download_url("gopher://example.com/file.txt") # 10.9μs -> 11.3μs (2.76% slower)

def test_empty_url():
    # Should reject empty URL
    codeflash_output = validate_download_url("") # 7.70μs -> 7.49μs (2.80% faster)

def test_malformed_url():
    # Should reject malformed URL
    codeflash_output = validate_download_url("not a url") # 7.71μs -> 7.79μs (0.912% slower)

def test_file_url_with_netloc():
    # Should accept file:// with netloc if path is correct
    settings.ENV = "local"
    url = "file://repo/root/downloads/file.txt"
    codeflash_output = validate_download_url(url) # 16.2μs -> 12.4μs (30.7% faster)

def test_file_url_with_netloc_outside_allowed():
    # Should reject file:// with netloc outside allowed dir
    settings.ENV = "local"
    url = "file://repo/root/otherdir/file.txt"
    codeflash_output = validate_download_url(url) # 15.8μs -> 12.2μs (30.0% faster)

def test_file_url_with_double_slash():
    # Should accept file:// with double slash if path is correct
    settings.ENV = "local"
    url = "file:///repo/root/downloads//file.txt"
    codeflash_output = validate_download_url(url) # 15.6μs -> 12.1μs (28.6% faster)

def test_file_url_with_tricky_path_traversal():
    # Should reject file:// with path traversal outside allowed dir
    settings.ENV = "local"
    url = "file:///repo/root/downloads/../secrets/file.txt"
    codeflash_output = validate_download_url(url) # 15.4μs -> 12.1μs (27.4% faster)

def test_file_url_with_relative_path():
    # Should reject file:// with relative path
    settings.ENV = "local"
    url = "file://downloads/file.txt"
    codeflash_output = validate_download_url(url) # 15.4μs -> 12.3μs (25.1% faster)

def test_file_url_with_windows_path():
    # Should reject Windows-style file:// path
    settings.ENV = "local"
    url = "file:///C:/repo/root/downloads/file.txt"
    codeflash_output = validate_download_url(url) # 15.3μs -> 11.9μs (28.5% faster)

def test_file_url_with_invalid_scheme():
    # Should reject file:// with invalid scheme
    settings.ENV = "local"
    url = "files:///repo/root/downloads/file.txt"
    codeflash_output = validate_download_url(url) # 11.1μs -> 11.1μs (0.271% faster)

# Large Scale Test Cases

def test_many_http_urls():
    # Should accept a large number of HTTP URLs
    for i in range(1000):
        codeflash_output = validate_download_url(f"http://example.com/file_{i}.txt") # 4.29ms -> 4.28ms (0.251% faster)

def test_many_https_urls():
    # Should accept a large number of HTTPS URLs
    for i in range(1000):
        codeflash_output = validate_download_url(f"https://example.com/file_{i}.txt") # 4.32ms -> 4.31ms (0.242% faster)

def test_many_s3_urls_valid():
    # Should accept a large number of valid S3 URLs
    settings.ENV = "local"
    settings.AWS_S3_BUCKET_UPLOADS = "test-bucket"
    for i in range(1000):
        url = f"s3://test-bucket/local/o_{i:05d}"
        codeflash_output = validate_download_url(url) # 4.60ms -> 4.42ms (4.01% faster)

def test_many_s3_urls_invalid():
    # Should reject a large number of invalid S3 URLs
    settings.ENV = "local"
    settings.AWS_S3_BUCKET_UPLOADS = "test-bucket"
    for i in range(1000):
        url = f"s3://wrong-bucket/local/o_{i:05d}"
        codeflash_output = validate_download_url(url) # 4.62ms -> 4.42ms (4.54% faster)

def test_many_file_urls_valid():
    # Should accept a large number of valid file:// URLs in local env
    settings.ENV = "local"
    for i in range(1000):
        url = f"file:///repo/root/downloads/file_{i}.txt"
        codeflash_output = validate_download_url(url) # 6.40ms -> 4.77ms (34.0% faster)

def test_many_file_urls_invalid():
    # Should reject a large number of invalid file:// URLs outside allowed dir
    settings.ENV = "local"
    for i in range(1000):
        url = f"file:///repo/root/otherdir/file_{i}.txt"
        codeflash_output = validate_download_url(url) # 6.38ms -> 4.75ms (34.4% faster)

def test_many_file_urls_nonlocal_env():
    # Should reject all file:// URLs in non-local env
    settings.ENV = "prod"
    for i in range(1000):
        url = f"file:///repo/root/downloads/file_{i}.txt"
        codeflash_output = validate_download_url(url) # 6.40ms -> 4.78ms (34.0% faster)

def test_large_mixed_urls():
    # Should correctly handle a large mixed batch of URLs
    settings.ENV = "local"
    settings.AWS_S3_BUCKET_UPLOADS = "test-bucket"
    for i in range(500):
        codeflash_output = validate_download_url(f"http://example.com/file_{i}.txt") # 2.31ms -> 2.30ms (0.214% faster)
        codeflash_output = validate_download_url(f"https://example.com/file_{i}.txt")
        codeflash_output = validate_download_url(f"s3://test-bucket/local/o_{i:05d}") # 2.28ms -> 2.28ms (0.249% faster)
        codeflash_output = validate_download_url(f"s3://wrong-bucket/local/o_{i:05d}")
        codeflash_output = validate_download_url(f"file:///repo/root/downloads/file_{i}.txt") # 2.48ms -> 2.34ms (5.71% faster)
        codeflash_output = validate_download_url(f"file:///repo/root/otherdir/file_{i}.txt")
        codeflash_output = validate_download_url(f"ftp://example.com/file_{i}.txt") # 2.43ms -> 2.30ms (5.72% faster)
        codeflash_output = validate_download_url("")

# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
import sys
# Function to test (copied from above)
from urllib.parse import unquote, urlparse

# imports
import pytest
from skyvern.forge.sdk.api.files import validate_download_url


# Mocks for settings and constants
class Settings:
    ENV = "local"
    AWS_S3_BUCKET_UPLOADS = "mybucket"


settings = Settings()
REPO_ROOT_DIR = "/repo/root"
# --- UNIT TESTS ---

# BASIC TEST CASES

def test_valid_http_url():
    # Should accept standard http URL
    codeflash_output = validate_download_url("http://example.com/file.txt") # 12.1μs -> 12.1μs (0.580% faster)

def test_valid_https_url():
    # Should accept standard https URL
    codeflash_output = validate_download_url("https://example.com/file.txt") # 11.7μs -> 11.6μs (0.518% faster)

def test_valid_google_drive_https_url():
    # Should accept Google Drive https URL
    codeflash_output = validate_download_url("https://drive.google.com/uc?id=abc123") # 12.4μs -> 12.3μs (0.081% faster)

def test_valid_s3_url():
    # Should accept valid s3 URL for the uploads bucket
    url = f"s3://{settings.AWS_S3_BUCKET_UPLOADS}/{settings.ENV}/o_12345"
    codeflash_output = validate_download_url(url) # 12.9μs -> 11.6μs (10.9% faster)

def test_invalid_s3_bucket_url():
    # Should reject s3 URL for a different bucket
    url = f"s3://otherbucket/{settings.ENV}/o_12345"
    codeflash_output = validate_download_url(url) # 12.5μs -> 11.5μs (8.55% faster)

def test_invalid_s3_env_url():
    # Should reject s3 URL for a different env
    url = f"s3://{settings.AWS_S3_BUCKET_UPLOADS}/prod/o_12345"
    codeflash_output = validate_download_url(url) # 12.4μs -> 11.4μs (8.85% faster)

def test_valid_file_url_local_env():
    # Should accept valid file URL in local env
    url = f"file://{REPO_ROOT_DIR}/downloads/myfile.txt"
    codeflash_output = validate_download_url(url) # 16.1μs -> 12.3μs (30.2% faster)

def test_invalid_file_url_non_local_env(monkeypatch):
    # Should reject file URL if ENV is not local
    monkeypatch.setattr(settings, "ENV", "prod")
    url = f"file://{REPO_ROOT_DIR}/downloads/myfile.txt"
    codeflash_output = validate_download_url(url) # 8.99μs -> 5.84μs (53.8% faster)
    monkeypatch.setattr(settings, "ENV", "local") # restore

def test_invalid_file_url_outside_downloads():
    # Should reject file URL outside allowed downloads directory
    url = f"file://{REPO_ROOT_DIR}/notdownloads/myfile.txt"
    codeflash_output = validate_download_url(url) # 16.2μs -> 12.8μs (26.5% faster)

def test_invalid_scheme_ftp():
    # Should reject unsupported ftp scheme
    codeflash_output = validate_download_url("ftp://example.com/file.txt") # 11.0μs -> 11.2μs (1.52% slower)

def test_invalid_scheme_blank():
    # Should reject blank scheme (relative URL)
    codeflash_output = validate_download_url("example.com/file.txt") # 7.92μs -> 7.96μs (0.502% slower)

def test_invalid_scheme_data():
    # Should reject dangerous data scheme
    codeflash_output = validate_download_url("data:text/plain;base64,SGVsbG8sIFdvcmxkIQ==") # 9.44μs -> 9.21μs (2.51% faster)

# EDGE TEST CASES

def test_file_url_with_encoded_path():
    # Should accept file URL with percent-encoded path inside downloads
    url = f"file://{REPO_ROOT_DIR}/downloads/my%20file.txt"
    codeflash_output = validate_download_url(url) # 22.2μs -> 18.7μs (18.7% faster)

def test_file_url_with_encoded_path_outside():
    # Should reject file URL with percent-encoded path outside allowed dir
    url = f"file://{REPO_ROOT_DIR}/notdownloads/my%20file.txt"
    codeflash_output = validate_download_url(url) # 21.6μs -> 18.3μs (18.4% faster)

def test_file_url_with_tricky_path_traversal():
    # Should reject file URL with path traversal outside allowed dir
    url = f"file://{REPO_ROOT_DIR}/downloads/../secrets.txt"
    codeflash_output = validate_download_url(url) # 15.4μs -> 12.3μs (24.6% faster)

def test_file_url_with_empty_path():
    # Should reject file URL with empty path
    url = "file://"
    codeflash_output = validate_download_url(url) # 14.8μs -> 11.7μs (26.0% faster)

def test_file_url_with_only_slash():
    # Should reject file URL with only slash
    url = "file:///"
    codeflash_output = validate_download_url(url) # 15.4μs -> 12.2μs (26.2% faster)

def test_file_url_with_double_slash():
    # Should reject file URL with double slash
    url = "file:////etc/passwd"
    codeflash_output = validate_download_url(url) # 15.7μs -> 12.3μs (27.9% faster)

def test_file_url_with_netloc():
    # Should accept file URL with netloc and path inside downloads
    url = f"file://{REPO_ROOT_DIR}/downloads/abc.txt"
    codeflash_output = validate_download_url(url) # 15.5μs -> 12.1μs (28.3% faster)

def test_file_url_with_netloc_outside():
    # Should reject file URL with netloc and path outside downloads
    url = f"file://{REPO_ROOT_DIR}/../secrets.txt"
    codeflash_output = validate_download_url(url) # 15.6μs -> 12.0μs (29.5% faster)

def test_file_url_with_invalid_uri(monkeypatch):
    # Should reject file URL with invalid URI
    url = "file://%ZZ%ZZ"
    codeflash_output = validate_download_url(url) # 23.1μs -> 19.2μs (20.4% faster)

def test_s3_url_with_missing_prefix():
    # Should reject s3 URL missing the o_ prefix
    url = f"s3://{settings.AWS_S3_BUCKET_UPLOADS}/{settings.ENV}/file.txt"
    codeflash_output = validate_download_url(url) # 12.8μs -> 11.7μs (9.36% faster)

def test_http_url_with_uppercase_scheme():
    # Should accept http URL with uppercase scheme
    codeflash_output = validate_download_url("HTTP://example.com/file.txt") # 11.1μs -> 11.1μs (0.542% faster)

def test_s3_url_with_uppercase_scheme():
    # Should accept s3 URL with uppercase scheme if valid
    url = f"S3://{settings.AWS_S3_BUCKET_UPLOADS}/{settings.ENV}/o_123"
    codeflash_output = validate_download_url(url) # 12.6μs -> 11.4μs (9.81% faster)

def test_file_url_with_uppercase_scheme():
    # Should accept file URL with uppercase scheme if valid
    url = f"FILE://{REPO_ROOT_DIR}/downloads/myfile.txt"
    codeflash_output = validate_download_url(url) # 16.0μs -> 12.4μs (28.7% faster)

def test_file_url_with_trailing_slash():
    # Should accept file URL with trailing slash inside downloads
    url = f"file://{REPO_ROOT_DIR}/downloads/"
    codeflash_output = validate_download_url(url) # 15.4μs -> 12.2μs (26.3% faster)

def test_file_url_with_dot_slash():
    # Should accept file URL with ./ inside downloads
    url = f"file://{REPO_ROOT_DIR}/downloads/./myfile.txt"
    codeflash_output = validate_download_url(url) # 15.5μs -> 12.3μs (25.9% faster)

def test_file_url_with_multiple_slashes():
    # Should accept file URL with multiple slashes inside downloads
    url = f"file://{REPO_ROOT_DIR}/downloads///myfile.txt"
    codeflash_output = validate_download_url(url) # 15.3μs -> 11.9μs (29.0% faster)

def test_file_url_with_windows_path():
    # Should reject Windows-style file URL
    url = "file://C:/repo/root/downloads/myfile.txt"
    codeflash_output = validate_download_url(url) # 15.4μs -> 12.5μs (22.8% faster)

def test_http_url_with_query_and_fragment():
    # Should accept http URL with query and fragment
    url = "http://example.com/file.txt?foo=bar#section"
    codeflash_output = validate_download_url(url) # 11.7μs -> 12.4μs (5.02% slower)

def test_file_url_with_fragment():
    # Should accept file URL with fragment inside downloads
    url = f"file://{REPO_ROOT_DIR}/downloads/myfile.txt#frag"
    codeflash_output = validate_download_url(url) # 16.5μs -> 12.8μs (28.7% faster)

def test_file_url_with_query():
    # Should accept file URL with query inside downloads
    url = f"file://{REPO_ROOT_DIR}/downloads/myfile.txt?foo=bar"
    codeflash_output = validate_download_url(url) # 16.2μs -> 12.7μs (27.7% faster)

def test_s3_url_with_query():
    # Should accept valid s3 URL with query string
    url = f"s3://{settings.AWS_S3_BUCKET_UPLOADS}/{settings.ENV}/o_12345?foo=bar"
    codeflash_output = validate_download_url(url) # 13.3μs -> 12.2μs (9.04% faster)

def test_url_with_spaces():
    # Should reject URL with spaces in scheme
    url = "ht tp://example.com/file.txt"
    codeflash_output = validate_download_url(url) # 8.67μs -> 8.72μs (0.574% slower)

def test_url_with_special_chars_in_path():
    # Should accept http URL with special chars in path
    url = "http://example.com/file@!{report_table}.txt"
    codeflash_output = validate_download_url(url) # 11.1μs -> 11.1μs (0.362% faster)

def test_url_with_unicode_chars():
    # Should accept http URL with unicode chars in path
    url = "http://example.com/файл.txt"
    codeflash_output = validate_download_url(url) # 12.2μs -> 12.0μs (1.84% faster)

def test_file_url_with_unicode_chars():
    # Should accept file URL with unicode chars in path inside downloads
    url = f"file://{REPO_ROOT_DIR}/downloads/файл.txt"
    codeflash_output = validate_download_url(url) # 17.0μs -> 13.0μs (30.5% faster)

# LARGE SCALE TEST CASES

def test_many_valid_http_urls():
    # Test a large number of valid http URLs
    for i in range(500):
        url = f"http://example.com/file_{i}.txt"
        codeflash_output = validate_download_url(url) # 2.16ms -> 2.17ms (0.267% slower)

def test_many_valid_s3_urls():
    # Test a large number of valid s3 URLs
    for i in range(500):
        url = f"s3://{settings.AWS_S3_BUCKET_UPLOADS}/{settings.ENV}/o_{i:05d}"
        codeflash_output = validate_download_url(url) # 2.34ms -> 2.24ms (4.23% faster)

def test_many_invalid_s3_urls():
    # Test a large number of invalid s3 URLs (wrong bucket)
    for i in range(500):
        url = f"s3://wrongbucket/{settings.ENV}/o_{i:05d}"
        codeflash_output = validate_download_url(url) # 2.34ms -> 2.24ms (4.87% faster)

def test_many_valid_file_urls():
    # Test a large number of valid file URLs inside downloads
    for i in range(500):
        url = f"file://{REPO_ROOT_DIR}/downloads/file_{i}.txt"
        codeflash_output = validate_download_url(url) # 3.24ms -> 2.42ms (33.9% faster)

def test_many_invalid_file_urls():
    # Test a large number of invalid file URLs outside downloads
    for i in range(500):
        url = f"file://{REPO_ROOT_DIR}/other/file_{i}.txt"
        codeflash_output = validate_download_url(url) # 3.23ms -> 2.41ms (34.4% faster)

def test_large_http_url():
    # Test a very long http URL
    url = "http://example.com/" + "a"*900 + ".txt"
    codeflash_output = validate_download_url(url) # 13.0μs -> 12.9μs (1.02% faster)

def test_large_file_url_inside_downloads():
    # Test a very long file URL inside downloads
    url = f"file://{REPO_ROOT_DIR}/downloads/" + "a"*900 + ".txt"
    codeflash_output = validate_download_url(url) # 17.3μs -> 13.7μs (26.4% faster)

def test_large_file_url_outside_downloads():
    # Test a very long file URL outside downloads
    url = f"file://{REPO_ROOT_DIR}/other/" + "a"*900 + ".txt"
    codeflash_output = validate_download_url(url) # 17.0μs -> 13.7μs (24.0% faster)

def test_large_s3_url():
    # Test a very long s3 URL
    url = f"s3://{settings.AWS_S3_BUCKET_UPLOADS}/{settings.ENV}/o_" + "x"*900
    codeflash_output = validate_download_url(url) # 14.0μs -> 12.6μs (11.0% faster)

def test_large_number_of_mixed_urls():
    # Test a mix of valid and invalid URLs
    for i in range(250):
        # Valid http
        codeflash_output = validate_download_url(f"http://example.com/{i}.txt") # 1.13ms -> 1.15ms (1.67% slower)
        # Invalid ftp
        codeflash_output = validate_download_url(f"ftp://example.com/{i}.txt")
        # Valid s3
        codeflash_output = validate_download_url(f"s3://{settings.AWS_S3_BUCKET_UPLOADS}/{settings.ENV}/o_{i}") # 1.11ms -> 1.13ms (1.67% slower)
        # Invalid s3
        codeflash_output = validate_download_url(f"s3://wrongbucket/{settings.ENV}/o_{i}")
        # Valid file
        codeflash_output = validate_download_url(f"file://{REPO_ROOT_DIR}/downloads/{i}.txt") # 1.21ms -> 1.17ms (3.25% faster)
        # Invalid file
        codeflash_output = validate_download_url(f"file://{REPO_ROOT_DIR}/other/{i}.txt")

# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
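
The generated tests rely on comparing the output of the original and optimized code. A minimal sketch of that kind of behavioral-equivalence check is shown below; it is an illustration only, not Codeflash's actual mechanism, and `outcome`, `find_mismatches`, and the two scheme-check functions are hypothetical names introduced here.

```python
from typing import Any, Callable, Iterable, List, Tuple


def scheme_allowed_membership(scheme: str) -> bool:
    """Original style: tuple membership test."""
    return scheme in ("http", "https")


def scheme_allowed_equality(scheme: str) -> bool:
    """Optimized style: direct equality comparisons."""
    return scheme == "http" or scheme == "https"


def outcome(func: Callable[[Any], Any], arg: Any) -> Tuple[str, Any]:
    """Capture either the return value or the type of exception raised."""
    try:
        return ("ok", func(arg))
    except Exception as exc:
        return ("raised", type(exc).__name__)


def find_mismatches(
    original: Callable[[Any], Any],
    optimized: Callable[[Any], Any],
    inputs: Iterable[Any],
) -> List[Any]:
    """Return the inputs on which the two implementations behave differently."""
    return [
        value
        for value in inputs
        if outcome(original, value) != outcome(optimized, value)
    ]


if __name__ == "__main__":
    schemes = ["http", "https", "s3", "file", "ftp", "", "HTTP"]
    print(find_mismatches(scheme_allowed_membership, scheme_allowed_equality, schemes))
```

Comparing exception types as well as return values matters here because a refactor that changes which error is raised is still a behavior change, even when every successful call returns the same result.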
To edit these changes, run `git checkout codeflash/optimize-pr3867-2025-11-04T03.49.27` and push.
⚡️ This PR optimizes the validate_download_url function by 15% through pre-computed constants, faster condition checking, and elimination of redundant URL parsing. The changes move expensive string formatting operations to module scope and inline path parsing logic to avoid duplicate urlparse calls.
🔍 Detailed Analysis
Key Changes
- Module-level constants: Pre-computed `_ALLOWED_S3_PREFIX`, `_ALLOWED_FILE_PREFIX`, and `_LOCAL_ENV` to eliminate repeated f-string formatting and attribute lookups on every function call
- Optimized condition checking: Changed `scheme in ("http", "https")` to direct equality comparisons `scheme == "http" or scheme == "https"` for better performance
- Inlined path parsing: Eliminated redundant `urlparse` calls by directly using the already-parsed `parsed_url` object instead of calling `parse_uri_to_path`
Technical Implementation
flowchart TD
A[validate_download_url called] --> B[Parse URL once]
B --> C{Check scheme}
C -->|http/https| D[Return True - Fast path]
C -->|s3| E[Check against pre-computed S3 prefix]
C -->|file| F[Check pre-computed LOCAL_ENV flag]
F -->|False| G[Return False]
F -->|True| H[Inline path extraction using parsed_url]
H --> I[Check against pre-computed file prefix]
E --> J[Return validation result]
I --> J
style D fill:#90EE90
style G fill:#FFB6C1
style J fill:#87CEEB
Impact
- Performance improvement: 15% overall speedup with file URL validation showing 25-35% gains due to eliminated double parsing
- Less per-call work: Pre-computed constants remove repeated string formatting and attribute lookups from every call
- Scalability benefits: Large-scale operations benefit most from constant pre-computation, showing consistent 4-35% improvements across bulk operations
Created with Palmier
[!IMPORTANT]
Optimizes `validate_download_url` in `files.py` for a 15% speedup by pre-computing constants, improving condition checks, and eliminating redundant URL parsing.
- Performance Optimization:
  - `validate_download_url` in `files.py` is optimized for a 15% speedup.
  - Pre-computed `_ALLOWED_S3_PREFIX`, `_ALLOWED_FILE_PREFIX`, and `_LOCAL_ENV` at module level.
  - Changed `scheme in ("http", "https")` to `scheme == "http" or scheme == "https"` for faster condition checking.
  - Inlined `parse_uri_to_path` logic to reuse `parsed_url` object, eliminating redundant `urlparse` calls.
- Performance Impact:
  - File URL validation is 25-35% faster.
  - S3 URL validation is 4-11% faster.
  - HTTP/HTTPS URLs see minimal impact.
  - Large-scale tests show 4-35% improvements.
This description was generated automatically for e91500b53af8122b5c42297eb6ecf1851cf6c0cc. It will automatically update as commits are pushed.
[!IMPORTANT]
Review skipped: bot user detected.
To trigger a single review, invoke the `@coderabbitai review` command.
You can disable this status message by setting `reviews.review_status` to `false` in the CodeRabbit configuration file.
Comment `@coderabbitai help` to get the list of available commands and usage tips.
This PR has been automatically closed because the original PR #3867 by stanislaw89 was closed.