out_s3: blob: fix handling of pre-signed URLs
When using pre-signed URLs for blob uploads (via authorization_endpoint_url), the plugin didn't extract or use the host from the pre-signed URL. It treated the whole URL as a URI path, so requests went to the wrong host or failed.
This patch adds s3_parse_presigned_url() to parse pre-signed URLs and extract the host, URI, and port. It also updates put_blob_object(), complete_multipart_upload(), and abort_multipart_upload() to:
- Extract the host from the pre-signed URL
- Temporarily set ctx->s3_client->host to the extracted host
- Validate the port matches the configuration
- Restore the original host after the request
Blob uploads using pre-signed URLs now correctly use the host specified in the URL instead of the default S3 client host.
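To make the parsing step concrete, here is a minimal, self-contained sketch of splitting a pre-signed URL into host, URI, and port with scheme-based default ports. It is not the code from this patch: the name demo_parse_presigned_url() and its signature are assumptions made for illustration, and it deliberately ignores cases such as IPv6 literals and userinfo in the authority.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (NOT the real s3_parse_presigned_url() signature).
 * Splits "http(s)://host[:port]/path?query" into heap-allocated host and
 * URI strings plus a port. Returns 0 on success, -1 on invalid input.
 * The caller frees *out_host and *out_uri.
 */
static int demo_parse_presigned_url(const char *url, char **out_host,
                                    char **out_uri, int *out_port)
{
    const char *host_start, *host_end, *path, *colon;
    char *host, *uri, *end;
    size_t host_len, path_len, lead;
    long val;
    int port;

    if (!url || !out_host || !out_uri || !out_port) {
        return -1;
    }

    /* The scheme decides the default port */
    if (strncmp(url, "https://", 8) == 0) {
        host_start = url + 8;
        port = 443;
    }
    else if (strncmp(url, "http://", 7) == 0) {
        host_start = url + 7;
        port = 80;
    }
    else {
        return -1;
    }

    /* The authority ends at the first '/' or '?', or at end of string */
    path = host_start + strcspn(host_start, "/?");
    host_end = path;

    /* Optional explicit port: must be all digits up to the authority end */
    colon = memchr(host_start, ':', (size_t) (host_end - host_start));
    if (colon != NULL) {
        if (!isdigit((unsigned char) colon[1])) {
            return -1;
        }
        val = strtol(colon + 1, &end, 10);
        if (end != host_end || val < 1 || val > 65535) {
            return -1;
        }
        port = (int) val;
        host_end = colon;
    }

    host_len = (size_t) (host_end - host_start);
    if (host_len == 0) {
        return -1;
    }

    /* The URI keeps the query string; add a leading '/' if it is missing */
    lead = (*path == '/') ? 0 : 1;
    path_len = strlen(path);

    host = malloc(host_len + 1);
    uri = malloc(lead + path_len + 1);
    if (!host || !uri) {
        free(host);
        free(uri);
        return -1;
    }

    memcpy(host, host_start, host_len);
    host[host_len] = '\0';
    if (lead) {
        uri[0] = '/';
    }
    memcpy(uri + lead, path, path_len + 1);

    *out_host = host;
    *out_uri = uri;
    *out_port = port;
    return 0;
}
```

The query string is kept as part of the URI because that is where a pre-signed URL carries its signature parameters; dropping it would invalidate the request.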
Example test:
service:
  flush: 1
  log_level: debug

pipeline:
  inputs:
    - name: blob
      path: '/tmp/the_project_gutenberg_ebook.txt'
      database_file: blob.db
      upload_success_suffix: emit_log

  outputs:
    - name: s3
      match: '*'
      bucket: fluent-bit-s3-integration-blob
      region: us-east-1
      blob_database_file: s3-blob.db
      # File buffering before upload
      total_file_size: 1M
      upload_timeout: 5s
      # Blob upload settings
      upload_parts_timeout: 10s
Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.
Summary by CodeRabbit
- Bug Fixes
- Improved presigned URL handling with enhanced parsing and validation of host, URI, and port information.
- Strengthened error recovery and resource cleanup for presigned URL operations.
- Added port validation to ensure consistency when using presigned URLs with S3.
Walkthrough
A new presigned URL parsing helper is introduced to the S3 output plugin, enabling dynamic endpoint resolution. The function is integrated into blob upload and multipart operations, replacing the original host with the presigned endpoint during transfers and restoring it afterward. Error handling is centralized via cleanup labels to ensure consistent resource deallocation.
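The swap-and-restore flow with a single cleanup label can be sketched as follows. This is an illustrative reduction, not the patch itself: struct demo_client, demo_send(), and demo_request_with_host() are hypothetical stand-ins for the S3 client context and request functions, and the port check is assumed to be a plain equality test against the configured port.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the S3 client state touched by the patch */
struct demo_client {
    const char *host;
    int port;
};

/* Records which host the "request" was sent to, for demonstration only */
static const char *demo_last_host = NULL;

/* Hypothetical request function: fails if the URI does not start with '/' */
static int demo_send(struct demo_client *c, const char *uri)
{
    demo_last_host = c->host;
    return (uri != NULL && uri[0] == '/') ? 0 : -1;
}

/*
 * Send one request against the pre-signed host, then restore the
 * configured host. Every exit path goes through the cleanup label,
 * so the client is never left pointing at the temporary host.
 */
static int demo_request_with_host(struct demo_client *c,
                                  const char *presigned_host,
                                  int presigned_port,
                                  const char *uri)
{
    const char *original_host = c->host;
    int ret = -1;

    /* Assumption: the pre-signed port must match the configured port */
    if (presigned_port != c->port) {
        goto cleanup;
    }

    /* Temporarily point the client at the pre-signed host */
    c->host = presigned_host;

    if (demo_send(c, uri) != 0) {
        goto cleanup;
    }

    ret = 0;

cleanup:
    /* Restore the original host on success and on failure alike */
    c->host = original_host;
    return ret;
}
```

Because the host is swapped on shared client state, this pattern is only safe if no other request uses the same client while the swap is in effect, which is the race-condition concern a reviewer would want to check.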
Changes
| Cohort / File(s) | Summary |
|---|---|
| Presigned URL parsing infrastructure: plugins/out_s3/s3.h, plugins/out_s3/s3.c | Added new s3_parse_presigned_url() helper function that parses presigned URLs to extract host, URI, and port. Function validates inputs, applies scheme-based default ports, and returns allocated strings. |
| Put blob object integration: plugins/out_s3/s3.c | Modified put_blob_object() to use presigned URL parsing, temporarily swapping the S3 client host with the presigned host, validating port consistency, and restoring original state. Converted early returns to centralized cleanup paths. |
| Multipart operations integration: plugins/out_s3/s3_multipart.c | Extended presigned URL handling to multipart upload operations with host swapping and validation. Refactored error paths to use single cleanup flow, added result variable for standardized return handling, and ensured resource cleanup across all error conditions. |
Estimated code review effort
🎯 4 (Complex) | ⏱️ ~45 minutes
- Resource cleanup consistency: Verify that all error paths properly restore original host, deallocate presigned host, URI, and other resources without leaks.
- Port validation logic: Ensure port consistency checks between presigned and client configuration are correct and handle edge cases (mismatches, default ports, scheme-based defaults).
- State management: Review host-swapping mechanism in both blob and multipart operations; ensure race conditions are not introduced and original state is restored even on partial failures.
- Error propagation: Confirm that goto cleanup paths are correctly placed and all control flow branches (success/failure) reach appropriate cleanup points.
Suggested labels
docs-required, backport to v4.1.x
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'out_s3: blob: fix handling of pre-signed URLs' accurately and clearly summarizes the main change: fixing pre-signed URL handling in the S3 output plugin for blob operations. |
Ping @singholt @sparrc (pls if you can take a look at these changes)
When using pre-signed URLs for blob uploads (via authorization_endpoint_url), the plugin didn't extract or use the host from the pre-signed URL. It treated the URL as a URI path, so requests went to the wrong host or failed.
Is the test config that you have provided missing the authorization_endpoint_url parameter?