
Add experimental GZIP support to filestream

Open AndersonQ opened this issue 6 months ago • 1 comments

This feature introduces experimental support for reading GZIP-compressed files directly within the filestream input. Many logging systems and rotation strategies produce compressed log files (e.g., .log.gz). Currently, Filebeat cannot process these files until they have been decompressed manually.

With native GZIP support, Filebeat will be able to transparently decompress and ingest these log files, simplifying log management pipelines and reducing manual intervention.

This initial implementation will be released as a tech preview feature, enabled by a new configuration flag.

Required tests: unit and integration tests covering various scenarios, including file rotation, offset resume, and error handling, for Filebeat in a standard environment. Further integration tests on Kubernetes and more complex scenarios, including the possibility of an OOM kill, are tracked in follow-up issues.

Author's checklist

  • [x] Add new configuration for enabling GZIP support.
  • [x] Implement GZIP file detection based on magic bytes.
  • [x] Integrate GZIP reader seeker.
  • [ ] Verify integrity (CRC32 and ISIZE) at the end of the file and log errors on mismatch. This most likely only needs tests.
  • [ ] Implement modification detection to abort ingestion if the file is appended to or truncated during processing.
  • [x] Adjust the copytruncate rotation mechanism to correctly handle GZIP files.
  • [ ] Add metrics to monitor GZIP processing (e.g., validation errors, bytes processed).
  • [ ] Develop integration tests to cover various scenarios, including file rotation, offset resume, and error handling.

AndersonQ, Jun 17 '25 13:06

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

elasticmachine, Jun 17 '25 13:06