redpanda
Python tools for transforming segment files
This PR improves upon the segment analyser tool by adding the ability to:
- Process individual log files (not just the entire data directory)
- Fully decode record batches and write them back out in another format
- If an output path is provided, the log files are written out in JSON format by default; alternatively, a custom encoder can be supplied:
from segment import Segment

def my_encoder(record):
    # Encode the Record object as Avro, CSV, JSON, Proto, ...
    ...

s = Segment("/path/to/segment.log",
            dest="/path/to/output.ext",
            encoder=my_encoder)
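For illustration only, here is a minimal sketch of what a custom encoder could look like. The PR description does not spell out the fields of the Record object or what the encoder is expected to return, so `FakeRecord`, its `offset`/`key`/`value` attributes, and the choice to return one JSON line as a string are all assumptions rather than the tool's actual interface.

```python
import base64
import json
from dataclasses import dataclass


@dataclass
class FakeRecord:
    """Hypothetical stand-in for the tool's Record object (fields are assumed)."""
    offset: int
    key: bytes
    value: bytes


def json_lines_encoder(record):
    """Encode one record as a single JSON line.

    Raw bytes are not valid JSON, so key and value are base64-encoded.
    Returning a str is an assumption; the real tool may expect bytes
    or may want the encoder to write to a file itself.
    """
    doc = {
        "offset": record.offset,
        "key": base64.b64encode(record.key).decode("ascii"),
        "value": base64.b64encode(record.value).decode("ascii"),
    }
    return json.dumps(doc, sort_keys=True) + "\n"


# Usage with the stand-in record type
line = json_lines_encoder(FakeRecord(offset=0, key=b"k", value=b"hello"))
```

If the real Record exposes different attribute names, only the dictionary construction would need to change; the same shape works for CSV or other line-oriented formats.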
@ivotron and @jrkinley - we should publish this to PyPI so it can be installed with pip, and write docs for it. @coral-waters, the docs would then say:
# pip install redpanda_utils;
import redpanda_utils as rp
s = rp.Segment("path/to/file")
Great idea to make it a full package.
cc: @vsaraswat @dswang for visibility
It might also be good to add a separate commit for each feature mentioned in your PR heading. Formatting can be part of that workflow too, i.e. every change is committed already correctly formatted.
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
0 out of 2 committers have signed the CLA.
:x: James Kinley
:x: jrkinley
James Kinley does not seem to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.