
Python tools for transforming segment files

Open jrkinley opened this issue 3 years ago • 3 comments

This PR improves upon the segment analyser tool by adding the ability to:

  • Process individual log files (not just the entire data directory)
  • Fully decode record batches and write them back out in another format
  • If an output path is provided, the log files are written out in JSON format by default; a custom encoder can be supplied instead:
from segment import Segment

def my_encoder(record):
    # Encode the Record object as Avro, CSV, JSON, Proto, ...
    pass  # placeholder body so the example parses

s = Segment("/path/to/segment.log",
            dest="/path/to/output.ext",
            encoder=my_encoder)
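
To make the custom-encoder idea concrete, here is a minimal sketch of what such encoder functions might look like. The `Record` dataclass below is a hypothetical stand-in with assumed fields (`offset`, `key`, `value`); the real `Record` class in this PR may expose different attributes, and whether an encoder should return `str` or `bytes` is also an assumption here.

```python
import base64
import csv
import io
import json
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical stand-in; the real Record class in the PR may differ."""
    offset: int
    key: bytes
    value: bytes


def json_encoder(record: Record) -> str:
    """Encode one record as a single JSON line (raw bytes as base64)."""
    return json.dumps({
        "offset": record.offset,
        "key": base64.b64encode(record.key).decode("ascii"),
        "value": base64.b64encode(record.value).decode("ascii"),
    })


def csv_encoder(record: Record) -> str:
    """Encode one record as a single CSV row; the csv module handles quoting."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="").writerow([
        record.offset,
        record.key.decode("utf-8", errors="replace"),
        record.value.decode("utf-8", errors="replace"),
    ])
    return buf.getvalue()


if __name__ == "__main__":
    rec = Record(offset=42, key=b"user-1", value=b'{"name": "a,b"}')
    print(json_encoder(rec))
    print(csv_encoder(rec))
```

Either function could then be passed as `encoder=` in the call above, assuming the tool invokes the encoder once per decoded record.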

jrkinley avatar Jan 24 '22 15:01 jrkinley

@ivotron and @jrkinley - we should publish this to PyPI and write docs (@coral-waters), so the docs would say:

# pip install redpanda_utils

import redpanda_utils as rp

s = rp.Segment("path/to/file")

Great idea to make it a full package.

cc: @vsaraswat @dswang for visibility

emaxerrno avatar Jan 24 '22 17:01 emaxerrno

Also, it might be good to split this into separate commits, one for each feature mentioned in the PR description. Formatting can be part of that workflow too, i.e. every change should be formatted correctly.

graphcareful avatar Jan 24 '22 18:01 graphcareful

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
0 out of 2 committers have signed the CLA.

:x: James Kinley
:x: jrkinley


James Kinley does not seem to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.

CLAassistant avatar Feb 10 '22 11:02 CLAassistant