Customizable Output Formatting and Custom Fields

Open theShmoo opened this issue 1 year ago • 8 comments

Allow users to customize the output format and add custom fields for enhanced log readability.

Describe the problem this feature request solves

Currently, pamburus/hl provides a good way to read JSON log lines. However, it lacks the ability to customize the output format. This limits the user's ability to tailor the output to their specific needs and makes it difficult to add custom fields for more context or analysis.

Describe the solution you'd like

I propose adding the following features:

  • Customizable Output Format: Allow users to define their own output format using a template or configuration file. This would enable users to control:
    • The order of fields
    • The display format of each field (e.g., date/time formatting, number formatting)
    • The inclusion/exclusion of specific fields
    • Adding separators or other visual elements
  • Custom Fields: Enable users to add custom fields to the output. These fields could be derived from existing fields using expressions or functions. For example:
    • Calculate the duration between two timestamps
    • Extract parts of a string using regular expressions
    • Perform simple calculations on numeric fields
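The three kinds of derived fields listed above can be sketched in Python. This only illustrates the idea and is not an hl feature: the field names `started_at`, `finished_at`, `bytes`, `elapsed_ms`, `user`, and `kib`, and the function `add_derived_fields`, are all hypothetical.

```python
import re
from datetime import datetime, timedelta


def parse_ts(value: str) -> datetime:
    # Assumes ISO 8601 input; "Z" is rewritten so older Python versions accept it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def add_derived_fields(record: dict) -> dict:
    """Return a copy of the record with a few illustrative derived fields."""
    out = dict(record)

    # 1. Duration between two timestamps, in whole milliseconds.
    if "started_at" in record and "finished_at" in record:
        delta = parse_ts(record["finished_at"]) - parse_ts(record["started_at"])
        out["elapsed_ms"] = delta // timedelta(milliseconds=1)

    # 2. Part of a string extracted with a regular expression.
    match = re.search(r"user=(\w+)", str(record.get("message", "")))
    if match:
        out["user"] = match.group(1)

    # 3. A simple calculation on a numeric field.
    if isinstance(record.get("bytes"), (int, float)):
        out["kib"] = record["bytes"] / 1024

    return out
```

Returning a copy rather than mutating the input keeps the original record available for filtering or for other derived fields.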

Describe alternatives you've considered

I have considered using other tools for reading JSON logs, but pamburus/hl is currently the best in terms of its core functionality. However, the lack of output customization hinders its usefulness for more advanced use cases.

Additional context

This feature would greatly enhance the flexibility and usability of pamburus/hl, making it a more powerful tool for analyzing JSON log data. It would allow users to create highly customized views of their logs, tailored to their specific needs.

Example use case:

Imagine a log line like this:

{"timestamp": "2024-10-27T10:00:00Z", "level": "INFO", "message": "Request received", "request_id": "12345", "duration_ms": 150}

With customizable output, I could define a format like:

[{{timestamp | date:"%Y-%m-%d %H:%M:%S"}}] [{{level}}] (Request ID: {{request_id}}) - {{message}} (Duration: {{duration_ms}}ms)

Resulting in an output like:

[2024-10-27 10:00:00] [INFO] (Request ID: 12345) - Request received (Duration: 150ms)

This provides a much clearer and more informative view of the log data.
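To make the proposed template syntax concrete, here is a minimal Python sketch of how such a template could be rendered. It is an illustration under assumptions, not an hl implementation: the `{{field | date:"..."}}` syntax comes from the example above, while the `render` function, the `PLACEHOLDER` pattern, and the choice to interpret the date format with `strftime` are hypothetical.

```python
import json
import re
from datetime import datetime

# Matches {{ field }} or {{ field | date:"strftime-format" }}.
PLACEHOLDER = re.compile(r'\{\{\s*(\w+)\s*(?:\|\s*date:"([^"]*)"\s*)?\}\}')


def render(template: str, record: dict) -> str:
    """Fill each placeholder in the template from the log record."""

    def substitute(match):
        field, date_format = match.group(1), match.group(2)
        value = record.get(field, "")  # missing fields render as empty text
        if date_format is not None:
            # Assumes an ISO 8601 timestamp; "Z" is rewritten for older Pythons.
            parsed = datetime.fromisoformat(str(value).replace("Z", "+00:00"))
            return parsed.strftime(date_format)
        return str(value)

    return PLACEHOLDER.sub(substitute, template)


line = '{"timestamp": "2024-10-27T10:00:00Z", "level": "INFO", "message": "Request received", "request_id": "12345", "duration_ms": 150}'
template = '[{{timestamp | date:"%Y-%m-%d %H:%M:%S"}}] [{{level}}] (Request ID: {{request_id}}) - {{message}} (Duration: {{duration_ms}}ms)'
print(render(template, json.loads(line)))
# [2024-10-27 10:00:00] [INFO] (Request ID: 12345) - Request received (Duration: 150ms)
```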

theShmoo, Dec 19 '24 07:12

Thank you for your interest, detailed review and suggestions.

Yes, it is a good idea to make the output layout configuration more flexible. I've been thinking about it already; maybe it's time to implement it.

pamburus, Dec 21 '24 18:12

Thank you for the fast answer!

I would be happy to help you by providing more use cases and priorities of the proposed features if you need them.

theShmoo, Dec 23 '24 12:12

You are always welcome to share your use cases or scenarios. It would be helpful.

But there are 2 major obstacles to overcome.

  1. The internal data model was originally optimized heavily for performance in the currently supported cases only, and it lacks flexibility. It does not contain hash tables for fields; all fields are processed sequentially. It also parses fields lazily, only when necessary, i.e. when the log record has not been filtered out, the field is not hidden, or its value is needed for filtering by field values. However, the results of this lazy parsing are not properly cached, so repeated accesses to the same field may parse it repeatedly. This is not a big deal now, but to allow templated formatting, this should be improved.

  2. There is a work in progress that aims to significantly improve log readability when there are a lot of fields or some fields have multi-line values. It is called expansion. It is intended to automatically pretty print long log records with one field per line and to use better multi-line formatting for multi-line fields. It will also be possible to manually control the thresholds for automatic expansion or force it on/off. It is almost done, but it has a lot of heuristics and dark magic in it, and I am not sure yet if I am ready to merge it into the master branch. The problem with it is that it contains significant refactorings that may interfere with the refactorings needed for this feature.
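For the first obstacle above, the caching idea can be illustrated with a small Python sketch. It is not hl's actual data model: the `LazyRecord` class, its `get` method, and the `parse_count` counter are hypothetical names used only to show a field being parsed at most once, however many times a template accesses it.

```python
import json


class LazyRecord:
    """Holds raw, unparsed field values and parses each one at most once."""

    def __init__(self, raw_fields):
        # (name, raw JSON text) pairs, kept in their original order.
        self._raw = list(raw_fields)
        self._cache = {}       # parsed values, filled on first access
        self.parse_count = 0   # instrumentation for this example only

    def get(self, name):
        if name in self._cache:
            return self._cache[name]
        for key, raw in self._raw:  # sequential scan over the fields
            if key == name:
                self.parse_count += 1
                value = json.loads(raw)
                self._cache[name] = value
                return value
        return None


record = LazyRecord([("msg", '"price obtained"'), ("price", "3")])
record.get("price")
record.get("price")
print(record.parse_count)  # 1
```

Fields that are never accessed are never parsed, so the lazy behavior is preserved; the cache only removes repeated work for fields a template references more than once.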

I'm experimenting with improvements to the internal data models; we'll see what can be done here.

pamburus, Dec 24 '24 19:12

Maybe https://github.com/ISibboI/evalexpr is an option instead of building a custom formatting parser, especially if it gets more complicated. This could then also be used for more complex filtering scenarios. For example, I might want to filter logs by a span name, which is part of an array.

adiba, Jun 25 '25 01:06

@adiba As for custom formatting, I still need to finish the ongoing refactoring. Once that's done, it will be much easier to implement custom formatters. I just don't have enough time for that right now.

For complex filtering scenarios, there is a --query option. It can already easily be extended with additional features. If it does not cover all your scenarios, please start a discussion on the Discussions page, and I will then turn it into an issue.

In the meantime, I will check if filtering by an item in the array is possible with the current implementation.

pamburus, Jun 25 '25 16:06

@adiba I checked, and filtering by an item in an array already works. Here is an example source file:

{"msg":"price obtained", "price":3, "span": ["a"]}
{"msg":"price obtained", "price":5, "span": ["a", "b"]}

Here is an example command:

hl -P --query 'span contain "b"' example.log

It outputs:

price obtained price=5 span=[a b]

pamburus, Jun 25 '25 16:06

@pamburus Yes, thank you, I already saw the contain operator. My format is actually something like "spans": [{"name": "span_name", "other_field": ...}, ...]

Maybe one could extend the queries to something like spans[any].name = "span_name"?
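The intended meaning of a query like spans[any].name = "span_name" can be sketched in Python as "at least one element of the array has a matching field". This is only an illustration of the proposed semantics, not existing hl query syntax or behavior; the function `any_item_matches` is hypothetical.

```python
def any_item_matches(record: dict, array_field: str, key: str, expected) -> bool:
    """True if any object in record[array_field] has item[key] == expected."""
    items = record.get(array_field)
    if not isinstance(items, list):
        return False
    return any(
        isinstance(item, dict) and item.get(key) == expected
        for item in items
    )


records = [
    {"msg": "first", "spans": [{"name": "a"}]},
    {"msg": "second", "spans": [{"name": "a"}, {"name": "span_name"}]},
]
matching = [r["msg"] for r in records if any_item_matches(r, "spans", "name", "span_name")]
print(matching)  # ['second']
```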

My other thought regarding custom formatting was about filtering and formatting that array of spans.

Thank you for all the work you are doing! It's an amazing project.

adiba, Jun 26 '25 15:06

Hi, everyone. I'm envisioning another feature: not sorting all fields, but prioritizing the display of certain fields.

For example, if we want to prioritize the display of the event field, we can input hl --first event. Then, for each line of the output, if the line contains the event field, this field will be displayed first, and the remaining fields will be shown in their original order afterward.
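The ordering rule described here can be sketched in Python as follows. Note that --first is only the flag proposed in this comment, not an existing hl option, and the `prioritize` function is a hypothetical illustration of the behavior.

```python
def prioritize(fields, first):
    """Move the fields named in `first` to the front, keeping the rest in order.

    fields: list of (name, value) pairs in their original order.
    first:  list of field names to display first, in priority order.
    """
    head = [pair for pair in fields if pair[0] in first]
    # Stable sort: prioritized fields follow the order given in `first`.
    head.sort(key=lambda pair: first.index(pair[0]))
    tail = [pair for pair in fields if pair[0] not in first]
    return head + tail


fields = [("ts", "10:00:00"), ("level", "info"), ("event", "login"), ("user", "bob")]
print(prioritize(fields, ["event"]))
# [('event', 'login'), ('ts', '10:00:00'), ('level', 'info'), ('user', 'bob')]
```

A line without the prioritized field is returned unchanged, which matches the "if the line contains the event field" condition above.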

How about this? Do you like it?

HairlessVillager, Sep 28 '25 08:09