httpd icon indicating copy to clipboard operation
httpd copied to clipboard

mod_log_config: Add log_formatter support and a JSON log_formatter

Open thomasmey opened this issue 2 years ago • 23 comments

Based on https://github.com/apache/httpd/pull/353 but based on trunk

thomasmey avatar Jul 15 '23 15:07 thomasmey

@ylavic May you have a look at this?

thomasmey avatar Aug 01 '23 06:08 thomasmey

One observation - there is a json encoder and decoder in apr-util that is Apache licensed.

https://github.com/apache/apr-util/blob/1.7.x/json/apr_json_encode.c

It is not released yet, and it's reasonable not to have to wait around until it is, but if you're looking for code to bring in that's already Apache, this should do.

The encoder code is designed to meet the strictness requirements of JSON parsing in JWT.

minfrin avatar Sep 10 '23 09:09 minfrin

One observation - there is a json encoder and decoder in apr-util that is Apache licensed.

https://github.com/apache/apr-util/blob/1.7.x/json/apr_json_encode.c

It is not released yet, and it's reasonable not to have to wait around until it is, but if you're looking for code to bring in that's already Apache, this should do.

The encoder code is designed to meet the strictness requirements of JSON parsing in JWT.

so, i'm not sure what to do now, the code for utf-8 parsing was initially propsed by @ylavic

should I rewrite to use above apr_json_encode.c which is available only in apr-util 1.7.x? how to even back port this to 2.4.x which is my original goal to provide this functionality on 2.4.x?

thomasmey avatar Oct 20 '23 21:10 thomasmey

@minfrin see original discussion here: https://github.com/apache/httpd/pull/353

thomasmey avatar Oct 20 '23 21:10 thomasmey

so, i'm not sure what to do now, the code for utf-8 parsing was initially propsed by @ylavic

should I rewrite to use above apr_json_encode.c which is available only in apr-util 1.7.x? how to even back port this to 2.4.x which is my original goal to provide this functionality on 2.4.x?

In the short term to get it into 2.4.x, depend on the external library, but in the longer term it would be better if the external dependency could be avoided. Ideally if you can herd the json code together so that it can be #ifdef'ed easily either to support apr_json now or perhaps in future if you don't have the resources to do something like this right now.

minfrin avatar Oct 24 '23 20:10 minfrin

Hi @minfrin I added a ifdef, one implementation using apr json and the other one the custom utf8 escaping. for apr json i added some TODO comments, not sure how to best cope with max numer of iovec elements, any adivse welcome,

what do you think?

thomasmey avatar Dec 10 '23 21:12 thomasmey

can someone please approve the github workflow?

thomasmey avatar Dec 10 '23 21:12 thomasmey

would @ylavic also please review again?

thomasmey avatar Dec 10 '23 21:12 thomasmey

In the test, the following got logged, which is a win:

{"remoteAddr":"127.0.0.1","size":"236","logicalUserName":"","request":"GET / HTTP/1.1","statusCode":"404","time":"[17/Dec/2023:19:52:58 +0000]","user":""}

In the test there was no "logicalUserName" or "user", but these were returned as an empty string. I suspect we need to alter this to not add the parameter if the parameter doesn't exist. Something like this:

    /* build json object */
    lfdj->total_len += add_str(strs, strl, "{");
    for (i = 0; i < lfd->nelts; ++i) {
        if (items[i].tag == NULL) {
            continue;
        }

        attribute_value = ap_escape_logjson(r->pool, lfd->portions[i], NULL, 0);

        if (!attribute_value) {
            continue;
        }

        if (formatter_options->use_short_attribute_names) {
            attribute_name = NULL;
        }
        else {
            attribute_name = apr_hash_get(json_hash, items[i].tag, APR_HASH_KEY_STRING);
        }

        if (!attribute_name) {
            attribute_name = ap_escape_logjson(r->pool, items[i].tag, NULL, 0); /* use tag as attribute name as fallback */
        }

        lfdj->total_len += add_str(strs, strl, "\"");
        lfdj->total_len += add_str(strs, strl, attribute_name);

        /* process any arguments as attribute name extension */
        if(items[i].arg != NULL && strlen(items[i].arg) > 0) {
            attribute_name = ap_escape_logjson(r->pool, items[i].arg, NULL, 0);
            lfdj->total_len += add_str(strs, strl, " ");
            lfdj->total_len += add_str(strs, strl, attribute_name);
        }
        lfdj->total_len += add_str(strs, strl, "\":\"");

        lfdj->total_len += add_str(strs, strl, attribute_value);
        lfdj->total_len += add_str(strs, strl, "\",");
    }

minfrin avatar Dec 17 '23 20:12 minfrin

In the test, the following got logged, which is a win:

{"remoteAddr":"127.0.0.1","size":"236","logicalUserName":"","request":"GET / HTTP/1.1","statusCode":"404","time":"[17/Dec/2023:19:52:58 +0000]","user":""}

In the test there was no "logicalUserName" or "user", but these were returned as an empty string. I suspect we need to alter this to not add the parameter if the parameter doesn't exist. Something like this:

oops, sorry I somehow managed to add in an old version of the non-apr-json code, let me fix.

But the question remains: should we add NULL or empty string values to the resulting JSON?

thomasmey avatar Dec 17 '23 21:12 thomasmey

But the question remains: should we add NULL or empty string values to the resulting JSON?

I would leave out the whole kv pair completely, no need to take up space logging something that represents it wasn't there.

An empty string would be confusing, as you wouldn't be able to distinguish between the case where a user was missing or a user was the empty string. This was be nasty to troubleshoot.

minfrin avatar Dec 17 '23 21:12 minfrin

In the test, the following got logged, which is a win:

{"remoteAddr":"127.0.0.1","size":"236","logicalUserName":"","request":"GET / HTTP/1.1","statusCode":"404","time":"[17/Dec/2023:19:52:58 +0000]","user":""}

In the test there was no "logicalUserName" or "user", but these were returned as an empty string. I suspect we need to alter this to not add the parameter if the parameter doesn't exist. Something like this:

oops, sorry I somehow managed to add in an old version of the non-apr-json code, let me fix.

But the question remains: should we add NULL or empty string values to the resulting JSON?

I think the empty values should be "" and the missing ones null, when configured a missing value should not be omitted.

ylavic avatar Dec 19 '23 10:12 ylavic

But the question remains: should we add NULL or empty string values to the resulting JSON?

I think the empty values should be "" and the missing ones null, when configured a missing value should not be omitted.

Is this not going to waste a lot of disk space and cycles?

Think of a time when we might be logging say 20 different attributes, and 10 of them are null. Seems to be wasteful.

minfrin avatar Dec 19 '23 12:12 minfrin

Is this not going to waste a lot of disk space and cycles?

Think of a time when we might be logging say 20 different attributes, and 10 of them are null. Seems to be wasteful.

Yeah, but then the LogFormat is explicit in the log, no need to know it for figuring out what's missing/expected. Not a strong opinion though, maybe space is more of a concern with json, there is no parsing/column issue like with plain lines (and the need for -). Let's see what others think..

ylavic avatar Dec 19 '23 13:12 ylavic

there is no parsing/column issue like with plain lines (and the need for -).

Seems like a good argument for omitting.

covener avatar Dec 19 '23 13:12 covener

At least for json null values we could add a config option like "omitnulls", to not serialize those attributes. not sure what default for this options should be.

thomasmey avatar Dec 19 '23 21:12 thomasmey

As an aside, I added support for JSON escaping to APR, which I plan to backport to v1.x: https://github.com/apache/apr/commit/3646493027dbf0dcc1523d1ebf182a5792aa70a3

minfrin avatar Dec 20 '23 22:12 minfrin

As an aside, I added support for JSON escaping to APR, which I plan to backport to v1.x: apache/apr@3646493

one obersavation: The esapce functions do return "NULL" if the input is NULL, but JSON null is lower case, or doesn't it matter?

thomasmey avatar Dec 21 '23 10:12 thomasmey

As an aside, I added support for JSON escaping to APR, which I plan to backport to v1.x: apache/apr@3646493

I did port your code into httpd, one questions regarding handling of those new functions: The apr-trunk builds do fail now, probably because of duplicate function definition, see: https://github.com/thomasmey/httpd/actions/runs/7287742760/job/19858988307

how to cope with that in general? is there a way to make those function definitons conditional on apr version or if linked against an apr version that already has those methods? how would such thing be done in C/autogen/configure/#ifdef?

thomasmey avatar Dec 21 '23 16:12 thomasmey

one obersavation: The esapce functions do return "NULL" if the input is NULL, but JSON null is lower case, or doesn't it matter?

It does matter - thank you for finding that. Fixed.

minfrin avatar Dec 22 '23 13:12 minfrin

As an aside, I added support for JSON escaping to APR, which I plan to backport to v1.x: apache/apr@3646493

I did port your code into httpd, one questions regarding handling of those new functions: The apr-trunk builds do fail now, probably because of duplicate function definition, see: https://github.com/thomasmey/httpd/actions/runs/7287742760/job/19858988307

how to cope with that in general? is there a way to make those function definitons conditional on apr version or if linked against an apr version that already has those methods? how would such thing be done in C/autogen/configure/#ifdef?

Ah, to bring anything in as a public function to httpd, give it the prefix ap_something(). The apr_something prefix is for APR stuff only.

In the long term, all of the encoding/decoding code in httpd is being deprecated and will be switched over to use the APR functions exclusively, but because of backwards compatibility the old functions are likely to be around for a long time.

The new functions use the tiniest memory footprint possible, rather than assuming a worst case scenario large output string (example: 3*strlen(in) for urlencoding), and then perform conversions on data that doesn't need escaping (that's most data), we work out the actual length of the string and whether conversion is needed at all as a separate step. We can then allocate the precise amount of memory needed, or nothing if not needed.

minfrin avatar Dec 22 '23 14:12 minfrin

Happy new year! How to proceed with this pr? Should I delete the now obsolete MIT licence utf8 handling code and squash all commits in the PR into one, so the code is really gone from git?

thomasmey avatar Jan 26 '24 08:01 thomasmey

Hi, @ylavic @minfrin @covener

I did clean-up and rebased/squashed this code into a new branch, please have a look here: https://github.com/apache/httpd/compare/trunk...thomasmey:httpd:json-trunk-clean?expand=1

If you like I can force push to thomasmey:json-trunk or create a new PR.

thomasmey avatar Feb 19 '24 20:02 thomasmey