mod_log_config: Add log_formatter support and a JSON log_formatter
Based on https://github.com/apache/httpd/pull/353 but based on trunk
@ylavic May you have a look at this?
One observation - there is a json encoder and decoder in apr-util that is Apache licensed.
https://github.com/apache/apr-util/blob/1.7.x/json/apr_json_encode.c
It is not released yet, and it's reasonable not to have to wait around until it is, but if you're looking for code to bring in that's already Apache, this should do.
The encoder code is designed to meet the strictness requirements of JSON parsing in JWT.
One observation - there is a json encoder and decoder in apr-util that is Apache licensed.
https://github.com/apache/apr-util/blob/1.7.x/json/apr_json_encode.c
It is not released yet, and it's reasonable not to have to wait around until it is, but if you're looking for code to bring in that's already Apache, this should do.
The encoder code is designed to meet the strictness requirements of JSON parsing in JWT.
so, i'm not sure what to do now, the code for utf-8 parsing was initially propsed by @ylavic
should I rewrite to use above apr_json_encode.c which is available only in apr-util 1.7.x? how to even back port this to 2.4.x which is my original goal to provide this functionality on 2.4.x?
@minfrin see original discussion here: https://github.com/apache/httpd/pull/353
so, i'm not sure what to do now, the code for utf-8 parsing was initially propsed by @ylavic
should I rewrite to use above apr_json_encode.c which is available only in apr-util 1.7.x? how to even back port this to 2.4.x which is my original goal to provide this functionality on 2.4.x?
In the short term to get it into 2.4.x, depend on the external library, but in the longer term it would be better if the external dependency could be avoided. Ideally if you can herd the json code together so that it can be #ifdef'ed easily either to support apr_json now or perhaps in future if you don't have the resources to do something like this right now.
Hi @minfrin I added a ifdef, one implementation using apr json and the other one the custom utf8 escaping. for apr json i added some TODO comments, not sure how to best cope with max numer of iovec elements, any adivse welcome,
what do you think?
can someone please approve the github workflow?
would @ylavic also please review again?
In the test, the following got logged, which is a win:
{"remoteAddr":"127.0.0.1","size":"236","logicalUserName":"","request":"GET / HTTP/1.1","statusCode":"404","time":"[17/Dec/2023:19:52:58 +0000]","user":""}
In the test there was no "logicalUserName" or "user", but these were returned as an empty string. I suspect we need to alter this to not add the parameter if the parameter doesn't exist. Something like this:
/* build json object */
lfdj->total_len += add_str(strs, strl, "{");
for (i = 0; i < lfd->nelts; ++i) {
if (items[i].tag == NULL) {
continue;
}
attribute_value = ap_escape_logjson(r->pool, lfd->portions[i], NULL, 0);
if (!attribute_value) {
continue;
}
if (formatter_options->use_short_attribute_names) {
attribute_name = NULL;
}
else {
attribute_name = apr_hash_get(json_hash, items[i].tag, APR_HASH_KEY_STRING);
}
if (!attribute_name) {
attribute_name = ap_escape_logjson(r->pool, items[i].tag, NULL, 0); /* use tag as attribute name as fallback */
}
lfdj->total_len += add_str(strs, strl, "\"");
lfdj->total_len += add_str(strs, strl, attribute_name);
/* process any arguments as attribute name extension */
if(items[i].arg != NULL && strlen(items[i].arg) > 0) {
attribute_name = ap_escape_logjson(r->pool, items[i].arg, NULL, 0);
lfdj->total_len += add_str(strs, strl, " ");
lfdj->total_len += add_str(strs, strl, attribute_name);
}
lfdj->total_len += add_str(strs, strl, "\":\"");
lfdj->total_len += add_str(strs, strl, attribute_value);
lfdj->total_len += add_str(strs, strl, "\",");
}
In the test, the following got logged, which is a win:
{"remoteAddr":"127.0.0.1","size":"236","logicalUserName":"","request":"GET / HTTP/1.1","statusCode":"404","time":"[17/Dec/2023:19:52:58 +0000]","user":""}In the test there was no "logicalUserName" or "user", but these were returned as an empty string. I suspect we need to alter this to not add the parameter if the parameter doesn't exist. Something like this:
oops, sorry I somehow managed to add in an old version of the non-apr-json code, let me fix.
But the question remains: should we add NULL or empty string values to the resulting JSON?
But the question remains: should we add NULL or empty string values to the resulting JSON?
I would leave out the whole kv pair completely, no need to take up space logging something that represents it wasn't there.
An empty string would be confusing, as you wouldn't be able to distinguish between the case where a user was missing or a user was the empty string. This was be nasty to troubleshoot.
In the test, the following got logged, which is a win:
{"remoteAddr":"127.0.0.1","size":"236","logicalUserName":"","request":"GET / HTTP/1.1","statusCode":"404","time":"[17/Dec/2023:19:52:58 +0000]","user":""}In the test there was no "logicalUserName" or "user", but these were returned as an empty string. I suspect we need to alter this to not add the parameter if the parameter doesn't exist. Something like this:
oops, sorry I somehow managed to add in an old version of the non-apr-json code, let me fix.
But the question remains: should we add NULL or empty string values to the resulting JSON?
I think the empty values should be "" and the missing ones null, when configured a missing value should not be omitted.
But the question remains: should we add NULL or empty string values to the resulting JSON?
I think the empty values should be
""and the missing onesnull, when configured a missing value should not be omitted.
Is this not going to waste a lot of disk space and cycles?
Think of a time when we might be logging say 20 different attributes, and 10 of them are null. Seems to be wasteful.
Is this not going to waste a lot of disk space and cycles?
Think of a time when we might be logging say 20 different attributes, and 10 of them are null. Seems to be wasteful.
Yeah, but then the LogFormat is explicit in the log, no need to know it for figuring out what's missing/expected. Not a strong opinion though, maybe space is more of a concern with json, there is no parsing/column issue like with plain lines (and the need for -).
Let's see what others think..
there is no parsing/column issue like with plain lines (and the need for -).
Seems like a good argument for omitting.
At least for json null values we could add a config option like "omitnulls", to not serialize those attributes. not sure what default for this options should be.
As an aside, I added support for JSON escaping to APR, which I plan to backport to v1.x: https://github.com/apache/apr/commit/3646493027dbf0dcc1523d1ebf182a5792aa70a3
As an aside, I added support for JSON escaping to APR, which I plan to backport to v1.x: apache/apr@3646493
one obersavation: The esapce functions do return "NULL" if the input is NULL, but JSON null is lower case, or doesn't it matter?
As an aside, I added support for JSON escaping to APR, which I plan to backport to v1.x: apache/apr@3646493
I did port your code into httpd, one questions regarding handling of those new functions: The apr-trunk builds do fail now, probably because of duplicate function definition, see: https://github.com/thomasmey/httpd/actions/runs/7287742760/job/19858988307
how to cope with that in general? is there a way to make those function definitons conditional on apr version or if linked against an apr version that already has those methods? how would such thing be done in C/autogen/configure/#ifdef?
one obersavation: The esapce functions do return "NULL" if the input is NULL, but JSON null is lower case, or doesn't it matter?
It does matter - thank you for finding that. Fixed.
As an aside, I added support for JSON escaping to APR, which I plan to backport to v1.x: apache/apr@3646493
I did port your code into httpd, one questions regarding handling of those new functions: The apr-trunk builds do fail now, probably because of duplicate function definition, see: https://github.com/thomasmey/httpd/actions/runs/7287742760/job/19858988307
how to cope with that in general? is there a way to make those function definitons conditional on apr version or if linked against an apr version that already has those methods? how would such thing be done in C/autogen/configure/#ifdef?
Ah, to bring anything in as a public function to httpd, give it the prefix ap_something(). The apr_something prefix is for APR stuff only.
In the long term, all of the encoding/decoding code in httpd is being deprecated and will be switched over to use the APR functions exclusively, but because of backwards compatibility the old functions are likely to be around for a long time.
The new functions use the tiniest memory footprint possible, rather than assuming a worst case scenario large output string (example: 3*strlen(in) for urlencoding), and then perform conversions on data that doesn't need escaping (that's most data), we work out the actual length of the string and whether conversion is needed at all as a separate step. We can then allocate the precise amount of memory needed, or nothing if not needed.
Happy new year! How to proceed with this pr? Should I delete the now obsolete MIT licence utf8 handling code and squash all commits in the PR into one, so the code is really gone from git?
Hi, @ylavic @minfrin @covener
I did clean-up and rebased/squashed this code into a new branch, please have a look here: https://github.com/apache/httpd/compare/trunk...thomasmey:httpd:json-trunk-clean?expand=1
If you like I can force push to thomasmey:json-trunk or create a new PR.