ballerina-library
ballerina-library copied to clipboard
[Proposal] Add and make access logs attributes configurable
Summary
At the moment access logs only print a limited number of information. Moreover, there is no way to select the type of information that needs to be printed or the format of the log entry. This proposal is to address these limitations.
Goals
- Make access log information configurable
- Add new information to access log
- Introduce JSON format for access log
Non-Goals
- Make the order of access log information configurable
Motivation
It is common to have different information in the access log based on the organization. Therefore, current limitations in access log, hinders the usability of Ballerina.
Description
At the moment one can enable access log using the below configuration for a given HTTP service.
[ballerina.http.accessLogConfig]
# Enable printing access logs on the Console. The default value is `false`.
console = true
# Specify the file path to save the access logs. This is optional.
path = "testAccessLog.txt"
These configurations need to be in the config.toml
and the resulting output looks as follows.
127.0.0.1 - - [15/Dec/2022:11:39:42 +0530] "GET /albums HTTP/1.1" 200 95 "-" "curl/7.79.1"
The above output is fixed and the user has no control over it. First thing that is needed to make it configurable is to list down the type information. Following is the key and description of each information that can be printed with the new proposal.
Name | Description |
---|---|
remote_ip | IP of the remote address |
verb | HTTP verb of the request |
path | Path of the request |
schema_and_version | Schema of the request and HTTP version |
status_code | HTTP status code |
response_size | Size of the response body |
request_size | Size of the request body |
response_time | Time to took send the response |
agent | Name of the agent who sent the request |
upstream_ip | IP of the upstream server |
upstream_response_time | Upstream server response time |
Please note that time and the date of log is not configurable. Then to make this information configurable, a new key is introduced to the config.toml
configuration as attributes
.
[ballerina.http.accessLogConfig]
# The list of attributes to be printed
attributes = remote_ip, verb, path, schema_and_version, status_code, response_size, agent
One can configure the list of attributes as needed. The above is the default list. Finally, to change the format of access log entry another new configuration is introduced as format
. The two formats are flat
and json
.
[ballerina.http.accessLogConfig]
# The list of attributes to be printed
format = flat
The default value of the format key is flat, which is concise and has least impact on the performance. This can be changed by setting the value to json
, which results in the same information in json
format.
{
"remote_ip": "127.0.0.1",
"verb": "GET",
"path": "/albums",
"schema_and_version": "HTTP/1.1",
"status_code": 200,
"response_size": 95,
"agent": "curl/7.79.1",
"time_stamp": "[15/Dec/2022:11:39:42 +0530]"
}
Hi Shafreen,
Thank you for the proposal. Let me add a few questions that I need to clarify.
- Regarding
path Path of the request
I think this is the downstream request. This is important to us. And also, there are some customers who does some changes to the path before sending the upstream request. So its good if we can see the upstream request path too. -
status_code
- we need to confirm whether this status code originated from the upstream or due to some reason mediation service initiated this. Do we have a way for that? Envoy has an attribute called RESPONSE_CODE_DETAILS which solves a similar problem. -
response_time Time to took send the response
Is this the total response time on the listener side right? This is important to us. This means response_time - upstream_response_time gives the latency added by the ballerina mediation service, which is an important measure we want to identify. Hope my understanding is correct. -
agent
- this is the UserAgent header? - What is the difference between
remote_ip
andupstream_ip
? Is this client IP vs upstream IP? - The other parameters you mentioned (
verb
,schema_and_version
,response_size
,request_size
) are required too. +1 to include them.
Anther important facts we need to see is:
- Can we have another field called requestId which is logged by using an incoming header value. This is important for us to correlate requests throughout the path (nginx LB -> Gateway -> Mediation service) we use the same request id. By default we use the x-request-id header.
- What happens to the above log when there's a transport failure during the upstream call (connection refuse, connection timeout, read timeout) and can we identify that from the log?
The default value of the format key is flat, which is concise and has least impact on the performance. This can be changed by setting the value to json, which results in the same information in json format.
If we used json or flat, we will get all the details in a single log line right? Hope we'll not format the json with multiline (coz logs will be overlapped in high concurrency).
+1 for this =) we have the same requirement for BCentral.
[Proposal] Improve AccessLog in Ballerina Services
Ballerina's current HTTP access logging mechanism provides a static and limited view of interactions, lacking the flexibility required by diverse organizational needs. This proposal advocates for a more configurable logging system, introducing options for log content customization, format variation (including JSON), and adding new information types to access logs.
Objectives
- Configurable Log Information: Empower users to select and configure the log information relevant to their needs.
- Enhanced Log Content: Enrich the access logs by adding new types of information.
- Format Flexibility: Introduce JSON as an alternative log format, alongside the existing flat format.
Motivation
Different organizations have unique requirements for logging web access data, driven by varied use cases such as security analysis, performance monitoring, and compliance auditing. The inflexibility in Ballerina's access logging fails to meet these diverse needs, thus limiting its utility and adoption. By enhancing customization capabilities, Ballerina can better serve a wider array of enterprise and developer scenarios.
Description
Current Limitations
Presently, enabling HTTP access logs in Ballerina is constrained to minimal configuration options, as shown below:
[ballerina.http.accessLogConfig]
console = true # Enables logging to the console.
path = "testAccessLog.txt" # Optional: Specifies a file path for logging.
The resultant log entry is rigid:
127.0.0.1 - - [15/Dec/2022:11:39:42 +0530] "GET /albums HTTP/1.1" 200 95 "-" "curl/7.79.1"
AccessLog in other Web Servers
Nginx
Nginx offers a highly customizable logging mechanism. Users can define custom log formats by specifying the desired variables in the nginx.conf configuration file. Here’s an example of how to define a custom log format:
log_format main '$remote_addr - $remote_user [$time_local] "$request" '
'$status $body_bytes_sent "$http_referer" '
'"$http_user_agent" "$http_x_forwarded_for"';
In addition to this, since Nginx supports a customizable logging mechanism, we can customize the logging format to be JSON as well as below:
log_format json_combined escape=json '{'
'"time_local":"$time_local",'
'"remote_addr":"$remote_addr",'
'"remote_user":"$remote_user",'
'"request":"$request",'
'"status": "$status",'
'"body_bytes_sent":"$body_bytes_sent",'
'"request_time":"$request_time",'
'"http_referrer":"$http_referer",'
'"http_user_agent":"$http_user_agent"'
'}';
In this configuration, escape=json
ensures that strings are properly escaped to conform to JSON standards.
Apache HTTP Server
Apache uses modules to extend its functionality, with mod_log_config
providing flexible logging capabilities. This module allows for a high degree of customization in how access logs are formatted and what information is logged.
Apache allows administrators to define custom log formats using the LogFormat
directive. This directive can include a wide variety of variables, enabling detailed control over what information is logged and its presentation:
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" my_custom_format
While Apache does not natively support JSON logging out-of-the-box as simply as Nginx does with its escape=json
parameter, you can achieve JSON logging by creatively using its custom log format capabilities.
LogFormat "{ \"remote_ip\": \"%h\", \"time\": \"%t\", \"request\": \"%U%q\", \"status\": %>s, \"referrer\": \"%{Referer}i\", \"user_agent\": \"%{User-Agent}i\" }" json
Proposed Enhancements
Configurable Attributes
To introduce flexibility, we propose listing specific types of log information (e.g., ip, request_method, patrequest_uri, status, response_body_size) that users can selectively include in their logs. The introduction of an attributes
configuration in config.toml
will allow for such customization:
Name | Description |
---|---|
ip | The IP address of the client making the request |
request | The full HTTP request line, including the method, URI, and protocol version |
request_method | The HTTP method of the request |
request_uri | The URI of the request, including parameters |
scheme | Scheme of the request and HTTP version |
status | The HTTP status code sent to the client |
request_body_size | Size of the request body |
response_body_size | The size of the HTTP response body sent to the client, in bytes |
request_time | The total request processing time, from reading the client's request to sending the last byte of the response |
http_referrer | The Referer header field of the request, indicating the page from which the user navigated |
http_user_agent | The User-Agent header field, name of the agent who sent the request |
http_x_forwarded_for | The originating IP address of a client connecting through an HTTP proxy or a load balancer |
http_(X-Custom-Header) | Header fields. Referring to them with http followed by the header name (x-request-id -> http_x-request-id ) |
host | Host name from the request that the client sent |
Default fields: ip, request, status, response_body_size, http_referer, http_user_agent
Note: All the above fields are available for upstream as well.
Note: Most of these field names are common across all other services, we wanted to stick with the common terminology, in most cases we have followed nginx terminology.
[ballerina.http.accessLogConfig]
attributes = ["ip", "request", "status", "response_body_size", "http_user_agent"]
Log Formats
In addition to the flat format, we propose introducing a JSON format for access logs to provide structured and easily parseable log data. Users can select their preferred format via a new format configuration:
[ballerina.http.accessLogConfig]
format = "json" # Options: "flat", "json"
The FLAT format example:
- No upstream server:
127.0.0.1 - - [15/Dec/2022:11:39:42 +0530] "GET /albums HTTP/1.1" 200 95 "-" "curl/7.79.1"
- In the presence of upstream servers:
We use
"~"
to differentiate downstream and upstream. In the below case, there are two upstream servers, and the information on that is logged as per the defined attributes.
127.0.0.1 - - [15/Dec/2022:11:39:42 +0530] "GET /albums HTTP/1.1" 200 95 "-" "curl/7.79.1" "~" 192.168.3.7 - - [15/Dec/2022:11:39:48 +0530] "POST /myAlbums HTTP/1.1" 201 235 "-" "ballerina" 192.168.5.5 - - [15/Dec/2022:11:39:55 +0530] "GET /yourAlbums HTTP/1.1" 200 222 "-" "ballerina"
The JSON format example:
{
"ip": "127.0.0.1",
"verb": "GET",
"path": "/albums",
"schema": "HTTP/1.1",
"status": 200,
"response_size": 95,
"agent": "curl/7.79.1",
"time_stamp": "[15/Dec/2022:11:39:42 +0530]",
"upstream": [
{
"ip": "192.168.3.7",
"verb": "POST",
"path": "/myAlbums",
"schema": "HTTP/1.1",
"status": 201,
"response_size": 36,
"agent": "ballerina",
"time_stamp": "[15/Dec/2022:11:39:42 +0530]"
},
{
"ip": "192.168.5.5",
"verb": "GET",
"path": "/yourAlbums",
"schema": "HTTP/1.1",
"status": 200,
"response_size": 40,
"agent": "ballerina",
"time_stamp": "[15/Dec/2022:11:39:42 +0530]"
}
]
}
Conclusion
By enhancing the Ballerina HTTP access log's customization capabilities, we aim to address current limitations and better fulfill the diverse requirements of users and organizations.
@AzeemMuzammil Can we clearly differentiate between the response_time
and request_time
in the proposal?
Sorry, now only could look at the proposal.
-
As @TharmiganK also mentioned, "response_time Time took to send the response" is not clear to me too. Can we elaborate a bit more on this?
-
Is it possible to add upstream_hostname too? (Do not try to reverse dns look up and resolve using the IP. We can log this only if the upstream host is provided)
-
And also, with mediation services, we are using client->execute method and only once during an request. (So the upstream array will only have one json object). Hope this path doesn't cause any issues.
-
Are we using the existing ballerina access log config to enable this too? or do we need additional configs?
-
As I mentioned earlier too, hope JSON format is printed with a single line by default (I think you mentioned a formatted multiline json only for reading purposes).
-
Can you also tryout a few different scenarios where the backend doesn't response properly, eg:
- Connection failure (this can be a connection refuse, failed TLS handshake etc)
- Connection timeout (ballerina client tries to initiate a connection with the backend but the server doesn't respond to the connection request)
- Response timeout (initial TCP connection with the backend is successful, but server doesn't respond on time with the HTTP response)
- Client closes the connection before backend responds (the client who calls the ballerina service closes the connection before backend responds)
These are commonly occurring failures when connecting to backends. Its important to reflect these from access logs. These cases I mentioned are mainly for HTTP/1.1. I think @TharmiganK might know more scenarios which are related to HTTP/2.0.
Other than that, other available fields looks good to me.
@VirajSalaka @Thushani-Jayasekera @renuka-fernando please have a look at the proposal and see if there are anything important missing.
The following are some other suggestions.
- In the access log config
response_body_size
refers toThe size of the HTTP response body sent to the client, in bytes
. Is it possible to addupstream_response_body_size
as well for each upstream? -
request_body_size
refers to theSize of the request body
(from downstream). Is it possible to addupstream_request_body_size
where the payload size of the request is sent to the upstream?
@AzeemMuzammil Can we clearly differentiate between the
response_time
andrequest_time
in the proposal?
As we discussed, we will be removing request_time
field as there is no consistent interpretation for this, and response_time
captures what are we interested in. (Updated the proposal as per this)
* Is it possible to add upstream_hostname too? (Do not try to reverse dns look up and resolve using the IP. We can log this only if the upstream host is provided)
Yes, we will and we can add the host name as well under the filed name host
if host is provided.
* And also, with mediation services, we are using client->execute method and only once during an request. (So the upstream array will only have one json object). Hope this path doesn't cause any issues.
No, this would not cause any issues and should not,
* Are we using the existing ballerina access log config to enable this too? or do we need additional configs?
The existing log config should work fine without any issues, but to enable any additional properties to lot, we have to enable them under the new property attributes
introduced in Config.toml
, in addition to this I have updated the proposal with default fields and all existing fields.
* As I mentioned earlier too, hope JSON format is printed with a single line by default (I think you mentioned a formatted multiline json only for reading purposes).
Yes, correct
* Can you also tryout a few different scenarios where the backend doesn't response properly, eg: * Connection failure (this can be a connection refuse, failed TLS handshake etc) * Connection timeout (ballerina client tries to initiate a connection with the backend but the server doesn't respond to the connection request) * Response timeout (initial TCP connection with the backend is successful, but server doesn't respond on time with the HTTP response) * Client closes the connection before backend responds (the client who calls the ballerina service closes the connection before backend responds) These are commonly occurring failures when connecting to backends. Its important to reflect these from access logs. These cases I mentioned are mainly for HTTP/1.1. I think @TharmiganK might know more scenarios which are related to HTTP/2.0.
Sure will check this as well,
* In the access log config `response_body_size` refers to `The size of the HTTP response body sent to the client, in bytes`. Is it possible to add `upstream_response_body_size` as well for each upstream?
Yes, every fields which are available under downstream path will be available for upstream path as well, I have updated this in the proposal to reflect this.
* `request_body_size` refers to the `Size of the request body` (from downstream). Is it possible to add `upstream_request_body_size` where the payload size of the request is sent to the upstream?
Yes, we will have that filed as well.
Please check the responses @TharmiganK @malinthaprasan @renuka-fernando, feel free to ask any questions if available.
Hi @AzeemMuzammil,
Could you please provide an update on the current status?
I am sorry, this request might be coming in a bit late. In recent customer issues, when we’ve noticed delays in their API responses we have to analyse the network traces etc to get further details. After analyzing the situation using TCP dumps and other network tools, we usually find that these delays are often occurring at the connection establishment level. Is it possible for you to log the time taken for connection establishment in the access log itself? It will be a great time saver for us. For connection reuse, the initial request should log a positive value, while subsequent requests using the same connection should log a time of 0, as there's no need for re-establishment.
Also reminder about below as they are very important.
* Can you also tryout a few different scenarios where the backend doesn't response properly, eg:
* Connection failure (this can be a connection refuse, failed TLS handshake etc)
* Connection timeout (ballerina client tries to initiate a connection with the backend but the server doesn't respond to the connection request)
* Response timeout (initial TCP connection with the backend is successful, but server doesn't respond on time with the HTTP response)
* Client closes the connection before backend responds (the client who calls the ballerina service closes the connection before backend responds)
These are commonly occurring failures when connecting to backends. Its important to reflect these from access logs. These cases I mentioned are mainly for HTTP/1.1. I think @TharmiganK might know more scenarios which are related to HTTP/2.0.
Hi @AzeemMuzammil,
Could you please provide an update on the current status?
I am sorry, this request might be coming in a bit late. In recent customer issues, when we’ve noticed delays in their API responses we have to analyse the network traces etc to get further details. After analyzing the situation using TCP dumps and other network tools, we usually find that these delays are often occurring at the connection establishment level. Is it possible for you to log the time taken for connection establishment in the access log itself? It will be a great time saver for us. For connection reuse, the initial request should log a positive value, while subsequent requests using the same connection should log a time of 0, as there's no need for re-establishment.
Hi @malinthaprasan as we discussed offline, we will look at this as a separate issue as we are already in line to merge the current changes. To track that I will create an issue.
Also reminder about below as they are very important.
* Can you also tryout a few different scenarios where the backend doesn't response properly, eg: * Connection failure (this can be a connection refuse, failed TLS handshake etc) * Connection timeout (ballerina client tries to initiate a connection with the backend but the server doesn't respond to the connection request) * Response timeout (initial TCP connection with the backend is successful, but server doesn't respond on time with the HTTP response) * Client closes the connection before backend responds (the client who calls the ballerina service closes the connection before backend responds) These are commonly occurring failures when connecting to backends. Its important to reflect these from access logs. These cases I mentioned are mainly for HTTP/1.1. I think @TharmiganK might know more scenarios which are related to HTTP/2.0.
I have tested out the scenarios mentioned above and we do still log these but not part of access logs, incorporating these into access logs requires a design reconsideration, we will look into this if possible,
I have tested out the scenarios mentioned above and we do still log these but not part of access logs, incorporating these into access logs requires a design reconsideration, we will look into this if possible
In what type of log do we have this information? And how do we correlate them? Bdw, we need to consider this as an important aspect of access logs. One of the main values of these logs is for troubleshooting. If we don't have this data in the access logs, we'll be missing out on important details..
@malinthaprasan as I mentioned these are not logged as part of access logs, and we have to implement these as part of access logs if required, can you create a new issue for any other requirements that are needed to be part of the access log other than what is mentioned in the proposal. whatever is mentioned in the proposal will be released as part of update 10. and anything in addition we can incorporate in upcoming updates. Thanks,
For the moment, since the feature is implemented and being shipped with update 10, I'll close the issue, if required feel free to open it.
Ack, that is fine.
Is the feature released in an 2201.5.x version now? (Asking because the issue was closed)
No, the feature will be released with 2201.10.0, the reason to close the issue is, that the relevant change is already merged and in the pipeline to release with 2201.10.0 as we planned.