tfw_logger: missing logs during the first second of application startup
Describe the issue No logs are present in the ClickHouse database during the first second of application startup, although logs do appear in dmesg.
Expected Behavior Logs are present in both dmesg and the ClickHouse access_log table.
To Reproduce Start Tempesta FW, wait for it to begin responding, and then send a request as soon as possible. I reproduced this with a simple test.
import time
from datetime import datetime
from test_suite import tester
from helpers.error import ProcessBadExitStatusException
import clickhouse_connect
__author__ = "Tempesta Technologies, Inc."
__copyright__ = "Copyright (C) 2018-2025 Tempesta Technologies, Inc."
__license__ = "GPL2"
class TestClickhouseLogsBaseTest(tester.TempestaTest):
backends = [
{
"id": "deproxy",
"type": "deproxy",
"port": "8000",
"response": "static",
"response_content": "HTTP/1.1 200 OK\r\n" "Content-Length: 0\r\n\r\n",
},
]
tempesta = dict(
config="""
listen 80;
server ${server_ip}:8000;
access_log dmesg mmap mmap_host=localhost mmap_log=/tmp/access.log;
"""
)
clients = [
{
"id": "deproxy",
"type": "deproxy",
"addr": "${tempesta_ip}",
"port": "80",
}
]
def setUp(self):
super(TestClickhouseLogsBaseTest, self).setUp()
self.clickhouse_client = clickhouse_connect.get_client()
res = self.clickhouse_client.command('exists table access_log')
if res:
self.clickhouse_client.command('delete from access_log where true')
try:
print('removing access log')
tempesta = self.get_tempesta()
res = tempesta.node.run_cmd('rm /tmp/access.log')
assert res == (b'', b'')
except ProcessBadExitStatusException as e:
print(e)
def test_missing_log(self):
self.start_all_servers()
self.start_tempesta()
self.deproxy_manager.start()
self.start_all_clients()
self.assertTrue(self.wait_all_connections())
client = self.get_client('deproxy')
client.start()
client.send_request(
request=client.create_request(method="GET", headers=[]),
expected_status_code='200',
timeout=10,
)
request_in_dmesg = self.oops.find("GET / HTTP/1.1")
assert request_in_dmesg is True
tempesta = self.get_tempesta()
stdout, _ = tempesta.node.run_cmd('cat /tmp/access.log')
assert stdout.endswith(b'Daemon started\n')
res = self.clickhouse_client.query('select * from access_log')
assert len(res.result_rows) == 0
time.sleep(5)
res = self.clickhouse_client.query('select * from access_log')
assert len(res.result_rows) == 0
Configuration file
listen 80;
server ${server_ip}:8000;
access_log dmesg mmap mmap_host=localhost mmap_log=/tmp/access.log;
Version or commit hash 80598c97099132d383426ae45dd45f8bf1bd675d
Stacktrace or debug log Here is the dmesg output for the accepted request
2025-01-06T20:59:52.876152+0100 symstu-TM1613 kernel: [tdb] Start Tempesta DB
2025-01-06T20:59:52.892091+0100 symstu-TM1613 kernel: [tempesta fw] Initializing Tempesta FW kernel module...
2025-01-06T20:59:53.024120+0100 symstu-TM1613 kernel: [tempesta fw] Configuration processing is completed.
2025-01-06T20:59:53.048161+0100 symstu-TM1613 kernel: [tdb] Opened table /opt/tempesta/db/filter0.tdb: size=16777216 rec_size=20 base=00000000a234d73a
2025-01-06T20:59:53.148158+0100 symstu-TM1613 kernel: [tdb] Opened table /opt/tempesta/db/cache0.tdb: size=268435456 rec_size=0 base=00000000a2343f3f
2025-01-06T20:59:53.156157+0100 symstu-TM1613 kernel: [tdb] Opened table /opt/tempesta/db/sessions0.tdb: size=16777216 rec_size=312 base=000000002fd3ede7
2025-01-06T20:59:53.164205+0100 symstu-TM1613 kernel: [tdb] Opened table /opt/tempesta/db/client0.tdb: size=16777216 rec_size=624 base=00000000108cdf2b
2025-01-06T20:59:53.196180+0100 symstu-TM1613 kernel: [tempesta fw] Open listen socket on: 0.0.0.0
2025-01-06T20:59:53.212126+0100 symstu-TM1613 kernel: [tempesta fw] Tempesta FW is ready
2025-01-06T20:59:53.232107+0100 symstu-TM1613 kernel: [tempesta fw] 127.0.0.2 "default" "GET / HTTP/1.1" 200 0 "-" "-"
and here is the access.log for the tfw_logger
[2025-01-06 20:59:53.212] [access_logger] [info] Starting daemon...
[2025-01-06 20:59:53.213] [access_logger] [info] Daemon started
[2025-01-06 20:59:58.280] [access_logger] [info] Stopping daemon...
[2025-01-06 20:59:58.289] [access_logger] [info] Daemon stopped
As you can see, the request was registered at 2025-01-06 20:59:53.232107, while the logger daemon finished startup at 2025-01-06 20:59:53.213. This indicates that the daemon was already running when the request was registered. To address this, you can add a time.sleep() call for at least 1 second before the test client sends the request, which should prevent this bug from occurring.
I think it's better to remove parsing this options from tesmpest.sh. Implement separate configuration file for logger and use boost to parse it.
@MorganaFuture could you please describe what was the reason for the issue and which change exactly in the PR has resolved it. I'm afraid that the test could just not reveal the issue, e.g. if it's not 100% reporoducable.
The bug no longer reproduces consistently. The test still crashes occasionally, but the record always exists in ClickHouse.
Moreover, the time between Tempesta FW startup and the log entry appearing is very short — around 0.05 seconds. I'm not sure the logs were actually present when I originally observed the issue.
At this point, it seems more like a timing lag in ClickHouse between data being written and becoming queryable
The bug still exists, and the probability of encountering it is higher on fast machines (I reproduced it on my local laptop, but on CI it occurs about 95% of the time).