tempesta icon indicating copy to clipboard operation
tempesta copied to clipboard

tfw_logger: missing logs during the first second of application startup

Open symstu-tempesta opened this issue 11 months ago • 1 comments

Describe the issue No logs are present in the ClickHouse database during the first second of application startup, although logs do appear in dmesg.

Expected Behavior Logs are present in both dmesg and the ClickHouse access_log table.

To Reproduce Start Tempesta FW, wait for it to begin responding, and then send a request as soon as possible. I reproduced this with a simple test.

import time
from datetime import datetime

from test_suite import tester
from helpers.error import ProcessBadExitStatusException
import clickhouse_connect

__author__ = "Tempesta Technologies, Inc."
__copyright__ = "Copyright (C) 2018-2025 Tempesta Technologies, Inc."
__license__ = "GPL2"


class TestClickhouseLogsBaseTest(tester.TempestaTest):
    backends = [
        {
            "id": "deproxy",
            "type": "deproxy",
            "port": "8000",
            "response": "static",
            "response_content": "HTTP/1.1 200 OK\r\n" "Content-Length: 0\r\n\r\n",
        },
    ]

    tempesta = dict(
        config="""
            listen 80;
            server ${server_ip}:8000;

            access_log dmesg mmap mmap_host=localhost mmap_log=/tmp/access.log;
        """
    )
    clients = [
        {
            "id": "deproxy",
            "type": "deproxy",
            "addr": "${tempesta_ip}",
            "port": "80",
        }
    ]

    def setUp(self):
        super(TestClickhouseLogsBaseTest, self).setUp()

        self.clickhouse_client = clickhouse_connect.get_client()

        res = self.clickhouse_client.command('exists table access_log')

        if res:
            self.clickhouse_client.command('delete from access_log where true')

        try:
            print('removing access log')
            tempesta = self.get_tempesta()
            res = tempesta.node.run_cmd('rm /tmp/access.log')
            assert res == (b'', b'')

        except ProcessBadExitStatusException as e:
            print(e)

    def test_missing_log(self):
        self.start_all_servers()
        self.start_tempesta()
        self.deproxy_manager.start()
        self.start_all_clients()
        self.assertTrue(self.wait_all_connections())

        client = self.get_client('deproxy')
        client.start()

        client.send_request(
            request=client.create_request(method="GET", headers=[]),
            expected_status_code='200',
            timeout=10,
        )

        request_in_dmesg = self.oops.find("GET / HTTP/1.1")
        assert request_in_dmesg is True

        tempesta = self.get_tempesta()
        stdout, _ = tempesta.node.run_cmd('cat /tmp/access.log')
        assert stdout.endswith(b'Daemon started\n')

        res = self.clickhouse_client.query('select * from access_log')
        assert len(res.result_rows) == 0

        time.sleep(5)
        res = self.clickhouse_client.query('select * from access_log')
        assert len(res.result_rows) == 0

Configuration file

            listen 80;
            server ${server_ip}:8000;

            access_log dmesg mmap mmap_host=localhost mmap_log=/tmp/access.log;

Version or commit hash 80598c97099132d383426ae45dd45f8bf1bd675d

Stacktrace or debug log Here is the dmesg output for the accepted request

2025-01-06T20:59:52.876152+0100 symstu-TM1613 kernel: [tdb] Start Tempesta DB
2025-01-06T20:59:52.892091+0100 symstu-TM1613 kernel: [tempesta fw] Initializing Tempesta FW kernel module...
2025-01-06T20:59:53.024120+0100 symstu-TM1613 kernel: [tempesta fw] Configuration processing is completed.
2025-01-06T20:59:53.048161+0100 symstu-TM1613 kernel: [tdb] Opened table /opt/tempesta/db/filter0.tdb: size=16777216 rec_size=20 base=00000000a234d73a
2025-01-06T20:59:53.148158+0100 symstu-TM1613 kernel: [tdb] Opened table /opt/tempesta/db/cache0.tdb: size=268435456 rec_size=0 base=00000000a2343f3f
2025-01-06T20:59:53.156157+0100 symstu-TM1613 kernel: [tdb] Opened table /opt/tempesta/db/sessions0.tdb: size=16777216 rec_size=312 base=000000002fd3ede7
2025-01-06T20:59:53.164205+0100 symstu-TM1613 kernel: [tdb] Opened table /opt/tempesta/db/client0.tdb: size=16777216 rec_size=624 base=00000000108cdf2b
2025-01-06T20:59:53.196180+0100 symstu-TM1613 kernel: [tempesta fw] Open listen socket on: 0.0.0.0
2025-01-06T20:59:53.212126+0100 symstu-TM1613 kernel: [tempesta fw] Tempesta FW is ready
2025-01-06T20:59:53.232107+0100 symstu-TM1613 kernel: [tempesta fw] 127.0.0.2 "default" "GET / HTTP/1.1" 200 0 "-" "-"

and here is the access.log for the tfw_logger

[2025-01-06 20:59:53.212] [access_logger] [info] Starting daemon...
[2025-01-06 20:59:53.213] [access_logger] [info] Daemon started
[2025-01-06 20:59:58.280] [access_logger] [info] Stopping daemon...
[2025-01-06 20:59:58.289] [access_logger] [info] Daemon stopped

As you can see, the request was registered at 2025-01-06 20:59:53.232107, while the logger daemon finished startup at 2025-01-06 20:59:53.213. This indicates that the daemon was already running when the request was registered. To address this, you can add a time.sleep() call for at least 1 second before the test client sends the request, which should prevent this bug from occurring.

symstu-tempesta avatar Jan 06 '25 20:01 symstu-tempesta

I think it's better to remove parsing this options from tesmpest.sh. Implement separate configuration file for logger and use boost to parse it.

EvgeniiMekhanik avatar Mar 03 '25 17:03 EvgeniiMekhanik

This issue was resolved in this PR. Here are the tests that show the problem no longer exists.

MorganaFuture avatar Jul 09 '25 13:07 MorganaFuture

@MorganaFuture could you please describe what was the reason for the issue and which change exactly in the PR has resolved it. I'm afraid that the test could just not reveal the issue, e.g. if it's not 100% reporoducable.

krizhanovsky avatar Jul 10 '25 11:07 krizhanovsky

The bug no longer reproduces consistently. The test still crashes occasionally, but the record always exists in ClickHouse.

Moreover, the time between Tempesta FW startup and the log entry appearing is very short — around 0.05 seconds. I'm not sure the logs were actually present when I originally observed the issue.

At this point, it seems more like a timing lag in ClickHouse between data being written and becoming queryable

symstu-tempesta avatar Jul 15 '25 18:07 symstu-tempesta

The bug still exists, and the probability of encountering it is higher on fast machines (I reproduced it on my local laptop, but on CI it occurs about 95% of the time).

symstu-tempesta avatar Jul 22 '25 12:07 symstu-tempesta