ZeroTierOne
Problem with metrics in daemonize (-d) mode
Hello, I have found a problem with ZeroTier metrics when the zerotier-one daemon runs with the -d parameter.
Here is my current configuration:
ZeroTier Version: 1.12.2
OS: Amazon Linux 2 (AL2), kernel 4.14.336-257.562.amzn2.x86_64
ZeroTier was installed via the standard method:
curl -s https://install.zerotier.com | sudo bash
systemctl start zerotier-one
The metrics file is empty:
[root@ip-10-111-248-242 ~]# ls -lah /var/lib/zerotier-one/metrics*
-rwxrwxrwx 1 zerotier-one zerotier-one 0 May 2 13:47 /var/lib/zerotier-one/metrics.prom
-rw------- 1 zerotier-one zerotier-one 24 Mar 21 07:53 /var/lib/zerotier-one/metricstoken.secret
You can also reproduce this issue simply by running /usr/sbin/zerotier-one -d
Upon investigating further, I found that on AL2 you are using an init.d script that runs zerotier-one with the -d parameter. With this flag, zerotier-one calls fork() and the original (parent) process then exits, as seen here: https://github.com/zerotier/ZeroTierOne/blob/a681fbf5337e63721f869cebae12d5e1a92f1238/one.cpp#L2300
When the parent process exits, the destructor of the SaveToFile class runs. It is defined here: https://github.com/zerotier/ZeroTierOne/blob/a681fbf5337e63721f869cebae12d5e1a92f1238/ext/prometheus-cpp-lite-1.0/core/include/prometheus/save_to_file.h#L53
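A minimal, self-contained C++ sketch of that mechanism (not ZeroTier code; DestructorProbe is a hypothetical stand-in for a static object such as the metrics saver). A forked process that leaves through std::exit(), which is also what returning from main does, runs the destructors of its static objects, while _exit() skips them. POSIX only (fork, pipe, waitpid).

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <stdexcept>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical stand-in for a static object with a destructor:
// when destroyed, it writes one byte to a pipe so another process can see it.
struct DestructorProbe {
    int fd;
    ~DestructorProbe() {
        const char mark = 'D';
        ssize_t ignored = write(fd, &mark, 1);
        (void)ignored;
    }
};

// Forks a child that creates a static DestructorProbe and then leaves either
// through std::exit() (same path as returning from main) or through _exit().
// Returns true if the probe's destructor ran in that child.
bool destructor_runs_on_exit(bool use_std_exit) {
    int fds[2];
    if (pipe(fds) != 0) throw std::runtime_error("pipe failed");
    std::fflush(nullptr);  // avoid flushing the same stdio buffers twice after fork()
    pid_t pid = fork();
    if (pid < 0) throw std::runtime_error("fork failed");
    if (pid == 0) {
        close(fds[0]);
        static DestructorProbe probe{fds[1]};  // registered for destruction at exit
        if (use_std_exit) std::exit(0);        // runs static destructors
        _exit(0);                              // skips them
    }
    close(fds[1]);
    char mark = 0;
    ssize_t n = read(fds[0], &mark, 1);  // 0 (EOF) if the destructor never wrote
    close(fds[0]);
    int status = 0;
    waitpid(pid, &status, 0);
    return n == 1 && mark == 'D';
}
```

With this sketch, destructor_runs_on_exit(true) returns true and destructor_runs_on_exit(false) returns false.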
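The destructor sets must_die to true, which makes the worker_function thread in the parent exit. Note that fork() does not make the two processes share memory: the child gets its own copy of the address space, and only the thread that called fork() is duplicated into it. The worker_function thread was started before the fork, so it does not exist in the child, which is the process that keeps running, and no metrics are ever written.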
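A small C++ sketch that checks both fork() properties on Linux (not ZeroTier code; TickingWorker is a hypothetical stand-in for SaveToFile, with its worker thread started at construction time, and the flag in child_sees_parent_write is a hypothetical process-global variable). It checks that a thread started before fork() does not run in the child, and that a flag the parent sets after fork() is not visible in the child.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <stdexcept>
#include <thread>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical SaveToFile-like object: the worker thread starts as soon as
// the object is constructed, i.e. before any fork().
struct TickingWorker {
    std::atomic<bool> must_die{false};
    std::atomic<long> ticks{0};
    std::thread worker{[this] {
        while (!must_die.load()) {
            ++ticks;  // stands in for a periodic save_data()
            std::this_thread::sleep_for(std::chrono::milliseconds(5));
        }
    }};
    ~TickingWorker() {
        must_die = true;  // ask the worker to stop, then wait for it
        worker.join();
    }
};

// True if the worker thread keeps running in the forked child.
bool worker_runs_in_child() {
    TickingWorker w;
    std::this_thread::sleep_for(std::chrono::milliseconds(50));  // worker ticks in the parent
    pid_t pid = fork();
    if (pid < 0) throw std::runtime_error("fork failed");
    if (pid == 0) {
        // Child: only the thread that called fork() exists here.
        long before = w.ticks.load();
        std::this_thread::sleep_for(std::chrono::milliseconds(100));
        long after = w.ticks.load();
        // _exit() skips ~TickingWorker(), which would try to join a thread
        // that does not exist in this process.
        _exit(after != before ? 1 : 0);
    }
    int status = 0;
    waitpid(pid, &status, 0);
    return WIFEXITED(status) && WEXITSTATUS(status) == 1;
}

// True if a write the parent makes after fork() is visible in the child.
bool child_sees_parent_write() {
    static bool must_die = false;  // hypothetical process-global flag
    int fds[2];
    if (pipe(fds) != 0) throw std::runtime_error("pipe failed");
    pid_t pid = fork();
    if (pid < 0) throw std::runtime_error("fork failed");
    if (pid == 0) {
        close(fds[1]);
        char c = 0;
        ssize_t n = read(fds[0], &c, 1);  // wait until the parent has set its flag
        _exit(n == 1 && must_die ? 1 : 0);
    }
    close(fds[0]);
    must_die = true;  // parent-side write, after the fork
    const char c = 'x';
    ssize_t written = write(fds[1], &c, 1);
    close(fds[1]);
    int status = 0;
    waitpid(pid, &status, 0);
    must_die = false;  // reset so repeated calls behave the same
    return written == 1 && WIFEXITED(status) && WEXITSTATUS(status) == 1;
}
```

Both functions are expected to return false: the child has neither the worker thread nor the parent's later writes.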
Here is an example of a GDB session:
gdb --args zerotier-one -d
GNU gdb (GDB) Red Hat Enterprise Linux 8.0.1-36.amzn2.0.1
Copyright (C) 2017 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from zerotier-one...(no debugging symbols found)...done.
(gdb) b prometheus::SaveToFile::~SaveToFile()
Breakpoint 1 at 0x4e3ff0
(gdb) run
Starting program: /usr/sbin/zerotier-one -d
[New LWP 32438]
Detaching after fork from child process 32439.
Thread 1 "zerotier-one" hit Breakpoint 1, 0x00000000004e3ff0 in prometheus::SaveToFile::~SaveToFile() ()
(gdb) Starting Control Plane...
Starting V6 Control Plane...
info threads
Id Target Id Frame
* 1 LWP 32434 "zerotier-one" 0x00000000004e3ff0 in prometheus::SaveToFile::~SaveToFile() ()
2 LWP 32438 "zerotier-one" 0x0000000000e7df06 in sccp ()
(gdb) c
Continuing.
[LWP 32438 exited]
[Inferior 1 (process 32434) exited normally]
(gdb)
The program is not being run.
(gdb)
The program is not being run.
(gdb) b prometheus::SaveToFile::save_data()
Breakpoint 2 at 0x4e6eb0
(gdb) attach 32439
Attaching to program: /usr/sbin/zerotier-one, process 32439
[New LWP 32440]
[New LWP 32441]
[New LWP 32442]
[New LWP 32443]
[New LWP 32444]
[New LWP 32445]
[New LWP 32446]
[New LWP 32447]
[New LWP 32448]
[New LWP 32449]
[New LWP 32450]
[New LWP 32451]
[New LWP 32452]
[New LWP 32453]
[New LWP 32454]
[New LWP 32455]
[New LWP 32456]
[New LWP 32461]
[New LWP 32462]
[New LWP 32463]
[New LWP 32464]
[New LWP 32465]
[New LWP 32466]
0x0000000000e7df06 in sccp ()
(gdb)
(gdb) c
Continuing.
# GDB stays here: the process keeps running and breakpoint 2 is never hit
# list of running processes during the GDB session
[root@ip-10-111-248-242 ~]# ps aux | grep zero
root 1296 0.0 0.0 119392 956 pts/1 S+ 14:06 0:00 grep --color=auto zero
root 32418 0.0 1.0 197108 42500 pts/0 S+ 13:47 0:00 gdb --args zerotier-one -d
root 32434 0.0 0.0 16220 2488 pts/0 tl 13:47 0:00 /usr/sbin/zerotier-one -d
zerotie+ 32439 1.8 0.3 49292 12872 pts/0 Sl 13:47 0:22 /usr/sbin/zerotier-one -d
As a result, prometheus::SaveToFile::save_data() is never called in the daemonized (child) process.
Does the http endpoint still work?
It doesn't work
[root@ip-10-111-248-242 ~]# curl -I -XGET -H "X-ZT1-Auth: $(sudo cat /var/lib/zerotier-one/metricstoken.secret)" http://localhost:9993/metrics
HTTP/1.1 200 OK
Content-Length: 0
Content-Type: text/plain
Keep-Alive: timeout=5, max=5
That is expected: if I understand correctly, the metrics.prom file is the source for the HTTP endpoint's metrics:
https://github.com/zerotier/ZeroTierOne/blob/a681fbf5337e63721f869cebae12d5e1a92f1238/service/OneService.cpp#L2268-L2277
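Assuming the handler sends the file's current contents unchanged as the response body, an empty metrics.prom is enough to explain the 200 OK with Content-Length: 0 above. A C++ sketch of that assumed behavior (metrics_body and metrics_response_headers are hypothetical helpers, not functions from OneService.cpp):

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: read the whole metrics file and use it as the HTTP body.
// An unreadable file also yields an empty body in this sketch.
std::string metrics_body(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    std::ostringstream body;
    if (in) body << in.rdbuf();
    return body.str();
}

// Hypothetical helper: header block of a plain-text 200 response for that body.
std::string metrics_response_headers(const std::string& body) {
    return "HTTP/1.1 200 OK\r\n"
           "Content-Length: " + std::to_string(body.size()) + "\r\n"
           "Content-Type: text/plain\r\n\r\n";
}
```

For a zero-byte metrics.prom, metrics_body returns an empty string and the headers report Content-Length: 0.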
Hmm, does this mean it doesn't work on any Red Hat-based distribution?
Probably, but I don't have any other Red Hat nodes to test on. On our Ubuntu VMs we don't have this problem.