openvpn
openvpn copied to clipboard
management interface: Reverted management_io to its original state to prevent the following problem:
man_read -> ... -> man_output_list_push_finalize -> management_io -> man_read
After some digging I found this commit made by James Yonan in 2010: https://sourceforge.net/p/openvpn/openvpn-testing/ci/3cf6c9328250061600b78c8a7deb0edc850e739b
In this commit he put both writing to and reading from the socket into the same "if" branch, which leads to the problem I've mentioned above. man_read after having been called recursively goes totally nuts and screws its input buffer, as result, openvpn can processes the same command several times and then fail to read the rest of them correctly.
I'm not sure that what I did here is the best way to solve this problem, but it seems to work for me.
Here is a small testcase I used to reproduce and debug the problem:
- run openvpn.exe, obviously. I ran it with --management-hold to make it wait for input, but I don't think it matters
- Use this little program to send a bunch of commands quickly. In my case it's just a series of alternating "state on" and "state of".
#include <stdio.h>
#include <stdlib.h>
#include <winsock2.h>
#pragma comment(lib,"ws2_32.lib") //Winsock
int main(int argc , char *argv[])
{
WSADATA wsa;
SOCKET s;
struct sockaddr_in server;
char buf[256] = { 0 };
int received;
int counter;
WSAStartup(MAKEWORD(2,2), &wsa);
s = socket(AF_INET, SOCK_STREAM, 0 );
server.sin_addr.s_addr = inet_addr("127.0.0.1");
server.sin_family = AF_INET;
server.sin_port = htons(12345);
connect(s, (struct sockaddr *)&server, sizeof(server));
recv(s, buf, sizeof(buf), 0);
// For some reason, without this line the bug doesn't reproduce on my machine
printf("Received %s\n", buf);
for (counter = 0; counter < 10; ++counter)
{
const char* on = "state on\n";
const char* off = "state off\n";
const char* ptr;
int result;
if (counter % 2)
ptr = on;
else
ptr = off;
result = send(s, ptr, strlen(ptr), 0);
printf("%d bytes were sent\n", result);
}
Sleep(1000*1000); // need this for debugging
shutdown(s, SD_BOTH);
closesocket(s);
WSACleanup();
return 0;
}
- See something like this:

Of course, as with any race condition, your ability to reproduce this bug might depend on your particular machine and/or luck.
And in terms of code the problem looks like this:
002fefd0 0126d7a9 openvpn!man_read+0x27 002fefe8 01271ef8 openvpn!management_io+0xc9 002feff4 0126b194 openvpn!man_output_standalone+0x28 002ff010 0126af4b openvpn!man_output_list_push_finalize+0x54 002ff054 0125201c openvpn!virtual_output_callback_func+0x17b 002ff06c 01251db3 openvpn!virtual_output_print+0x1c 002ff0c8 01251b62 openvpn!x_msg_va+0x233 002ff0e0 01270621 openvpn!x_msg+0x22 002ff124 0127086a openvpn!man_history+0xe1 002ff144 0126ea1b openvpn!man_state+0x2a 002ff17c 0126e06a openvpn!man_dispatch_command+0x7ab 002ff1fc 0126dabc openvpn!man_process_command+0x18a 002ff32c 0126d7a9 openvpn!man_read+0x16c 002ff344 01272358 openvpn!management_io+0xc9
It goes like this:
- openvpn receives the first "state off" and starts processing it. During processing it modifies the input buffer (man->connection.in): '\n' is removed and '\0' is added at the end of the command. But the man->connection.in->buf.offset is still 0, it points to the beginning of the buffer.
- While processing the command, openvpn wants to write something and goes to management_io
- management_io says: 'Ok, I see you want to write, but you know what, why don't you also read something and try to process it, so go to man_read'
- We get to man_read again, and read a few more commands that came in while we were processing the first one. But here is the thing: the input buffer is in inconsistent state, and after we add new commands to it it looks like this: "state off\0state on\n" and the offset is still 0. How many commands do we have here? I guess you can get the rest of the story from here.
Hi,
On Tue, May 17, 2016 at 12:13:04PM -0700, Alexander Matveyev wrote:
I'm not sure that what I did here is the best way to solve this problem, but it seems to work for me.
Thanks for the patch and the detailed description. I've asked James and Arne to review the issue because they know the management code best (and James is the author of the original offending commit).
gert
USENET is not the non-clickable part of WWW! //www.muc.de/~gert/ Gert Doering - Munich, Germany [email protected] fax: +49-89-35655025 [email protected]
Notifying @jamesyonan
Aaand, of course I made a stupid mistake when fixing the bug in my local fork. One closing '}' brace is in the wrong place. I debugged and worked on one source tree and sent you changes from another, typical me :) It doesn't change anything about the bug description, just disregard my patch, the fix is trivial anyway, you can easily do it from scratch or even decide to fix the problem in a totally different place.
@schwabe is this still an issue?
@selvanair might also have some insights here, having fixed one of the management recursion issues recently...
This does look related to the bug for which we merged a quick-fix (allow a few levels of recursion in virtual_output_callback(): commit 4dfd592).
However, I can't reproduce the behaviour reported here. I tried with and without my patch: in both cases repeatedly sending "state on/off" as suggested here works correctly. Also checked that the daemon does respond with "SUCCESS: " for each command (tested on Linux). This is not definitive as such race-condition/timing related bugs are hard to reproduce.
@amatveyev-gh could you test again whether this is still an issue? I'm also curious to know whether the fix proposed here could also handle the lost messages problem without allowing recursion.
In the mean time I stumbled up on commit b2b6617 which seems to indicate that allowing recursion as I did may have undesirable side effects.
@dsommers Does commit 4dfd592 break your bug-fix in commit b2b6617 ?