openvpn icon indicating copy to clipboard operation
openvpn copied to clipboard

management interface: Reverted management_io to its original state to prevent the following problem:

Open amatveyev-gh opened this issue 9 years ago • 8 comments

man_read -> ... -> man_output_list_push_finalize -> management_io -> man_read

After some digging I found this commit made by James Yonan in 2010: https://sourceforge.net/p/openvpn/openvpn-testing/ci/3cf6c9328250061600b78c8a7deb0edc850e739b

In this commit he put both writing to and reading from the socket into the same "if" branch, which leads to the problem I've mentioned above. man_read after having been called recursively goes totally nuts and screws its input buffer, as result, openvpn can processes the same command several times and then fail to read the rest of them correctly.

I'm not sure that what I did here is the best way to solve this problem, but it seems to work for me.

amatveyev-gh avatar May 17 '16 19:05 amatveyev-gh

Here is a small testcase I used to reproduce and debug the problem:

  1. run openvpn.exe, obviously. I ran it with --management-hold to make it wait for input, but I don't think it matters
  2. Use this little program to send a bunch of commands quickly. In my case it's just a series of alternating "state on" and "state of".
#include <stdio.h>
#include <stdlib.h>
#include <winsock2.h>

#pragma comment(lib,"ws2_32.lib") //Winsock

int main(int argc , char *argv[])
{
    WSADATA wsa;
    SOCKET s;
    struct sockaddr_in server;
    char buf[256] = { 0 };
    int received;
    int counter;

    WSAStartup(MAKEWORD(2,2), &wsa);

    s = socket(AF_INET, SOCK_STREAM, 0 );

    server.sin_addr.s_addr = inet_addr("127.0.0.1");
    server.sin_family = AF_INET;
    server.sin_port = htons(12345);

    connect(s, (struct sockaddr *)&server, sizeof(server));

    recv(s, buf, sizeof(buf), 0);
    // For some reason, without this line the bug doesn't reproduce on my machine
    printf("Received %s\n", buf); 

    for (counter = 0; counter < 10; ++counter)
    {
        const char* on = "state on\n";
        const char* off = "state off\n";
        const char* ptr;
        int result;
        if (counter % 2)
            ptr = on;
        else
            ptr = off;
        result = send(s, ptr, strlen(ptr), 0);
        printf("%d bytes were sent\n", result);
    }

    Sleep(1000*1000); // need this for debugging

    shutdown(s, SD_BOTH);
    closesocket(s);
    WSACleanup();

    return 0;
}
  1. See something like this: bug

Of course, as with any race condition, your ability to reproduce this bug might depend on your particular machine and/or luck.

amatveyev-gh avatar May 18 '16 07:05 amatveyev-gh

And in terms of code the problem looks like this:

002fefd0 0126d7a9 openvpn!man_read+0x27 002fefe8 01271ef8 openvpn!management_io+0xc9 002feff4 0126b194 openvpn!man_output_standalone+0x28 002ff010 0126af4b openvpn!man_output_list_push_finalize+0x54 002ff054 0125201c openvpn!virtual_output_callback_func+0x17b 002ff06c 01251db3 openvpn!virtual_output_print+0x1c 002ff0c8 01251b62 openvpn!x_msg_va+0x233 002ff0e0 01270621 openvpn!x_msg+0x22 002ff124 0127086a openvpn!man_history+0xe1 002ff144 0126ea1b openvpn!man_state+0x2a 002ff17c 0126e06a openvpn!man_dispatch_command+0x7ab 002ff1fc 0126dabc openvpn!man_process_command+0x18a 002ff32c 0126d7a9 openvpn!man_read+0x16c 002ff344 01272358 openvpn!management_io+0xc9

It goes like this:

  1. openvpn receives the first "state off" and starts processing it. During processing it modifies the input buffer (man->connection.in): '\n' is removed and '\0' is added at the end of the command. But the man->connection.in->buf.offset is still 0, it points to the beginning of the buffer.
  2. While processing the command, openvpn wants to write something and goes to management_io
  3. management_io says: 'Ok, I see you want to write, but you know what, why don't you also read something and try to process it, so go to man_read'
  4. We get to man_read again, and read a few more commands that came in while we were processing the first one. But here is the thing: the input buffer is in inconsistent state, and after we add new commands to it it looks like this: "state off\0state on\n" and the offset is still 0. How many commands do we have here? I guess you can get the rest of the story from here.

amatveyev-gh avatar May 18 '16 08:05 amatveyev-gh

Hi,

On Tue, May 17, 2016 at 12:13:04PM -0700, Alexander Matveyev wrote:

I'm not sure that what I did here is the best way to solve this problem, but it seems to work for me.

Thanks for the patch and the detailed description. I've asked James and Arne to review the issue because they know the management code best (and James is the author of the original offending commit).

gert

USENET is not the non-clickable part of WWW! //www.muc.de/~gert/ Gert Doering - Munich, Germany [email protected] fax: +49-89-35655025 [email protected]

cron2 avatar May 18 '16 09:05 cron2

Notifying @jamesyonan

mattock avatar May 19 '16 07:05 mattock

Aaand, of course I made a stupid mistake when fixing the bug in my local fork. One closing '}' brace is in the wrong place. I debugged and worked on one source tree and sent you changes from another, typical me :) It doesn't change anything about the bug description, just disregard my patch, the fix is trivial anyway, you can easily do it from scratch or even decide to fix the problem in a totally different place.

amatveyev-gh avatar May 23 '16 12:05 amatveyev-gh

@schwabe is this still an issue?

ordex avatar Sep 16 '22 18:09 ordex

@selvanair might also have some insights here, having fixed one of the management recursion issues recently...

cron2 avatar Sep 17 '22 12:09 cron2

This does look related to the bug for which we merged a quick-fix (allow a few levels of recursion in virtual_output_callback(): commit 4dfd592).

However, I can't reproduce the behaviour reported here. I tried with and without my patch: in both cases repeatedly sending "state on/off" as suggested here works correctly. Also checked that the daemon does respond with "SUCCESS: " for each command (tested on Linux). This is not definitive as such race-condition/timing related bugs are hard to reproduce.

@amatveyev-gh could you test again whether this is still an issue? I'm also curious to know whether the fix proposed here could also handle the lost messages problem without allowing recursion.

In the mean time I stumbled up on commit b2b6617 which seems to indicate that allowing recursion as I did may have undesirable side effects.

@dsommers Does commit 4dfd592 break your bug-fix in commit b2b6617 ?

selvanair avatar Sep 17 '22 18:09 selvanair