
Socket: replacement of called functions, permissions, reusability

Open echolimazulu opened this issue 2 years ago • 29 comments

Solution to the problem: https://github.com/nginx/unit/issues/669

Repository to reproduce the issue: https://github.com/echolimazulu/unit-bind-aaiu

Hello @VBart, @mar0x, @hongzhidao, @alejandro-colomar, @tippexs.

Please tell me what you think. In the process of testing this fix, I no longer observe the previously encountered problem.

I tried to take into account all the previously voiced comments regarding the code, commits, indents and other points.

I did not propose implementing locks, cleaning up sockets and locks when the program is closed, or cleaning up when the configuration is updated, since that would be a much larger set of changes, addressing several different issues and extending functionality that is not needed for: https://github.com/nginx/unit/issues/669

echolimazulu avatar Apr 01 '22 16:04 echolimazulu

Hi!

I'm trying to solve this in a much simpler way: https://github.com/nginx/unit/pull/655. I added a commit that I expected would unlink the sockets after use, but I'm getting EACCES, and my guess is that I'm closing from a thread that doesn't have the privilege to unlink.

Since you have been investigating this, could you please help me try to fix that commit so that it works in a simpler way? Or maybe you can show me that it can't possibly work...

Thanks,

Alex

alejandro-colomar avatar Jul 18 '22 23:07 alejandro-colomar

Hi!

I'm trying to solve this in a much simpler way: #655. I added a commit that I expected would unlink the sockets after use, but I'm getting EACCES, and my guess is that I'm closing from a thread that doesn't have the privilege to unlink.

Since you have been investigating this, could you please help me try to fix that commit so that it works in a simpler way? Or maybe you can show me that it can't possibly work...

Thanks,

Alex

Hello Alex (@alejandro-colomar),

Thank you for your interest in this problem, the alternative solution and the question!

As far as I remember, the Router process does not have enough rights to perform this type of operation. For that reason, in my solution I proposed an additional handler in the main process which, through the corresponding RPC exchange between the main process and the router, can receive the task of disposing of the socket when it is closed, similar to the task performed at socket-creation time.
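
For illustration only, here is a minimal sketch of that idea in plain POSIX terms. The message layout, the MSG_UNLINK_SOCKET tag and the handler name are made up for this example and are not Unit's actual internals: the router side sends the socket path over the RPC channel, and the privileged main process performs the unlink(2) on its behalf.

/* Hypothetical sketch, not Unit's real API: the privileged main
 * process receives a "dispose of this socket" request from the
 * router over an RPC channel and unlinks the file on its behalf. */
#include <err.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

enum msg_type { MSG_CREATE_SOCKET, MSG_UNLINK_SOCKET };

struct sock_msg {
    enum msg_type  type;
    char           path[PATH_MAX];
};

/* Runs in the main (privileged) process; fd is its end of the channel. */
static void
main_handle_socket_msg(int fd)
{
    struct sock_msg  msg;

    if (read(fd, &msg, sizeof(msg)) != sizeof(msg))
        err(EXIT_FAILURE, "read");

    if (msg.type == MSG_UNLINK_SOCKET && unlink(msg.path) == -1)
        warn("unlink(%s)", msg.path);
}

int main(void)
{
    int              sv[2];
    struct sock_msg  msg = { .type = MSG_UNLINK_SOCKET };

    /* Stand-in for the router<->main RPC channel. */
    if (socketpair(AF_UNIX, SOCK_SEQPACKET, 0, sv) == -1)
        err(EXIT_FAILURE, "socketpair");

    strncpy(msg.path, "/tmp/unit-demo.sock", sizeof(msg.path) - 1);

    /* "Router" side: ask the main process to remove the socket file. */
    if (write(sv[0], &msg, sizeof(msg)) != sizeof(msg))
        err(EXIT_FAILURE, "write");

    /* "Main" side: perform the privileged unlink. */
    main_handle_socket_msg(sv[1]);

    exit(EXIT_SUCCESS);
}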

If I couldn't answer your question with this, please let me know, I'll try to remember more details and provide them to you.

Always glad to cooperate!

Regards, Evgenii.

echolimazulu avatar Jul 22 '22 19:07 echolimazulu

Hi Evgenii!

Hi! I'm trying to solve this in a much simpler way: #655. I added a commit that I expected would unlink the sockets after use, but I'm getting EACCES, and my guess is that I'm closing from a thread that doesn't have the privilege to unlink. Since you have been investigating this, could you please help me try to fix that commit so that it works in a simpler way? Or maybe you can show me that it can't possibly work... Thanks, Alex

Hello Alex (@alejandro-colomar),

Thank you for your interest in this problem, the alternative solution and the question!

As far as I remember, the Router process does not have enough rights to perform this type of operation. For that reason, in my solution I proposed an additional handler in the main process which, through the corresponding RPC exchange between the main process and the router, can receive the task of disposing of the socket when it is closed, similar to the task performed at socket-creation time.

Yeah, that was my fear.

I'm definitely not happy with your solution (not saying it's bad; it's probably the only solution that fixes the issue).

We've been discussing this internally, and I'm going for an orthogonal solution (imperfect, but orders of magnitude simpler):

Add support for abstract Unix sockets (see the discussion in #735), which currently are only implemented in Linux, but could be added to other kernels (discuss that with your local kernel vendor :P).

Abstract Unix sockets don't live in the file system, so they don't need to be unlink(2)ed (the kernel removes the name when the last sfd to it has been closed, which is what we want here). They also have drawbacks: you can't use filesystem permissions to control who can access the socket; but since we're creating our sockets 666 right now, nothing would really change for us in that regard.

Would a solution that works for Linux only work for you? Which kernel are you using to run Unit?

If I couldn't answer your question with this, please let me know, I'll try to remember more details and provide them to you.

You did. Thanks!

Always glad to cooperate!

:-)

Regards, Evgenii.

Cheers,

Alex

alejandro-colomar avatar Jul 25 '22 09:07 alejandro-colomar

Hello Alex (@alejandro-colomar),

Glad I was able to answer your question!

Would a solution that works for Linux only work for you? Which kernel are you using to run Unit?

I'm using debian11 as the kernel for building unit.

Add support for abstract Unix sockets (see the discussion in https://github.com/nginx/unit/pull/735), which currently are only implemented in Linux, but could be added to other kernels (discuss that with your local kernel vendor :P).

Abstract Unix sockets don't live in the file system, so they don't need to be unlink(2)ed (the kernel removes the name when the last sfd to it has been closed, which is what we want here). They also have drawbacks: you can't use filesystem permissions to control who can access the socket; but since we're creating our sockets 666 right now, nothing would really change for us in that regard.

Thank you for the detailed explanation of how abstract sockets work. I want to note that for me it would be critical to use sockets existing in the file system. For example, in the case of a file socket forwarding scenario inside a Kubernetes cluster between the caching and compressing nginx and the unit behind it. Abstract sockets unfortunately do not allow applications outside of the operating system they reside on to interact with them.

As @VBart previously mentioned, unit already contains or plans to implement said features in the near future. But my planning relies on the technological capabilities already available, which I can already use and which will be enough for me to solve the tasks before me. Even if Unit implements caching and compression, I will lack flexibility when dealing with headers, etc. In this regard, I would not exclude the option of using nginx in conjunction with unit, being limited only to the use of TCP / IP sockets, taking into account the existing technological capabilities of virtualization.

I see unit as a reliable and efficient cross-platform web server running directly in front of the application itself on the same virtualization unit, yet able to communicate effectively in a hyperconverged environment.

This is my vision and use of unit, of course it may differ from the vast majority.

Sorry for highlighting the text, just wanted to emphasize what I think are the most important thoughts and improve readability.

I am always glad to any discussions and work with you and your team.

Regards, Evgenii.

echolimazulu avatar Jul 25 '22 11:07 echolimazulu

Hello Alex (@alejandro-colomar),

Glad I was able to answer your question!

Would a solution that works for Linux only work for you? Which kernel are you using to run Unit?

I'm using debian11 as the kernel for building unit.

Ok, so Linux.

Add support for abstract Unix sockets (see the discussion in #735), which currently are only implemented in Linux, but could be added to other kernels (discuss that with your local kernel vendor :P).

Abstract Unix sockets don't live in the file system, so they don't need to be unlink(2)ed (the kernel removes the name when the last sfd to it has been closed, which is what we want here). They also have drawbacks: you can't use filesystem permissions to control who can access the socket; but since we're creating our sockets 666 right now, nothing would really change for us in that regard.

Thank you for the detailed explanation of how abstract sockets work. I want to note that for me it would be critical to use sockets existing in the file system. For example, in the case of a file socket forwarding scenario inside a Kubernetes cluster between the caching and compressing nginx and the unit behind it. Abstract sockets unfortunately do not allow applications outside of the operating system they reside on to interact with them.

But normal Unix sockets are already restricted to the same host (AFAIK). I don't think there are extra restrictions in this regard by using abstract Unix sockets. Do you have an example application that can work with normal Unix sockets in a multi-host environment that would stop working if the socket was an abstract one?

alejandro-colomar avatar Jul 25 '22 12:07 alejandro-colomar

Usually it is.

I don't understand how you can grant access to connect to an abstract socket outside of the host (virtualization unit) on which it was running. I consider it correct to separate multiple connections to a file socket and actual external access to an abstract socket into different problems. Perhaps I do not know something - I admit it.

With a file socket this option is definitely possible (shared), and multiple connections to the socket are not required in most standard scenarios; if they are required and technically unavailable (limited), multiple connections can be provided with a TCP/IP socket instead.

Earlier, in discussions with @vbart, I said that I do not understand the logic of the F5 Networks & Nginx Inc. R&D teams, which boils down to not using file sockets in the Unit product regardless of the reasons, such as the one you indicated (as I read it): since access is 666 for file sockets, we can only use abstract ones. But no, we can't (in my humble opinion).

Take nginx: from time immemorial it has supported all possible connection types, not to mention the larger number of protocols it implements.

I would like to recall the whole history of the development of this problem:

  • at first I was told (@vbart) that this was not part of the release plan and that file sockets were a legacy from other projects;
  • I tried to help with the implementation; I was told (@hongzhidao) that the solution I provided was too complicated, while acknowledging that it works. Special thanks to @hongzhidao for working on this;
  • the unit team tried to implement a simpler solution, but ran into the very problem that made my solution so complicated, while recognizing that the solution I provided is probably the only possible one. Special thanks to @alejandro-colomar for working on this.

Also a special thanks to @tippexs for leading the process.

As a result, as I understand it, the unit team has come to question whether file sockets need to be implemented and maintained at all, which seems a little strange to me, as does the fact that I repeatedly provide detailed usage scenarios (it's not difficult for me, I'm glad to help), while I have never seen any from the key developers of unit (@vbart), only statements that real users do not need this. Apparently, in their opinion, I am not real or not a user of the unit product.

I don't blame anyone, but from the outside it looks, to put it mildly, like ordinary sabotage of the project's development by individual developers, leading them to a decision to refuse to implement file sockets that is motivated neither by the impossibility of implementation nor by a lack of real demand.

I did not see answers in this PR from @mar0x and @vbart, those who participated in the discussions of this problem and its solution at the very beginning and set the direction, but subsequently completely avoided discussing it.

I also want to note that after several months of waiting I never received the promised answer with the vision of the future of the Unit project, which @lcrilly promised me when asking me to write a letter to his mail: [email protected], which he kindly provided earlier and to which I wrote on 03/31/2022. Nevertheless, I respect how busy he is and understand that my letter could simply have been lost.

I want to clarify that I am for the development and bright future of the unit product, which is why I have spent a significant amount of time trying to help this project and continue to do so. Perhaps I am wasting my time, as I have no desire to resist the unit team's overall vision of product development, which changes due to circumstances unknown to me.

Regardless of the above, I am full of hopes and expectations related to the unit project. I have great respect for absolutely all the contributors of the project and I am glad to interact with each of them regardless of the circumstances, as I believe that the future of the unit project is the key in this whole story and we should probably all concentrate on joining forces when working on it and reaching consensus rather than developing disagreements on it.

echolimazulu avatar Jul 25 '22 13:07 echolimazulu

I'm not saying we're not going to implement Unix sockets. We are. And I'm about to commit the changes into the master branch (I wanted to clear up a few discussions before that, but the code is ready).

What I'm saying is that I want to support both variants of the Unix domain socket: filesystem and abstract.

Both are equivalent from a user point of view: filesystem sockets would be used as "unix:/path/to/file", and abstract sockets would be used as "unix:@socket/name". Notice the initial @ in the pathname; that's literally the only difference you'll ever notice.
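
Just to make the naming concrete, here is a minimal sketch at the sockets API level (not Unit's actual address parser; the helper name is made up): the leading '@' becomes the NUL byte at the start of sun_path, which is what makes the address abstract, exactly as the srv_a.c example later in this thread does by hand.

#include <stdio.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Sketch only: map "unix:@name" / "unix:/path" onto a sockaddr_un,
 * where a leading '@' is replaced by the NUL byte that marks an
 * abstract address. */
static socklen_t
fill_unix_addr(struct sockaddr_un *sun, const char *addr)
{
    const char  *name = addr + strlen("unix:");

    memset(sun, 0, sizeof(*sun));
    sun->sun_family = AF_UNIX;

    if (name[0] == '@') {
        /* Abstract: sun_path starts with '\0'; no filesystem entry,
         * so nothing to unlink(2) later. */
        strncpy(sun->sun_path + 1, name + 1, sizeof(sun->sun_path) - 2);
        return offsetof(struct sockaddr_un, sun_path)
               + 1 + strlen(name + 1);
    }

    /* Filesystem socket: a regular pathname that has to be removed
     * explicitly when it is no longer needed. */
    strncpy(sun->sun_path, name, sizeof(sun->sun_path) - 1);
    return sizeof(*sun);
}

int main(void)
{
    struct sockaddr_un  sun;
    socklen_t           len;

    len = fill_unix_addr(&sun, "unix:@unit/demo");
    printf("abstract:   addrlen=%u name=\"%s\"\n", (unsigned) len,
           sun.sun_path + 1);

    len = fill_unix_addr(&sun, "unix:/run/unit/demo.sock");
    printf("filesystem: addrlen=%u path=\"%s\"\n", (unsigned) len,
           sun.sun_path);

    return 0;
}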

The other difference between filesystem sockets and abstract sockets is that with filesystem sockets you'll have the problem that your filesystem will have files that you'll have to clean up manually, while the abstract sockets will be cleaned up by the kernel automatically (a good thing).

You shouldn't notice any other differences (as long as the application that wants to talk to the socket also supports abstract sockets; NGINX still doesn't, but I'm discussing with them the possibility of adding support for it).

alejandro-colomar avatar Jul 25 '22 14:07 alejandro-colomar

Thanks for your detailed answer.

Sorry, maybe I misread or misunderstood your answer (I fully admit it).

But I think you and I understand perfectly well that the implementation of abstract and file sockets you specified is for the most part already available today (only through \0 at the beginning of the path), and the usage scenario you described certainly expands the current possibilities, but it does not solve the main problem of cleaning up the file socket when the program ends, which is in fact the motivation for this PR, since it is not possible to restart unit without removing the file socket.

My question about the current implementation of file sockets you described is: why does the user need to manually figure out where the abandoned file socket is, track the state of unit and delete the socket, after unit automatically created it at startup and simply abandoned it when the program ended? Moreover, until the file socket is deleted, unit cannot be restarted.

echolimazulu avatar Jul 25 '22 14:07 echolimazulu

I didn't solve the problem with Unix domain filesystem sockets because:

  • Unix domain abstract sockets don't have that problem, so you can switch to using them, and the problem will go away.
  • If you have an environment (in Linux) where Unix filesystem sockets work but Unix abstract sockets don't work, please show it to me, with actual commands that show the situation, to be able to debug it. I suspect there's no such scenario.
  • Even if you do need filesystem sockets (e.g., on FreeBSD), you can use the same rc script that you use to run unitd, to also clean up any sockets. In fact, in Debian, you have systemd, which solves this easily: see systemd-tmpfiles(8).

So I don't see much benefit in cleaning the files within unit, considering the complexity it imposes, and the available alternatives. It's not fully discarded, but I don't like the idea.

alejandro-colomar avatar Jul 25 '22 14:07 alejandro-colomar

I didn't solve the problem with Unix domain filesystem sockets because:

  • Unix domain abstract sockets don't have that problem, so you can switch to using them, and the problem will go away.
  • If you have an environment (in Linux) where Unix filesystem sockets work but Unix abstract sockets don't work, please show it to me, with actual commands that show the situation, to be able to debug it. I suspect there's no such scenario.
  • Even if you do need filesystem sockets (e.g., on FreeBSD), you can use the same rc script that you use to run unitd, to also clean up any sockets. In fact, in Debian, you have systemd, which solves this easily: see systemd-tmpfiles(8).

So I don't see much benefit in cleaning the files within unit, considering the complexity it imposes, and the available alternatives. It's not fully discarded, but I don't like the idea.

Thanks for the options provided.

Unix domain abstract sockets don't have that problem, so you can switch to using them, and the problem will go away.

Please forget about abstract sockets in the context of this PR; I didn't raise that question at all until you first did.

If you have an environment (in Linux) where Unix filesystem sockets work but Unix abstract sockets don't work, please show it to me, with actual commands that show the situation, to be able to debug it. I suspect there's no such scenario.

You are asking to show the environment that I have already described at least 2 times before (in this PR and before that in a discussion with @vbart in another PR). I'm talking about two Pods with roles (nginx and unit) inside a Kubernetes cluster with file socket communication (shared). How can you implement this exchange (communication between two applications) through an abstract socket? Please explain this to me?

Even if you do need filesystem sockets (e.g., on FreeBSD), you can use the same rc script that you use to run unitd, to also clean up any sockets. In fact, in Debian, you have systemd, which solves this easily: see systemd-tmpfiles(8).

I'm relying on Google's advice, which is a good reflection of why using managers like systemd and supervisord, and shell wrappers like #!/bin/sh, is undesirable, and of how that actually breaks the basic principle of how containerized applications work: https://cloud.google.com/architecture/best-practices-for-building-containers

Moreover, I don't use FreeBSD, and neither do most users of your product.

So I don't see much benefit in cleaning the files within unit, considering the complexity it imposes, and the available alternatives. It's not fully discarded, but I don't like the idea.

I think that it doesn't matter what we personally like and what we don't when we talk about the needs of the business expressed in the requirements for the product that the end user needs.

echolimazulu avatar Jul 25 '22 16:07 echolimazulu

How can you implement this exchange (communication between two applications) through an abstract socket? Please explain this to me?

Exactly as you would with unix sockets. See an example here. Please feel free to ask any questions if you don't understand something.

alx@asus5775:~/tmp$ cat srv_a.c 
#include <err.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>


#define MY_SOCK_NAME       "@magic"
#define MY_LISTEN_BACKLOG  50


int main(void)
{
	int                 sfd, cfd;
	char                buf[7];
	socklen_t           len;
	struct sockaddr_un  sun = {0}, cun;

	sfd = socket(AF_UNIX, SOCK_STREAM, 0);
	if (sfd == -1)
		err(EXIT_FAILURE, "socket(2)");

	sun.sun_family = AF_UNIX;
	strncpy(sun.sun_path, MY_SOCK_NAME, sizeof(sun.sun_path) - 1);
	sun.sun_path[0] = '\0';

	len = sizeof(sa_family_t) + strlen(MY_SOCK_NAME);
	if (bind(sfd, (struct sockaddr *) &sun, len) == -1)
		err(EXIT_FAILURE, "bind(2)");

	if (listen(sfd, MY_LISTEN_BACKLOG) == -1)
		err(EXIT_FAILURE, "listen(2)");

	len = sizeof(cun);
	cfd = accept(sfd, (struct sockaddr *) &cun, &len);
	if (cfd == -1)
		err(EXIT_FAILURE, "accept(2)");

	if (read(cfd, buf, sizeof(buf)) != sizeof(buf))
		err(EXIT_FAILURE, "read(2)1");

	if (write(STDOUT_FILENO, buf, sizeof(buf)) == -1)
		err(EXIT_FAILURE, "write(2)");

	if (read(cfd, buf, 1) != 0)
		err(EXIT_FAILURE, "read(2)2");

	if (close(cfd) == -1)
		err(EXIT_FAILURE, "close(cfd)");

	if (close(sfd) == -1)
		err(EXIT_FAILURE, "close(sfd)");

	exit(EXIT_SUCCESS);
}
alx@asus5775:~/tmp$ cat cli_a.c 
#include <err.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>


#define MY_SOCK_NAME       "@magic"


int main(void)
{
	int                 sfd;
	socklen_t           len;
	struct sockaddr_un  sun = {0};

	sfd = socket(AF_UNIX, SOCK_STREAM, 0);
	if (sfd == -1)
		err(EXIT_FAILURE, "socket(2)");

	sun.sun_family = AF_UNIX;
	strncpy(sun.sun_path, MY_SOCK_NAME, sizeof(sun.sun_path) - 1);
	sun.sun_path[0] = '\0';

	len = sizeof(sa_family_t) + strlen(MY_SOCK_NAME);
	if (connect(sfd, (struct sockaddr *) &sun, len) == -1)
		err(EXIT_FAILURE, "connect(2)");

	if (write(sfd, "hello\n", 7) != 7)
		err(EXIT_FAILURE, "write(2)");

	if (close(sfd) == -1)
		err(EXIT_FAILURE, "close(sfd)");

	exit(EXIT_SUCCESS);
}
alx@asus5775:~/tmp$ cc -Wall -Wextra srv_a.c -o srv_a
alx@asus5775:~/tmp$ cc -Wall -Wextra cli_a.c -o cli_a
alx@asus5775:~/tmp$ ./srv_a &
[1] 2885
alx@asus5775:~/tmp$ ./cli_a 
hello
[1]+  Done                    ./srv_a

(Thanks to @ac000, who pointed out a bug in the above program. I fixed it.)

alejandro-colomar avatar Jul 25 '22 17:07 alejandro-colomar

How can you implement this exchange (communication between two applications) through an abstract socket? Please explain this to me?

Exactly as you would with unix sockets. See an example here. Please feel free to ask any questions if you don't understand something.

alx@asus5775:~/tmp$ cat srv_a.c 
#include <err.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>


#define MY_SOCK_NAME       "\0magic"
#define MY_LISTEN_BACKLOG  50


int main(void)
{
	int                 sfd, cfd;
	char                buf[7];
	socklen_t           len;
	struct sockaddr_un  sun = {0}, cun;

	sfd = socket(AF_UNIX, SOCK_STREAM, 0);
	if (sfd == -1)
		err(EXIT_FAILURE, "socket(2)");

	sun.sun_family = AF_UNIX;
	strncpy(sun.sun_path, MY_SOCK_NAME, sizeof(sun.sun_path) - 1);

	if (bind(sfd, (struct sockaddr *) &sun, sizeof(sun)) == 1)
		err(EXIT_FAILURE, "bind(2)");

	if (listen(sfd, MY_LISTEN_BACKLOG) == -1)
		err(EXIT_FAILURE, "listen(2)");

	len = sizeof(cun);
	cfd = accept(sfd, (struct sockaddr *) &cun, &len);
	if (cfd == -1)
		err(EXIT_FAILURE, "accept(2)");

	if (read(cfd, buf, sizeof(buf)) != sizeof(buf))
		err(EXIT_FAILURE, "read(2)1");

	if (write(STDOUT_FILENO, buf, sizeof(buf)) == -1)
		err(EXIT_FAILURE, "write(2)");

	if (read(cfd, buf, 1) != 0)
		err(EXIT_FAILURE, "read(2)2");

	if (close(cfd) == -1)
		err(EXIT_FAILURE, "close(cfd)");

	if (close(sfd) == -1)
		err(EXIT_FAILURE, "close(sfd)");

	exit(EXIT_SUCCESS);
}
alx@asus5775:~/tmp$ cat cli_a.c 
#include <err.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>


#define MY_SOCK_NAME       "\0magic"


int main(void)
{
	int                 sfd;
	struct sockaddr_un  sun = {0};

	sfd = socket(AF_UNIX, SOCK_STREAM, 0);
	if (sfd == -1)
		err(EXIT_FAILURE, "socket(2)");

	sun.sun_family = AF_UNIX;
	strncpy(sun.sun_path, MY_SOCK_NAME, sizeof(sun.sun_path) - 1);

	if (connect(sfd, (struct sockaddr *) &sun, sizeof(sun)) == 1)
		err(EXIT_FAILURE, "connect(2)");

	if (write(sfd, "hello\n", 7) != 7)
		err(EXIT_FAILURE, "write(2)");

	if (close(sfd) == -1)
		err(EXIT_FAILURE, "close(sfd)");

	exit(EXIT_SUCCESS);
}
alx@asus5775:~/tmp$ cc -Wall -Wextra srv_a.c -o srv_a
alx@asus5775:~/tmp$ cc -Wall -Wextra cli_a.c -o cli_a
alx@asus5775:~/tmp$ ./srv_a &
[1] 2885
alx@asus5775:~/tmp$ ./cli_a 
hello
[1]+  Done                    ./srv_a

Thanks for your explanation.

But in the previous sentence you refer to, I gave a specific example that involves interacting with a socket from two independent containers.

I'm talking about two Pods with roles (nginx and unit) inside a Kubernetes cluster with file socket communication (shared). How can you implement this exchange (communication between two applications) through an abstract socket? Please explain this to me?

As far as I remember, the abstract socket is only suitable for interprocessor communication within the same host (container). Please take a look at the schematic I am attaching to this post. This is a much more accurate and detailed use case I'm talking about:

[attached schematic: kubernetes-nginx-unit]

echolimazulu avatar Jul 25 '22 19:07 echolimazulu

But in the previous sentence you refer to, I gave a specific example that involves interacting with a socket from two independent containers.

I'm talking about two Pods with roles (nginx and unit) inside a Kubernetes cluster with file socket communication (shared). How can you implement this exchange (communication between two applications) through an abstract socket? Please explain this to me?

As far as I remember, the abstract socket is only suitable for interprocessor communication within the same host (container). Please take a look at the schematic I am attaching to this post. This is a much more accurate and detailed use case I'm talking about:

[attached schematic: kubernetes-nginx-unit]

That picture helps a lot.

So, I've been testing isolating my server and client programs in Linux namespaces, and I found that the only thing that causes the socket to be invisible is having both processes in different network namespaces.

That's consistent with what network_namespaces(7) describes:

DESCRIPTION
       Network  namespaces  provide  isolation of the system re‐
       sources associated with networking: network devices, IPv4
       and IPv6 protocol stacks,  IP  routing  tables,  firewall
       rules,  the /proc/net directory (which is a symbolic link
       to /proc/PID/net), the /sys/class/net directory,  various
       files under /proc/sys/net, port numbers (sockets), and so
       on.  In addition, network namespaces isolate the UNIX do‐
       main abstract socket namespace (see unix(7)).

Are both pods in the same network Linux namespace? If not, is it something that you can change?

alx@asus5775:~/tmp$ unshare -i -m -p -u -U -C -T -f ./srv_a &
[1] 2862
alx@asus5775:~/tmp$ unshare -i -m -p -u -U -C -T -f ./cli_a 
hello
[1]+  Done                    unshare -i -m -p -u -U -C -T -f ./srv_a
alx@asus5775:~/tmp$ sudo unshare -n -f ./srv_a &
[1] 2872
alx@asus5775:~/tmp$ sudo unshare -n -f ./cli_a 
cli_a: write(2): Transport endpoint is not connected

(The following sentence was incorrect, after investigation. Different pods are in different net namespaces, and although that can be relaxed, it would be a security issue.) As far as I know, pods in the same node share the network namespace, but I'm not sure about that.

alejandro-colomar avatar Jul 25 '22 23:07 alejandro-colomar

As far as I know, pods in the same node share the network namespace, but I'm not sure about that

Hello Alex,

I apologize for the wording "interprocessor communication" - I meant communication between two processes within the same host. I formulated it poorly, since I wrote the answer and drew the picture late at night.

As far as I remember, the attached filespace driver (attached file system, shared) is from the Kubernetes snap-in and is not connected to the network in any way. This works in much the same way as mounted volumes in Docker, although in my humble opinion it is more efficient.

echolimazulu avatar Jul 26 '22 05:07 echolimazulu

As far as I know, pods in the same node share the network namespace, but I'm not sure about that

Hello Alex,

I apologize for the wording "interprocessor communication" - I meant communication between two processes within the same host. I formulated it poorly, since I wrote the answer and drew the picture late at night.

As far as I remember, the attached filespace driver (attached file system, shared) is from the Kubernetes snap-in and is not connected to the network in any way. This works in much the same way as mounted volumes in Docker, although in my humble opinion it is more efficient.

Don't worry, with the picture I got a perfect idea of what you have.

So, I've been investigating kubernetes, and you're right, pods are isolated from each other regarding networking (kubernetes implements this by putting the processes running in them in different Linux network namespaces).

So, if you have nginx and unit in different pods, it's impossible to use abstract sockets.

But if you put the two containers in the same pod (that's possible), they will share network namespace, and they'll be able to communicate that way between them.

Would you be able to use a single pod with two containers?

As you've explained it to me, it seems that the relation of nginx containers to unit containers will be 1:1, right? If that's the case, using the same pod would make sense in terms of scalability, since you could just increase the multi-container pod instances easily. It would also remove the need for using a shared filesystem mountpoint. I think it would be even better in terms of isolation.

alejandro-colomar avatar Jul 26 '22 08:07 alejandro-colomar

@echolimazulu good to see you again hanging around! Thank you very much for all the work you put into this topic! We really appreciate it!

The final infrastructure is clear to me now, but could you explain what's the use case for the NGINX (OSS) Proxy in front of the Unit instance? Just want to understand what feature is missing. Looking forward to your answer. Have a great day! Cheers Timo

tippexs avatar Jul 26 '22 09:07 tippexs

@echolimazulu good to see you again hanging around! Thank you very much for all the work you put into this topic! We really appreciate it!

The final infrastructure is clear to me now, but could you explain what's the use case for the NGINX (OSS) Proxy in front of the Unit instance? Just want to understand what feature is missing. Looking forward to your answer. Have a great day! Cheers Timo

Hello @tippexs,

Thank you for your attention to this issue and for joining the discussion with @alejandro-colomar and @hongzhidao. Glad to work with you all on this issue!

Specifically, these are the features I use, and I really like the way they are implemented in nginx:

  • caching is a very flexible tool and super efficient in nginx;
  • compression - I use brotli and gzip;
  • headers - I am very comfortable working with headers in nginx; it works without exceptions or usage restrictions and without known problems when passing headers to the upstream balancer (proxy, if present), especially because they sit at the same level as the caching role, which allows you to fully manage this process in one place if additional headers are required;
  • envsubst - I have my own implementation of this feature, written in C, an approximate analogue of a #!/bin/sh wrapper, only without using the shell but with the same principle of handing control of the process and the environment over after preliminary configuration (a minimal sketch of this launch-and-exec pattern follows below). In unit (if it sits up front) I use a pre-configured file at startup, which may not always be convenient if the virtualization layer manages the port and changes it (support for this is necessary from Google's point of view). The option of launching and then sending a configuration request (curl) to a configuration socket seems a little redundant and non-atomic to me compared to a pre-checked and prepared file (yes, possibly with a port change on the fly - interpolation).
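
(Not the actual launcher described above, just a minimal sketch of the pattern, with assumed environment variable, binary path and arguments: set up the environment without a shell, then hand the process over with execv(2) so the final binary keeps PID 1, signal delivery and stdout/stderr inside the container.)

#include <err.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    /* Hypothetical pre-configuration step: default the listen port
     * if the orchestrator did not provide one. */
    if (getenv("PORT") == NULL && setenv("PORT", "8080", 1) == -1)
        err(EXIT_FAILURE, "setenv");

    /* Hypothetical target binary and arguments; execv(2) replaces
     * this process, so no shell remains in between. */
    char *const argv[] = { "unitd", "--no-daemon", NULL };

    execv("/usr/sbin/unitd", argv);
    err(EXIT_FAILURE, "execv");    /* reached only if execv fails */
}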

echolimazulu avatar Jul 27 '22 04:07 echolimazulu

As far as I know, pods in the same node share the network namespace, but I'm not sure about that

Hello Alex, I apologize for the wording "interprocessor communication" - I meant communication between two processes within the same host. I formulated it poorly, since I wrote the answer and drew the picture late at night. As far as I remember, the attached filespace driver (attached file system, shared) is from the Kubernetes snap-in and is not connected to the network in any way. This works in much the same way as mounted volumes in Docker, although in my humble opinion it is more efficient.

Don't worry, with the picture I got a perfect idea of what you have.

So, I've been investigating kubernetes, and you're right, pods are isolated from each other regarding networking (kubernetes implements this by putting the processes running in them in different Linux network namespaces).

So, if you have nginx and unit in different pods, it's impossible to use abstract sockets.

But if you put the two containers in the same pod (that's possible), they will share network namespace, and they'll be able to communicate that way between them.

Would you be able to use a single pod with two containers?

As you've explained it to me, it seems that the relation of nginx containers to unit containers will be 1:1, right? If that's the case, using the same pod would make sense in terms of scalability, since you could just increase the multi-container pod instances easily. It would also remove the need for using a shared filesystem mountpoint. I think it would be even better in terms of isolation.

Hello @alejandro-colomar,

Thank you for taking the time to explore Kubernetes features and suggestions.

Unfortunately, the abstraction option you described - two programs (nginx, unit) in one container - violates the general principle of correct virtualization. This is due to several reasons, ranging from zombie processes when using various shells and service managers, to the inability to correctly deliver SIGKILL to the end application and to get the necessary STDOUT/STDERR output from the end application in the container.

Here I give my preference and follow Google's guidelines for Kubernetes containers, namely: single app per container https://cloud.google.com/architecture/best-practices-for-building-containers#package_a_single_app_per_container

As for scalability, I prefer to consider its possibility at the Node abstraction level, rather than the Pod, through the creation of so-called Node pools providing redundancy and fault tolerance if necessary.

echolimazulu avatar Jul 27 '22 06:07 echolimazulu

But if you put the two containers in the same pod [...] they'll be able to communicate that way between them. Would you be able to use a single pod with two containers?

Hello @alejandro-colomar,

Thank you for taking the time to explore Kubernetes features and suggestions.

Unfortunately, the abstraction option you described - two programs (nginx, unit) in one container - violates the general principle of correct virtualization. This is due to several reasons, ranging from zombie processes when using various shells and service managers, to the inability to correctly deliver SIGKILL to the end application and to get the necessary STDOUT/STDERR output from the end application in the container.

Dear Evgenii,

I totally agree with you that placing two processes in the same container would be a terrible idea. I am suggesting instead a totally different solution: placing one single process per container, but two containers in the same pod. As you may already know, a pod is a group of containers that share the same network namespace (they are isolated as a whole, so no networking to the outside, but they can network between them).

Thank you very much.

Here I give my preference and follow Google's guidelines for Kubernetes containers, namely: single app per container https://cloud.google.com/architecture/best-practices-for-building-containers#package_a_single_app_per_container

As for scalability, I prefer to consider its possibility at the Node abstraction level, rather than the Pod, through the creation of so-called Node pools providing redundancy and fault tolerance if necessary.

alejandro-colomar avatar Jul 27 '22 08:07 alejandro-colomar

I'm referring to this:

https://www.mirantis.com/blog/multi-container-pods-and-container-communication-in-kubernetes/

which you'll notice that mentions as a use case having nginx as a reverse proxy before the actual app (unit in this case).

alejandro-colomar avatar Jul 27 '22 08:07 alejandro-colomar

With my srv_a and cli_a programs, the YAML would look something like the following:

apiVersion: v1
kind: Pod
metadata:
  name: server-and-client
spec:
  containers:
  - name: server
    image: srv_a
    command: ["/bin/sh", "-c"]
    args:
      - /usr/local/bin/srv_a
  - name: client
    image: cli_a
    command: ["/bin/sh", "-c"]
    args:
      - sleep 10 && /usr/local/bin/cli_a

alejandro-colomar avatar Jul 27 '22 08:07 alejandro-colomar

I'm referring to this:

https://www.mirantis.com/blog/multi-container-pods-and-container-communication-in-kubernetes/

which you'll notice that mentions as a use case having nginx as a reverse proxy before the actual app (unit in this case).

Wow, Alex (@alejandro-colomar), this is incredibly useful information, sincerely thank you for this! And yes, it seems to me that this will allow the use of a common namespace to work with abstract sockets. I'll try this out in the future, after full abstract socket support is added to the main branch by you (probably already). But as you very well noted earlier, not all applications can support abstract sockets, and not everyone uses kubernetes as a flexible virtualization tool.

I overlooked this, since I mainly work in the abstraction Node -> Pod = 1 container and have not come across the principle of work you indicated in practice, so my knowledge will not allow an objective discussion of the pros and cons of this solution at the moment. Most likely everything will be fine. Of the minuses, only an increase in the number of attack vectors due to the use of shared memory is likely. Therefore, I personally accept this option and will most likely use it in my own solutions, as the most efficient current solution, provided that abstract sockets are used.

But the main subject of this PR is still file sockets, which exist almost everywhere and are understood and known by very many people. I would say that this is already a generally accepted standard, which cannot be completely discarded or abandoned by pointing to abstract sockets (which cannot always and everywhere be used) or to the project's legacy.

I also know of people who use these roles (nginx, and others) on separate virtual machines under ESXi, which, as far as I remember, do not have a common IPC.

If we abandon file sockets altogether, then in my humble opinion it looks like a degradation of the project.

If we leave everything as it is and file sockets have to be deleted manually, after automatic creation at the time of unit startup - then this is also, in my humble opinion, a degradation of the project.

In fact, when we talk about abstract sockets in light of the work in progress for file sockets, we are trying to find a workaround, not a real solution to the problem of closing file sockets.

I am sincerely glad and grateful to you personally and @tippexs for throwing options to find a solution, because this is how we can come to a successful and best solution to this problem.

Regards, Evgenii.

echolimazulu avatar Jul 27 '22 12:07 echolimazulu

I'm referring to this: https://www.mirantis.com/blog/multi-container-pods-and-container-communication-in-kubernetes/ which you'll notice that mentions as a use case having nginx as a reverse proxy before the actual app (unit in this case).

Wow, Alex (@alejandro-colomar), this is incredibly useful information, sincerely thank you for this! And yes, it seems to me that this will allow the use of a common namespace to work with abstract sockets. I'll try this out in the future, after full abstract socket support is added to the main branch by you (probably already).

I'm happy that this is so useful to you! :-)

But as you very well noted earlier, not all applications can support abstract sockets, and not everyone uses kubernetes as a flexible virtualization tool.

Yeah, I know there are scenarios where file sockets might be necessary.

I overlooked this, since I mainly work in the abstraction Node -> Pod = 1 container and have not come across the principle of work you indicated in practice, so my knowledge will not allow an objective discussion of the pros and cons of this solution at the moment. Most likely everything will be fine. Of the minuses, only an increase in the number of attack vectors due to the use of shared memory is likely. Therefore, I personally accept this option and will most likely use it in my own solutions, as the most efficient current solution, provided that abstract sockets are used.

Yes, please share any updates about this; tell me if it works well for you!

But the main subject of this PR is still file sockets, which exist almost everywhere and are understood and known by very many people. I would say that this is already a generally accepted standard, which cannot be completely discarded or abandoned by pointing to abstract sockets (which cannot always and everywhere be used) or to the project's legacy.

When/if we receive a report that can't be adapted to use abstract sockets, this PR will be reconsidered.

But for now, we have a queue of difficult things to implement, and will try to keep it simple for now.

I also know of people who use these roles (nginx, and others) on separate virtual machines under ESXi, which, as far as I remember, do not have a common IPC.

If we abandon file sockets altogether, then in my humble opinion it looks like a degradation of the project.

No, it's not abandoned. It's just delayed until we really need it.

If we leave everything as it is and file sockets have to be deleted manually, after automatic creation at the time of unit startup - then this is also, in my humble opinion, a degradation of the project.

In fact, when we talk about abstract sockets in light of the work in progress for file sockets, we are trying to find a workaround, not a real solution to the problem of closing file sockets.

I am sincerely glad and grateful to you personally and @tippexs for throwing options to find a solution, because this is how we can come to a successful and best solution to this problem.

Regards, Evgenii.

Still, I agree with you that Unix sockets are a big thing for Unit, and in fact we're right now discussing more uses of unix sockets within Unit (and NGINX too).

Cheers,

Alex

alejandro-colomar avatar Jul 27 '22 13:07 alejandro-colomar

I'm referring to this: https://www.mirantis.com/blog/multi-container-pods-and-container-communication-in-kubernetes/ which you'll notice that mentions as a use case having nginx as a reverse proxy before the actual app (unit in this case).

Wow, Alex (@alejandro-colomar), this is incredibly useful information, sincerely thank you for this! And yes, it seems to me that this will allow the use of a common namespace to work with abstract sockets. I'll try this out in the future, after full abstract socket support is added to the main branch by you (probably already).

I'm happy that this is so useful to you! :-)

But as you very well noted earlier, not all applications can support abstract sockets, and not everyone uses kubernetes as a flexible virtualization tool.

Yeah, I know there are scenarios where file sockets might be necessary.

I overlooked this, since I mainly work in the abstraction Node -> Pod = 1 container and have not come across the principle of work you indicated in practice, so my knowledge will not allow an objective discussion of the pros and cons of this solution at the moment. Most likely everything will be fine. Of the minuses, only an increase in the number of attack vectors due to the use of shared memory is likely. Therefore, I personally accept this option and will most likely use it in my own solutions, as the most efficient current solution, provided that abstract sockets are used.

Yes, please share any updates about this; tell me if it works well for you!

But the main subject of this PR is still file sockets, which exist almost everywhere and are understood and known by very many people. I would say that this is already a generally accepted standard, which cannot be completely discarded or abandoned by pointing to abstract sockets (which cannot always and everywhere be used) or to the project's legacy.

When/if we receive a report that can't be adapted to use abstract sockets, this PR will be reconsidered.

But for now, we have a queue of difficult things to implement, and will try to keep it simple for now.

I also know of people who use these roles (nginx, and others) on separate virtual machines under ESXi, which, as far as I remember, do not have a common IPC. If we abandon file sockets altogether, then in my humble opinion it looks like a degradation of the project.

No, it's not abandoned. It's just delayed until we really need it.

If we leave everything as it is and file sockets have to be deleted manually, after automatic creation at the time of unit startup - then this is also, in my humble opinion, a degradation of the project. In fact, when we talk about abstract sockets in light of the work in progress for file sockets, we are trying to find a workaround, not a real solution to the problem of closing file sockets. I am sincerely glad and grateful to you personally and @tippexs for throwing options to find a solution, because this is how we can come to a successful and best solution to this problem. Regards, Evgenii.

Still, I agree with you that Unix sockets are a big thing for Unit, and in fact we're right now discussing more uses of unix sockets within Unit (and NGINX too).

Cheers,

Alex

Good news!

Thanks for sharing this!

Once again, I want to note that I do not insist on merging this PR. It seems to me that I stand on the side of, and act in the interests of, those users who are used to, love and want to use unix sockets in unit, for whom the lack of a correct way to use them can be critical, as can the lack of a proper level of support and implementation on the developer's part at the moment they decide to choose unit as a product for their production solutions.

I also know of people who use these roles (nginx, and others) on separate virtual machines under ESXi, which, as far as I remember, do not have a common IPC.

And isn't this an example of use with the inability to use abstract sockets? Or is it necessary for a person to buy a subscription to unit and write to support with a feature request in order for this request to be classified as a real client need and pass according to your internal schedule as necessary for implementation?

Sorry, but the position on your part and that of your colleagues is constantly changing. Sometimes (1) you call it implementation complexity, sometimes (2) lack of real need on the part of customers, sometimes (3) lack of real scenarios where it is required (offering workarounds), sometimes (4) useless project legacy, sometimes (5) internal prioritization features and lack of internal implementation capabilities. Am I forgetting something here? I wonder what else could be invented here to justify not doing this in a few months from the moment I reported it and even suggested a possible solution? This is a rhetorical question.

UPDATE: Based on the logic that the unit team promotes: "give me an example of use". You just gave an example based on abstract sockets, which cannot always and everywhere be used and are far from being implemented everywhere right now. In addition, examples with separate virtual machines that do not have a common IPC, and with applications that cannot work with abstract sockets, are absolutely obvious. But for the unit team, for some reason, this rule and logic does not work.

Personally, I am grateful to you for all your attempts to solve the problems I found and for the solutions you provided; from my point of view, you are the most motivated and effective employee, along with @tippexs and @hongzhidao.

Please note that I do not insist on solving this problem! If you do not consider it necessary to solve this problem or do not want to do it now - please, I will accept any decision on this problem. The project is yours, so it's up to you!

echolimazulu avatar Jul 27 '22 14:07 echolimazulu

Good news!

Thanks for sharing this!

Once again, I want to note that I do not insist on merging this PR. It seems to me that I stand on the side of, and act in the interests of, those users who are used to, love and want to use unix sockets in unit, for whom the lack of a correct way to use them can be critical, as can the lack of a proper level of support and implementation on the developer's part at the moment they decide to choose unit as a product for their production solutions.

I also know of people who use these roles (nginx, and others) on separate virtual machines under ESXi, which, as far as I remember, do not have a common IPC.

And isn't this an example of use with the inability to use abstract sockets?

I'd like to know a bit more about the case. I admit that I have no idea about ESXi (I for example had some idea about k8s, which is why I could help you). Maybe after some investigation, we can conclude that there's an easier way. I don't know.

Or is it necessary for a person to buy a subscription to unit and write to support with a feature request in order for this request to be classified as a real client need and pass according to your internal schedule as necessary for implementation?

Certainly not. I don't even know whether you have a subscription to nginx. I care about every user of Unit. And if a feature makes sense to me, balancing the implementation complexity against its benefits, I'll implement it, independently of who requested or suggested it.

Just saying that the complexity of this one is huge, so I want to make sure that we really need it, before taking the time to add it.

Sorry, but the position on your part and that of your colleagues is constantly changing.

I work for NGINX (F5), but my comments here are my own. I don't speak for my colleagues, nor do I share their opinion in some cases. I speak independently, and I hope consistently over time.

As you've noticed in the past, I have a strong tendency to be minimalist in my patches (even if sometimes I'm criticized for trying to apply too-big patches). You can see that even after having merged several important features into the Unit code, my net line count in the project is negative (+785 -878). I'm just being myself here, and I'm very averse to adding such complex code to a project whose source code I still don't know very well (for some meaning of "very well").

Sometimes (1) you call it implementation complexity, sometimes

Certainly, it is complex. You wrote the code, so I guess you have an idea. It will be even more complex for me to review your code, since it's completely new code for me.

(2) lack of real need on the part of customers,

In your case, we've agreed it's probably simpler for you to use abstract sockets, right?

sometimes (3) lack of real scenarios where it is required (offering workarounds),

I know there are scenarios where it is required, but I would like to have some certainty about that necessity from real users (the colleague you mentioned might be one of them).

sometimes (4) useless project legacy,

I don't remember having said that.

sometimes (5) internal prioritization features and lack of internal implementation capabilities.

This certainly is the biggest issue we have right now.

Am I forgetting something here? I wonder what else could be invented here to justify not doing this in a few months from the moment I reported it and even suggested a possible solution? This is a rhetorical question.

We're less than a handful of C programmers in the Unit team. I've been in the team just for a few months, and still don't know very well how Unit is implemented internally, so it's hard for me to review this code; please understand that.

We've also had important changes, so that's why at some points we've been more silent than other times.

Personally, I am grateful to you for all your attempts to solve the problems I found and for the solutions you provided; from my point of view, you are the most motivated and effective employee, along with @tippexs and @hongzhidao.

Thanks! Really!

Please note that I do not insist on solving this problem! If you do not consider it necessary to solve this problem or do not want to do it now - please, I will accept any decision on this problem. The project is yours, so it's up to you!

My priority right now will be related with unix sockets for the next release, so I'll be learning how Unit uses them.

As I said, for the moment I'll defer the review of this PR indefinitely. After some more work with unix sockets, I may reconsider it.

Please tell your colleagues to open an issue if they want to share a description of their use case, and maybe you or they can convince me that it's necessary enough that I can't avoid it.

Cheers,

Alex

alejandro-colomar avatar Jul 27 '22 14:07 alejandro-colomar

I'd like to know a bit more about the case. I admit that I have no idea about ESXi (I for example had some idea about k8s, which is why I could help you). Maybe after some investigation, we can conclude that there's an easier way. I don't know.

I am very grateful to you!

I work for NGINX (F5), but my comments here are my own. I don't speak for my colleagues, nor do I share their opinion in some cases. I speak independently, and I hope consistently over time.

I don't distinguish between F5 Networks and Nginx Inc. when it comes to individual employees, just as I do not separate the unit project from other F5 Networks projects. I chose unit because of my good attitude towards the nginx product and its well-deserved reputation, so I expect the same from unit, taking its young age into account. I previously described my vision to your Sr Director of Product Management, pointing out unit's current competitive advantages and disadvantages, but this seems to have been ignored or gone unnoticed.

However, after so much time, opposition from the team (not all of its members), evidence provided and discussions, I am still here; but in the near future I plan to close this PR, since unfortunately I will no longer be able to work on it due to circumstances not related to this PR.

I don't remember having said that.

This was not said by you, but by @Vbart in other PRs.

echolimazulu avatar Jul 27 '22 15:07 echolimazulu

capabilities(7) might be a nice way of solving the unlink(2)ing of the socket files problem, on Linux at least. I don't know if they ever made it to other Unix systems. It never got past the POSIX draft stage (POSIX.1e). Or maybe they have a similar mechanism.

The router process could gain the CAP_DAC_OVERRIDE capability, do the unlink(2) then relinquish the capability until next time. I did use capabilities(7) in a project many years ago...
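
For what it's worth, a rough sketch of that approach using libcap (link with -lcap); it assumes the capability is already present in the process's permitted set, and the helper name is made up:

/* Sketch of the capabilities(7) idea: temporarily raise
 * CAP_DAC_OVERRIDE in the effective set, unlink the socket,
 * then drop it again.  Requires CAP_DAC_OVERRIDE to already be
 * in the permitted set, and linking against libcap (-lcap). */
#include <err.h>
#include <stdlib.h>
#include <sys/capability.h>
#include <unistd.h>

static int
unlink_with_cap(const char *path)
{
    int          ret;
    cap_t        caps;
    cap_value_t  cap = CAP_DAC_OVERRIDE;

    caps = cap_get_proc();
    if (caps == NULL)
        err(EXIT_FAILURE, "cap_get_proc");

    /* Raise CAP_DAC_OVERRIDE in the effective set. */
    if (cap_set_flag(caps, CAP_EFFECTIVE, 1, &cap, CAP_SET) == -1)
        err(EXIT_FAILURE, "cap_set_flag");
    if (cap_set_proc(caps) == -1)
        err(EXIT_FAILURE, "cap_set_proc");

    ret = unlink(path);

    /* Drop it again until the next time it is needed. */
    if (cap_set_flag(caps, CAP_EFFECTIVE, 1, &cap, CAP_CLEAR) == -1)
        err(EXIT_FAILURE, "cap_set_flag");
    if (cap_set_proc(caps) == -1)
        err(EXIT_FAILURE, "cap_set_proc");

    cap_free(caps);
    return ret;
}

int main(int argc, char *argv[])
{
    if (argc != 2)
        errx(EXIT_FAILURE, "usage: %s /path/to/socket", argv[0]);

    if (unlink_with_cap(argv[1]) == -1)
        err(EXIT_FAILURE, "unlink(%s)", argv[1]);

    exit(EXIT_SUCCESS);
}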

ac000 avatar Jul 28 '22 01:07 ac000

capabilities(7) might be a nice way of solving the unlink(2)ing of the socket files problem, on Linux at least. I don't know if they ever made it to other Unix systems. It never got past the POSIX draft stage (POSIX.1e). Or maybe they have a similar mechanism.

The router process could gain the CAP_DAC_OVERRIDE capability, do the unlink(2) then relinquish the capability until next time. I did use capabilities(7) in a project many years ago...

I'm not sure it's a safe one. See the rationale in #740. CAP_DAC_OVERRIDE is basically root (together with CAP_SYS_ADMIN, one of the most powerful caps). I'd prefer a safer one, I think.

alejandro-colomar avatar Jul 28 '22 15:07 alejandro-colomar

Heh, I think that's a little extreme. It allows a process to override file system permissions yes, but it's not as bad as CAP_SYS_ADMIN (that's the basically root one) and it would only be for the duration of the unlink(2).

But then maybe in the end all these schemes aren't much simpler than just sending a message to the main process to

unlink /path/to/socket

and that would work for everybody.

But then nxt_main_process_cleanup() gets called when an application terminates (perhaps not quite what we want)... and it looks like the listen sockets should be tucked away in task->thread->runtime->listen_sockets, but the array holding those sockets always seems to be empty, even looking in the router process where I'd expect them to be:

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
0x00007f9633111bfe in epoll_wait (epfd=3, events=0x15e5970, maxevents=32, timeout=timeout@entry=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
30        return SYSCALL_CANCEL (epoll_wait, epfd, events, maxevents, timeout);
(gdb) frame 2
#2  0x00000000004134ba in nxt_event_engine_start (engine=0x15dcbd0)
    at src/nxt_event_engine.c:549
549             engine->event.poll(engine, timeout);
(gdb) p *task->thread->runtime->listen_sockets
$1 = {
  elts = 0x15dc918,
  nelts = 0,
  size = 48,
  nalloc = 1,
  mem_pool = 0x15dc060
}
(gdb) p *(nxt_listen_socket_t *)task->thread->runtime->listen_sockets->elts
$2 = {
  socket = 0,
  backlog = 0,
  work_queue = 0x0,
  handler = 0x0,
  sockaddr = 0x0,
  count = 0,
  flags = 0 '\000',
  read_after_accept = 0 '\000',
  ipv6only = 0 '\000',
  socklen = 0 '\000',
  address_length = 0 '\000'
}
(gdb)

ac000 avatar Jul 28 '22 18:07 ac000