mtprotoproxy

User management causing serious delays

Open amintnt opened this issue 4 years ago • 13 comments

Hello.

I've added around 70 users to the config file, and have applied different constraints for each. I keep adding new users and update the config file, and then issue a signal to the mtproto process to reload the file.

Here's an example entry in the config file (as you can see, the secrets are actually used as usernames too):

USERS.update({"ed274a32d2bf42ccccc58ad6e5fb3dcb":"ed274a32d2bf42ccccc58ad6e5fb3dcb"})
USER_EXPIRATIONS.update({"ed274a32d2bf42ccccc58ad6e5fb3dcb":"19/8/2020"})
USER_DATA_QUOTA.update({"ed274a32d2bf42ccccc58ad6e5fb3dcb":50000000})
USER_MAX_TCP_CONNS.update({"ed274a32d2bf42ccccc58ad6e5fb3dcb":15})

The problem is that this is causing serious delays for users and their connections. Every connection requires multiple checks, which is evidently computationally expensive, given the delays my users are experiencing. To confirm this was in fact the problem, I temporarily disabled all constraints (by commenting them out in the config file), and performance went back to normal.

In addition, the CPU has a single core, and more than half of it is idle at any given time.

I thought maybe I could run multiple mtproto processes to remedy the issue, but it was in vain. Another idea is to not use the secrets as usernames and instead use shorter strings (or integers). And since the source code is already multithreaded, I don't think threads would help.

What are your suggestions to prevent the checks from causing delays?

amintnt avatar Aug 17 '20 15:08 amintnt

Hello, yes, if there are many users, delays are possible. This happens because the Telegram client doesn't send the user id, only the secret. The proxy has to enumerate all users' secrets, using cryptography to check each one against the connection, which is computationally expensive.

To prevent this, you can run several proxy servers on different ports, with part of the users on each server.
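Roughly, the per-connection check looks like this (a simplified sketch only, not the proxy's actual code; the key derivation and the validity check are illustrative placeholders):

import hashlib

def key_opens_handshake(key: bytes, handshake: bytes) -> bool:
    # Placeholder for the real check: the proxy derives AES keys and tries to
    # decrypt/validate the client's handshake with them.
    return False

def find_user(handshake: bytes, users: dict):
    # The client sends no user id, so every configured secret has to be tried.
    # With N users this is N hash computations (plus trial decryptions) for
    # every incoming connection, which is where the slowdown comes from.
    for name, secret in users.items():
        key = hashlib.sha256(handshake + bytes.fromhex(secret)).digest()
        if key_opens_handshake(key, handshake):
            return name
    return None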

alexbers avatar Aug 17 '20 16:08 alexbers

Hello, yes, if there are many users, delays are possible. This happens because the Telegram client doesn't send the user id, only the secret. The proxy has to enumerate all users' secrets, using cryptography to check each one against the connection, which is computationally expensive.

To prevent this, you can run several proxy servers on different ports, with part of the users on each server.

Thank you for your prompt reply. I'll give it a try.

amintnt avatar Aug 17 '20 16:08 amintnt

Hello, yes, if there are many users, delays are possible. This happens because the Telegram client doesn't send the user id, only the secret. The proxy has to enumerate all users' secrets, using cryptography to check each one against the connection, which is computationally expensive.

To prevent this, you can run several proxy servers on different ports, with part of the users on each server.

I actually managed to fix the problem by running multiple server processes on different ports with the same config file and balancing the load between them using nginx.

Although everything looks fine and the problem is gone, I took a closer look at the source code, and here's what I found:

The code actually uses port reuse, so load can be automatically balanced between multiple instances of the server listening on the same port. In principle, it's no different from my aforementioned setup using nginx and multiple servers on different ports. I remember I tried running multiple instances of the server as a solution, but with no good results.

Is there something I'm missing? Maybe running multiple instances did work but I didn't notice? Or is my understanding of port reuse flawed?
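For reference, the port reuse I'm talking about is the SO_REUSEPORT socket option: several independent processes each bind the same port, and the kernel spreads new connections between them. A minimal sketch of the idea (illustrative only, not mtprotoproxy's actual startup code; the port number is just an example):

import asyncio

async def handle(reader, writer):
    # Placeholder connection handler, just closes the connection.
    writer.close()

async def main():
    # With reuse_port=True (Linux), several processes can bind port 4411 at the
    # same time and the kernel load-balances incoming connections between them.
    server = await asyncio.start_server(handle, "0.0.0.0", 4411, reuse_port=True)
    async with server:
        await server.serve_forever()

asyncio.run(main())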

amirnh2 avatar Mar 08 '21 05:03 amirnh2

I'd say running multiple instances on the same port should not only just work, but should even be more efficient, because it removes the nginx overhead and the OS doesn't have to maintain twice as many TCP sockets.


seriyps avatar Mar 08 '21 12:03 seriyps

Hello, yes, if there are many users, delays are possible. This happens because the Telegram client doesn't send the user id, only the secret. The proxy has to enumerate all users' secrets, using cryptography to check each one against the connection, which is computationally expensive. To prevent this, you can run several proxy servers on different ports, with part of the users on each server.

I actually managed to fix the problem by running multiple server processes on different ports with the same config file and balancing the load between them using nginx.

Although everything looks fine and the problem is gone, I took a closer look at the source code, and here's what I found:

The code actually uses port reuse, so load can be automatically balanced between multiple instances of the server listening on the same port. In principle, it's no different from my aforementioned setup using nginx and multiple servers on different ports. I remember I tried running multiple instances of the server as a solution, but with no good results.

Is there something I'm missing? Maybe running multiple instances did work but I didn't notice? Or is my understanding of port reuse flawed?

I took another approach to the problem. It'd be great if you guys could tell me what you think of it and whether it works. Actually, it does work, but I haven't had the chance to test it under load.

I'll quickly state the problem: The code uses asyncio, which runs on a single thread, and any blocking code can delay its event loop. In our case, the part that causes the delay (correct me if I'm wrong) is where the prekey+secret needs to be hashed and matched against the entire list of available secret keys.

Solution: I offloaded the entire hashing-and-matching step to worker processes using loop.run_in_executor. The result is awaited, and once the correct user is found, it is returned to the main asyncio loop so it can continue. So, depending on the number of worker processes, several matching operations can run at the same time instead of the main event loop doing all of them.
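To make this concrete, here's a rough sketch of the pattern (simplified, not my actual modified code; match_secret and its arguments only stand in for the proxy's real key-derivation and matching logic):

import asyncio
import hashlib
from concurrent.futures import ProcessPoolExecutor

# Fixed pool of worker processes, created once at startup and reused.
pool = ProcessPoolExecutor(max_workers=2)

def match_secret(prekey: bytes, digest_to_match: bytes, users: dict):
    # CPU-bound part: hash prekey+secret for every user until one matches.
    for name, secret in users.items():
        if hashlib.sha256(prekey + bytes.fromhex(secret)).digest() == digest_to_match:
            return name
    return None  # no configured secret matched

async def identify_user(prekey: bytes, digest_to_match: bytes, users: dict):
    loop = asyncio.get_running_loop()
    # Offload the heavy loop to a worker process; the event loop only awaits the
    # result, so other connections keep being served in the meantime.
    return await loop.run_in_executor(pool, match_secret, prekey, digest_to_match, users)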

The implementation is very simple and everything seems to be working well in practice, but I'd also appreciate it if you could put in your two cents.

If it works, maybe @alexbers could add it to the code (I also can fork the project and add it myself if needed).

amintnt avatar Apr 12 '21 11:04 amintnt

In my latest encounters with @alexbers he was very busy, so may I suggest that you add it yourself? I could help you test it on my server. @amintnt

erfantkerfan avatar Apr 12 '21 11:04 erfantkerfan

@amintnt it should work, as long as the number of workers is fixed (so they are pre-forked, stored in some thread/process pool, and reused). With unbounded threads you risk making your OS busy managing hundreds of threads instead of doing real work. Also, I'm not sure how things are now, but ~5 years ago when I was actively working with Python, threads were not always efficient because of the GIL (global interpreter lock): no matter how many threads you start, only one can execute Python code at a time. But it might not be that big of a problem, since the hashing code calls C functions and those should not be affected by the GIL (not sure about that, but in theory...). Another alternative is to spawn subprocesses, not threads.

seriyps avatar Apr 12 '21 14:04 seriyps

@amintnt it should work, as long as the number of workers is fixed (so they are pre-forked, stored in some thread/process pool, and reused). With unbounded threads you risk making your OS busy managing hundreds of threads instead of doing real work. Also, I'm not sure how things are now, but ~5 years ago when I was actively working with Python, threads were not always efficient because of the GIL (global interpreter lock): no matter how many threads you start, only one can execute Python code at a time. But it might not be that big of a problem, since the hashing code calls C functions and those should not be affected by the GIL (not sure about that, but in theory...). Another alternative is to spawn subprocesses, not threads.

Thank you. As I mentioned, worker processes are involved, not threads. Threads would probably have made no difference because of the GIL, as you also pointed out. And yes, a fixed executor pool is created initially and passed to the loop.

Thank you.

In my latest encounters with @alexbers he was very busy, so may I suggest that you add it yourself? I could help you test it on my server. @amintnt

I'll test the code for a few days and then provide the modified source code if successful. But in principle, it's no different from running multiple instances of the original source code, except for the proxy statistics, which are roughly divided by the number of running instances.

amintnt avatar Apr 12 '21 15:04 amintnt

@amintnt any update on that one?

fluential avatar Jan 02 '22 09:01 fluential

@amintnt any update on that one?

Sorry I couldn't keep my promise. The process pool actually worked very well, but I decided not to put it here since I've made some other modifications to the code to suit my other needs, and I've lost track of what exactly they were. I will try to fork the code, redo the modifications, and push them for anyone interested, but I can't make any promises on the exact timing. If you're familiar with Python, you should know it's not a very complicated process and you could do it yourself.

I should also emphasize that, as mentioned by the author, you can simply run the original code multiple times and have the Linux kernel do the load balancing, without any modifications to the code (if you're running public proxies, I highly recommend it). The original code actually uses port reuse, allowing multiple processes to listen on the same port, with the kernel responsible for balancing the load between them. The caveat is that the statistics provided by the code will break, for example the number of concurrent users and the tcp_limit_hit feature (which frankly is not a very useful feature).

amintnt avatar Jan 02 '22 09:01 amintnt

Hello, yes, if there are many users, delays are possible. This happens because the Telegram client doesn't send the user id, only the secret. The proxy has to enumerate all users' secrets, using cryptography to check each one against the connection, which is computationally expensive. To prevent this, you can run several proxy servers on different ports, with part of the users on each server.

I actually managed to fix the problem by running multiple server processes on different ports with the same config file and balancing the load between them using nginx.

Although everything looks fine and the problem is gone, I took a closer look at the source code, and here's what I found:

The code actually uses port reuse, so load can be automatically balanced between multiple instances of the server listening on the same port. In principle, it's no different from my aforementioned setup using nginx and multiple servers on different ports. I remember I tried running multiple instances of the server as a solution, but with no good results.

Is there something I'm missing? Maybe running multiple instances did work but I didn't notice? Or is my understanding of port reuse flawed?

could you share your exact nginx config file?

erfantkerfan avatar Jan 02 '22 13:01 erfantkerfan

@amintnt thanks for sharing, good to know that it worked

fluential avatar Jan 02 '22 15:01 fluential

Hello, yes, if there are many users, delays are possible. This happens because the Telegram client doesn't send the user id, only the secret. The proxy has to enumerate all users' secrets, using cryptography to check each one against the connection, which is computationally expensive. To prevent this, you can run several proxy servers on different ports, with part of the users on each server.

I actually managed to fix the problem by running multiple server processes on different ports with the same config file and balancing the load between them using nginx. Although everything looks fine and the problem is gone, I took a closer look at the source code, and here's what I found: the code actually uses port reuse, so load can be automatically balanced between multiple instances of the server listening on the same port. In principle, it's no different from my aforementioned setup using nginx and multiple servers on different ports. I remember I tried running multiple instances of the server as a solution, but with no good results. Is there something I'm missing? Maybe running multiple instances did work but I didn't notice? Or is my understanding of port reuse flawed?

could you share your exact nginx config file?

Here's the block I used in my nginx config (/etc/nginx/nginx.conf):

stream {
    server {
        listen            4410;
        proxy_pass        to_proxy_servers;
        error_log         /dev/null;
    }

    upstream to_proxy_servers {
        least_conn;
        server 127.0.0.1:4411;
        server 127.0.0.1:4412;
        server 127.0.0.1:4413;
        server 127.0.0.1:4414;
    }
}

Here nginx acts as a TCP load balancer listening on port 4410, forwarding traffic to ports 4411-4414 using the least_conn algorithm.
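Each backend port corresponds to one proxy instance. One way to arrange that (an illustrative sketch, not necessarily my exact setup) is a separate config file per instance, identical except for the PORT value; how each instance is pointed at its own config depends on how you launch it:

# config_4411.py -- likewise config_4412.py ... config_4414.py,
# each differing only in PORT; the users and limits stay the same in all of them
PORT = 4411

USERS = {}
USERS.update({"ed274a32d2bf42ccccc58ad6e5fb3dcb": "ed274a32d2bf42ccccc58ad6e5fb3dcb"})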

amintnt avatar Jan 02 '22 17:01 amintnt