Compose hangs at [uWSGI] getting INI configuration from /opt/mailman-web/uwsgi.ini
I'm attempting to bring up containerized mailman/mailman-web following the instructions at https://asynchronous.in/docker-mailman/.
Everything seems to progress more or less normally, then it all stops/hangs after this message:
[uWSGI] getting INI configuration from /opt/mailman-web/uwsgi.ini
No matter how long I let it sit, it never gets past that.
The next step after that message is the container waiting for the database to be up and running. Is your database container up and running fine?
Yes, they are. I can see that the database and tables have been initialized and loaded. Here are some snippets from the container log.
```
Nov 02 08:25:05 RHEL01.(mydomain) (mycontainer)[479382]: MySQL is up - continuing
Nov 02 08:25:05 RHEL01.(mydomain) (mycontainer)[479382]: Creating log file for mailman web
Nov 02 08:25:05 RHEL01.(mydomain) (mycontainer)[479382]: Creating log file for uwsgi..
Nov 02 08:25:05 RHEL01.(mydomain) (mycontainer)[479382]: settings_local.py not found, it is highly recommended that you provide one
Nov 02 08:25:05 RHEL01.(mydomain) (mycontainer)[479382]: Using default configuration to run.
Nov 02 08:25:06 RHEL01.(mydomain) (mycontainer)[479382]: Compiling locale files in /usr/lib/python3.8/site-packages
Nov 02 08:25:07 RHEL01.(mydomain) (mycontainer)[479382]: processing file django.po in /usr/lib/python3.8/site-packages/django/contrib/flatpages/locale/udm/LC_MESSAGES
Nov 02 08:25:07 RHEL01.(mydomain) (mycontainer)[479382]: processing file django.po in /usr/lib/python3.8/site-packages/django/contrib/flatpages/locale/pt_BR/LC_MESSAGES
Nov 02 08:25:07 RHEL01.(mydomain) (mycontainer)[479382]: processing file django.po in /usr/lib/python3.8/site-packages/django/contrib/flatpages/locale/cy/LC_MESSAGES
Nov 02 08:25:07 RHEL01.(mydomain) (mycontainer)[479382]: processing file django.po in /usr/lib/python3.8/site-packages/django/contrib/flatpages/locale/zh_Hans/LC_MESSAGES
. . . .
Nov 02 08:25:12 RHEL01.(mydomain) (mycontainer)[479382]: /opt/mailman-web
Nov 02 08:25:14 RHEL01.(mydomain) (mycontainer)[479382]: Compressing... done
Nov 02 08:25:14 RHEL01.(mydomain) (mycontainer)[479382]: Compressed 2 block(s) from 126 template(s) for 0 context(s).
Nov 02 08:25:54 RHEL01.(mydomain) (mycontainer)[479382]: Operations to perform:
Nov 02 08:25:54 RHEL01.(mydomain) (mycontainer)[479382]: Apply all migrations: account, admin, auth, contenttypes, django_mailman3, django_q, hyperkitty, openid, postorius, sessions, sites, socialaccount
Nov 02 08:25:54 RHEL01.(mydomain) (mycontainer)[479382]: Running migrations:
Nov 02 08:25:54 RHEL01.(mydomain) (mycontainer)[479382]: Applying contenttypes.0001_initial... OK
Nov 02 08:25:54 RHEL01.(mydomain) (mycontainer)[479382]: Applying auth.0001_initial... OK
Nov 02 08:25:54 RHEL01.(mydomain) (mycontainer)[479382]: Applying account.0001_initial... OK
Nov 02 08:25:54 RHEL01.(mydomain) (mycontainer)[479382]: Applying account.0002_email_max_length... OK
Nov 02 08:25:54 RHEL01.(mydomain) (mycontainer)[479382]: Applying admin.0001_initial... OK
Nov 02 08:25:54 RHEL01.(mydomain) (mycontainer)[479382]: Applying admin.0002_logentry_remove_auto_add... OK
Nov 02 08:25:54 RHEL01.(mydomain) (mycontainer)[479382]: Applying admin.0003_logentry_add_action_flag_choices... OK
Nov 02 08:25:54 RHEL01.(mydomain) (mycontainer)[479382]: Applying contenttypes.0002_remove_content_type_name... OK
Nov 02 08:25:54 RHEL01.(mydomain) (mycontainer)[479382]: Applying auth.0002_alter_permission_name_max_length... OK
Nov 02 08:25:54 RHEL01.(mydomain) (mycontainer)[479382]: Applying auth.0003_alter_user_email_max_length... OK
Nov 02 08:25:54 RHEL01.(mydomain) (mycontainer)[479382]: Applying auth.0004_alter_user_username_opts... OK
Nov 02 08:25:54 RHEL01.(mydomain) (mycontainer)[479382]: Applying auth.0005_alter_user_last_login_null... OK
Nov 02 08:25:54 RHEL01.(mydomain) (mycontainer)[479382]: Applying auth.0006_require_contenttypes_0002... OK
Nov 02 08:25:54 RHEL01.(mydomain) (mycontainer)[479382]: Applying auth.0007_alter_validators_add_error_messages... OK
. . .
Nov 02 08:25:54 RHEL01.(mydomain) (mycontainer)[479382]: Applying postorius.0011_auto_20191109_1219... OK
Nov 02 08:25:54 RHEL01.(mydomain) (mycontainer)[479382]: Applying postorius.0012_auto_20200420_2136... OK
Nov 02 08:25:54 RHEL01.(mydomain) (mycontainer)[479382]: Applying postorius.0013_auto_20201116_0058... OK
Nov 02 08:25:54 RHEL01.(mydomain) (mycontainer)[479382]: Applying sessions.0001_initial... OK
Nov 02 08:25:54 RHEL01.(mydomain) (mycontainer)[479382]: Applying sites.0002_alter_domain_unique... OK
Nov 02 08:25:55 RHEL01.(mydomain) (mycontainer)[479382]: Creating admin user admin ...
Nov 02 08:25:56 RHEL01.(mydomain) (mycontainer)[479382]: Superuser created successfully.
Nov 02 08:25:56 RHEL01.(mydomain) (mycontainer)[479382]: Setting mmweb.(mydomain) as the default domain ...
Nov 02 08:25:57 RHEL01.(mydomain) (mycontainer)[479382]: [uWSGI] getting INI configuration from /opt/mailman-web/uwsgi.ini
```
...and that's where everything hangs. There are no more messages after that.
Perusing the logs after I persisted them, I found a message in uwsgi-error.log:
MySQLdb._exceptions.OperationalError: (2013, 'Lost connection to MySQL server during query')
I don't understand how that could be, because the other components of mailman-web that ran ahead of this were clearly able to access and update the database.
No ideas, I guess?
I don't have any great ideas about why something like this would happen. Can you try it out with some other versions of MySQL to see if this is an isolated issue?
I was running 0.4 on Fedora 40. I upgraded to Fedora 41 and, without my realizing it, the CPU was being pegged at nearly 100% for a while by the uwsgi process. I went searching for a reason, but I could not find one.
If I do a docker compose up, I get this as the last line of output:
mailman-web | [uWSGI] getting INI configuration from /opt/mailman-web/uwsgi.ini
Mailman does appear to be working, but I wonder if the process is getting stuck at this point and thrashing trying to do something while the other processes do what they should. I'm using MariaDB, but I did not see any errors in the logs about database access timeouts or failures. And it's definitely able to access the database.
I tried moving to 0.5.2 hoping that might resolve it, but I get the same result.
I'm not sure if this is related. @BearGFR, were you also seeing high CPU usage from uwsgi?
EDIT: I should note that my database runs outside of Docker, natively on the host.
EDIT 2: I ran strace to see what system calls the stuck process was making. It spewed lines like this one faster than the display could render them:
close(269438681) = -1 EBADF (Bad file descriptor)
I am not sure what it's doing, but definitely one of those uwsgi processes is doing something naughty.
I attached to the offending process and could see the call stack. Best I can tell, it might be this loop that is closing sockets: https://github.com/unbit/uwsgi/blob/master/core/utils.c#L2542. If that uwsgi.max_fd value is set to some crazy large number, that might explain what I see. (I've not investigated further yet.)
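To make the failure mode concrete, here is a minimal sketch of the pattern such a loop follows (my own illustration, not the actual uwsgi code): walk every possible descriptor up to the process's maximum and close it. With a small limit that is cheap; with a limit in the billions it spins on close() calls that mostly fail, which matches the EBADF flood in the strace output above.

```c
/* Illustration only -- not the uwsgi source. Shows why a huge max_fd
 * turns descriptor cleanup into a CPU-burning loop of failed close()
 * calls (most descriptors in the range were never open, so close()
 * returns -1 with errno == EBADF). */
#include <unistd.h>

static void close_all_fds(long max_fd)
{
    for (long fd = 3; fd < max_fd; fd++) {  /* skip stdin/stdout/stderr */
        close((int)fd);                     /* usually -1 / EBADF; ignored */
    }
}
```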
EDIT 3: Indeed, that line of code is the issue. I'm not sure where "max_fd" is set, but I built the code from the current source repository and printed the value out using the logging function in the code. The value produced was 1073741816. So, it's stuck in that loop trying to close a ton of file descriptors. Next... to find where that is being set.
EDIT 4: It appears to be set in uwsgi_start(). There is also a log line corresponding to it:
uwsgi-error.log:detected max file descriptor number: 1073741816
Specifically, this line of code: https://github.com/unbit/uwsgi/blob/master/core/uwsgi.c#L2776
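That value matches the process's soft open-file limit. A small standalone check of my own (a sketch, not uwsgi code, assuming the value is taken from the soft RLIMIT_NOFILE as the linked line suggests) looks roughly like this; run on the affected host it should print the same 1073741816:

```c
/* Standalone sketch: read the soft and hard RLIMIT_NOFILE, which is
 * where a number like 1073741816 comes from on hosts where the
 * container runtime leaves the limit effectively unbounded. */
#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) == 0) {
        printf("soft nofile limit: %llu\n", (unsigned long long)rl.rlim_cur);
        printf("hard nofile limit: %llu\n", (unsigned long long)rl.rlim_max);
    }
    return 0;
}
```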
I added this to the docker-compose file for the web service:
```yaml
ulimits:
  nofile:
    soft: 1024
    hard: 2048
```
This seems to have tamed it. I know it's still trying to close maybe 1024 or 2048 sockets, but it doesn't seem to do it constantly. I did not investigate further, but this at least keeps the CPU from pegging at 100%.
Hey Paul, I can't say if it was soaking the CPU or not. I finally just gave up trying to run the container and instead installed everything "the old way" directly on the system. I've got that working.
ACK. Sounds like the issue I uncovered and yours may not be the same. It still might have been, as high CPU usage can cause weird things to happen. But I felt it was useful to document what I found, since I figured others were quite likely to encounter the same thing. In my case, there's nothing wrong with the Docker config kindly provided by @maxking. Rather, it has something to do with the combination of Docker on Fedora 41 and how that uwsgi process handles sockets. I did open an issue at the uwsgi repository, as I think they ought not attempt to close 1,073,741,816 sockets -- even if the OS tells them they can :)
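For context, one way a program can avoid that brute-force loop on modern Linux (a sketch of the general idea only; as far as I know it is not what uwsgi currently does) is the close_range() syscall, which closes an entire descriptor range in one call regardless of the rlimit:

```c
/* Sketch: close every inherited descriptor above stderr in a single
 * syscall, independent of RLIMIT_NOFILE. Requires Linux >= 5.9 and
 * glibc >= 2.34; older systems would need a fallback such as iterating
 * the entries in /proc/self/fd. */
#define _GNU_SOURCE
#include <unistd.h>

static void close_inherited_fds(void)
{
    close_range(3, ~0U, 0);   /* first fd, last fd, flags */
}
```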
Hi @paulej! I just faced the exact same problem after upgrading from Debian 12 to Debian 13: the mailman-web container was constantly at 100% CPU (the culprit seemed to be one of the four uwsgi processes).
Your workaround (which, to be honest, I don't understand) did the trick. Thanks a lot for sharing!
This effectively sets the open-file limit (RLIMIT_NOFILE) on processes in the container. Without it, the limit is effectively unlimited on some newer systems, which is where the huge max_fd value comes from.
It's really not a Mailman or @maxking issue to deal with, but I commented about it here since it would nonetheless affect others using this Docker container.