passenger icon indicating copy to clipboard operation
passenger copied to clipboard

Crash in Passenger::removeDirTree on logrotate

Open mphilipps opened this issue 5 years ago • 25 comments

Issue report

hi Today we had a crash of PassengerAgent during the daily logrotate for apache2

/var/log/apache2/error.log
{
   rotate 60
   daily
  maxsize 268435456
   missingok
   notifempty
   compress
        sharedscripts
   postrotate
      /bin/systemctl reload apache2 > /dev/null
   endscript
}

Backtrace:

Core was generated by `Passenger core                                                      '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00005582e89fd0fa in Passenger::Agent::Fundamentals::dumpBacktrace (state=..., userData=0x0) at src/agent/Shared/Fundamentals/AbortHandler.cpp:595
595     src/agent/Shared/Fundamentals/AbortHandler.cpp: No such file or directory.
[Current thread is 1 (Thread 0x7f7d17fff700 (LWP 3368))]
(gdb) bt
#0  0x00005582e89fd0fa in Passenger::Agent::Fundamentals::dumpBacktrace (state=..., userData=0x0) at src/agent/Shared/Fundamentals/AbortHandler.cpp:595
#1  0x00005582e89fc09d in Passenger::Agent::Fundamentals::runInSubprocessWithTimeLimit (state=..., callback=0x5582e89fcd8b <Passenger::Agent::Fundamentals::dumpBacktrace(Passenger::Agent::Fundamentals::AbortHandlerWorkingState&, void*)>, userData=0x0, timeLimit=4000)
    at src/agent/Shared/Fundamentals/AbortHandler.cpp:294
#2  0x00005582e89fdb54 in Passenger::Agent::Fundamentals::dumpDiagnostics (state=...) at src/agent/Shared/Fundamentals/AbortHandler.cpp:746
#3  0x00005582e89ff073 in Passenger::Agent::Fundamentals::abortHandler (signo=6, info=0x7f7d157f5fb0, _unused=0x7f7d157f5e80) at src/agent/Shared/Fundamentals/AbortHandler.cpp:1085
#4  <signal handler called>
#5  0x00007f7d20b59fff in raise () from /lib/x86_64-linux-gnu/libc.so.6
#6  0x00007f7d20b5b42a in abort () from /lib/x86_64-linux-gnu/libc.so.6
#7  0x00007f7d214720ad in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#8  0x00007f7d21470066 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#9  0x00007f7d2146f089 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#10 0x00007f7d2146f9dd in __gxx_personality_v0 () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#11 0x00007f7d20ed5f33 in ?? () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#12 0x00007f7d20ed6437 in _Unwind_Resume () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#13 0x00005582e89a690f in Passenger::removeDirTree (path="/tmp/passenger.spawn.XXXXS2B3C0") at src/cxx_supportlib/FileTools/FileManip.cpp:311
#14 0x00005582e8aba791 in Passenger::SpawningKit::HandshakeWorkDir::~HandshakeWorkDir (this=0x7f7cdc001860, __in_chrg=<optimized out>) at src/agent/Core/SpawningKit/Handshake/WorkDir.h:81
#15 0x00005582e8b2f413 in boost::checked_delete<Passenger::SpawningKit::HandshakeWorkDir> (x=0x7f7cdc001860) at src/cxx_supportlib/vendor-modified/boost/core/checked_delete.hpp:34
#16 0x00005582e8b0ed6f in boost::scoped_ptr<Passenger::SpawningKit::HandshakeWorkDir>::~scoped_ptr (this=0x7f7d157f8560, __in_chrg=<optimized out>) at src/cxx_supportlib/vendor-modified/boost/smart_ptr/scoped_ptr.hpp:88
#17 0x00005582e8abae39 in Passenger::SpawningKit::HandshakeSession::~HandshakeSession (this=0x7f7d157f8550, __in_chrg=<optimized out>) at src/agent/Core/SpawningKit/Handshake/Session.h:91
#18 0x00005582e8ad6544 in Passenger::SpawningKit::SmartSpawner::spawn (this=0x7f7d0c007ca0, options=...) at src/agent/Core/SpawningKit/SmartSpawner.h:1299
#19 0x00005582e8c043bb in Passenger::ApplicationPool2::Group::spawnThreadRealMain (this=0x7f7d0c007240, spawner=..., options=..., restartsInitiated=0) at src/agent/Core/ApplicationPool/Group/SpawningAndRestarting.cpp:107
#20 0x00005582e8c03d71 in Passenger::ApplicationPool2::Group::spawnThreadMain (this=0x7f7d0c007240, self=..., spawner=..., options=..., restartsInitiated=0) at src/agent/Core/ApplicationPool/Group/SpawningAndRestarting.cpp:53
#21 0x00005582e8c51986 in boost::_mfi::mf4<void, Passenger::ApplicationPool2::Group, boost::shared_ptr<Passenger::ApplicationPool2::Group>, boost::shared_ptr<Passenger::SpawningKit::Spawner>, Passenger::ApplicationPool2::Options, unsigned int>::operator() (this=0x7f7d0c031fa0, p=0x7f7d0c007240, a1=..., a2=...,
    a3=..., a4=0) at src/cxx_supportlib/vendor-modified/boost/bind/mem_fn_template.hpp:506
#22 0x00005582e8c4de90 in boost::_bi::list5<boost::_bi::value<Passenger::ApplicationPool2::Group*>, boost::_bi::value<boost::shared_ptr<Passenger::ApplicationPool2::Group> >, boost::_bi::value<boost::shared_ptr<Passenger::SpawningKit::Spawner> >, boost::_bi::value<Passenger::ApplicationPool2::Options>, boost::_bi::value<unsigned int> >::operator()<boost::_mfi::mf4<void, Passenger::ApplicationPool2::Group, boost::shared_ptr<Passenger::ApplicationPool2::Group>, boost::shared_ptr<Passenger::SpawningKit::Spawner>, Passenger::ApplicationPool2::Options, unsigned int>, boost::_bi::list0> (this=0x7f7d0c031fb0, f=..., a=...)
    at src/cxx_supportlib/vendor-modified/boost/bind/bind.hpp:531
#23 0x00005582e8c4990d in boost::_bi::bind_t<void, boost::_mfi::mf4<void, Passenger::ApplicationPool2::Group, boost::shared_ptr<Passenger::ApplicationPool2::Group>, boost::shared_ptr<Passenger::SpawningKit::Spawner>, Passenger::ApplicationPool2::Options, unsigned int>, boost::_bi::list5<boost::_bi::value<Passenger::ApplicationPool2::Group*>, boost::_bi::value<boost::shared_ptr<Passenger::ApplicationPool2::Group> >, boost::_bi::value<boost::shared_ptr<Passenger::SpawningKit::Spawner> >, boost::_bi::value<Passenger::ApplicationPool2::Options>, boost::_bi::value<unsigned int> > >::operator() (this=0x7f7d0c031fa0)
    at src/cxx_supportlib/vendor-modified/boost/bind/bind.hpp:1294
#24 0x00005582e8c44693 in boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void, boost::_mfi::mf4<void, Passenger::ApplicationPool2::Group, boost::shared_ptr<Passenger::ApplicationPool2::Group>, boost::shared_ptr<Passenger::SpawningKit::Spawner>, Passenger::ApplicationPool2::Options, unsigned
int>, boost::_bi::list5<boost::_bi::value<Passenger::ApplicationPool2::Group*>, boost::_bi::value<boost::shared_ptr<Passenger::ApplicationPool2::Group> >, boost::_bi::value<boost::shared_ptr<Passenger::SpawningKit::Spawner> >, boost::_bi::value<Passenger::ApplicationPool2::Options>, boost::_bi::value<unsigned int>
> >, void>::invoke (function_obj_ptr=...) at src/cxx_supportlib/vendor-modified/boost/function/function_template.hpp:158
#25 0x00005582e89b11c0 in boost::function0<void>::operator() (this=0x7f7d0c0015a0) at src/cxx_supportlib/vendor-modified/boost/function/function_template.hpp:764
#26 0x00005582e8aa57fb in oxt::dynamic_thread_group::thread_main (this=0x7f7d0c0072b0, data=0x7f7d0c0015a0) at src/cxx_supportlib/oxt/dynamic_thread_group.hpp:131
#27 0x00005582e8b9cbe8 in boost::_mfi::mf1<void, oxt::dynamic_thread_group, oxt::dynamic_thread_group::thread_func_data*>::operator() (this=0x7f7cdc0008e0, p=0x7f7d0c0072b0, a1=0x7f7d0c0015a0) at src/cxx_supportlib/vendor-modified/boost/bind/mem_fn_template.hpp:165
#28 0x00005582e8b8dde3 in boost::_bi::list2<boost::_bi::value<oxt::dynamic_thread_group*>, boost::_bi::value<oxt::dynamic_thread_group::thread_func_data*> >::operator()<boost::_mfi::mf1<void, oxt::dynamic_thread_group, oxt::dynamic_thread_group::thread_func_data*>, boost::_bi::list0> (this=0x7f7cdc0008f0, f=...,
    a=...) at src/cxx_supportlib/vendor-modified/boost/bind/bind.hpp:319
#29 0x00005582e8b7cbdb in boost::_bi::bind_t<void, boost::_mfi::mf1<void, oxt::dynamic_thread_group, oxt::dynamic_thread_group::thread_func_data*>, boost::_bi::list2<boost::_bi::value<oxt::dynamic_thread_group*>, boost::_bi::value<oxt::dynamic_thread_group::thread_func_data*> > >::operator() (this=0x7f7cdc0008e0)
    at src/cxx_supportlib/vendor-modified/boost/bind/bind.hpp:1294
#30 0x00005582e8b64e42 in boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void, boost::_mfi::mf1<void, oxt::dynamic_thread_group, oxt::dynamic_thread_group::thread_func_data*>, boost::_bi::list2<boost::_bi::value<oxt::dynamic_thread_group*>, boost::_bi::value<oxt::dynamic_thread_group::thread_func_data*> > >, void>::invoke (function_obj_ptr=...) at src/cxx_supportlib/vendor-modified/boost/function/function_template.hpp:158
#31 0x00005582e89b11c0 in boost::function0<void>::operator() (this=0x7f7d157f9e40) at src/cxx_supportlib/vendor-modified/boost/function/function_template.hpp:764
#32 0x00005582e8cd2330 in oxt::thread::thread_main(boost::function<void ()>, boost::shared_ptr<oxt::thread_local_context>) (func=..., ctx=...) at src/cxx_supportlib/oxt/implementation.cpp:416
#33 0x00005582e89921c1 in boost::_bi::list2<boost::_bi::value<boost::function<void ()> >, boost::_bi::value<boost::shared_ptr<oxt::thread_local_context> > >::operator()<void (*)(boost::function<void ()>, boost::shared_ptr<oxt::thread_local_context>), boost::_bi::list0>(boost::_bi::type<void>, void (*&)(boost::function<void ()>, boost::shared_ptr<oxt::thread_local_context>), boost::_bi::list0&, int) (a=<synthetic pointer>..., f=<optimized out>, this=<optimized out>) at src/cxx_supportlib/vendor-modified/boost/bind/bind.hpp:319
#34 boost::_bi::bind_t<void, void (*)(boost::function<void ()>, boost::shared_ptr<oxt::thread_local_context>), boost::_bi::list2<boost::_bi::value<boost::function<void ()> >, boost::_bi::value<boost::shared_ptr<oxt::thread_local_context> > > >::operator()() (this=<optimized out>)
    at src/cxx_supportlib/vendor-modified/boost/bind/bind.hpp:1294
#35 boost::detail::thread_data<boost::_bi::bind_t<void, void (*)(boost::function<void ()>, boost::shared_ptr<oxt::thread_local_context>), boost::_bi::list2<boost::_bi::value<boost::function<void ()> >, boost::_bi::value<boost::shared_ptr<oxt::thread_local_context> > > > >::run() (this=<optimized out>)
    at src/cxx_supportlib/vendor-modified/boost/thread/detail/thread.hpp:117
#36 0x00005582e8cc67da in boost::(anonymous namespace)::thread_proxy (param=0x7f7d0c031b70) at src/cxx_supportlib/vendor-modified/boost/libs/thread/src/pthread/thread.cpp:177
#37 0x00007f7d224ba4a4 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#38 0x00007f7d20c0fd0f in clone () from /lib/x86_64-linux-gnu/libc.so.6

Question 2: Passenger version and integration mode:

  • passenger 1:6.0.4-1~stretch1 and libapache2-mod-passenger 1:6.0.4-1~stretch1

Question 3: OS or Linux distro, platform (including version):

  • Debian 9.11

Question 4: Passenger installation method: [x] Phusion APT repo

Question 5: Your app's programming language (including any version managers) and framework (including versions):

  • ruby 2.3 running redmine.

Question 6: Are you using a PaaS and/or containerization? If so which one?

  • no

mphilipps avatar Nov 22 '19 08:11 mphilipps

I think the logrotate might only be relevant in that it causes apache to relaod the config and therefore restarts passenger.

it looks like passenger is trying to clean up its spawning directory, but it's already gone. Is your /tmp dir regularly cleaned out?

CamJN avatar Dec 23 '19 16:12 CamJN

We have no tmpreaper or such running on /tmp, but the debian default apache2.service is using PrivateTmp=true, could that have something to do with it?

mphilipps avatar Jan 06 '20 08:01 mphilipps

Yes, that could be related. Does a directory matching the pattern: /tmp/passenger.spawn.******** get created at all when you start apache? If not, does such a directory get created in apache's private tmp dir?

CamJN avatar Jan 13 '20 15:01 CamJN

No, even after restarting apache /tmp appears to be empty. nsenter -t 23186 -m ls /tmp/ 23186 is the pid of the passenger core process.

However we do set PassengerInstanceRegistryDir to /run/passenger and /run/passenger contains passenger.Bd2nqTe. Is this the correct directory?

mphilipps avatar Jan 13 '20 16:01 mphilipps

No that's a different dir, the dir you're looking for has the .spawn. bit in the middle.

CamJN avatar Jan 13 '20 16:01 CamJN

$ systemctl restart apache2
$ nsenter -t 12338 -m find / -name '*.spawn.*'

and nothing was found.

mphilipps avatar Jan 14 '20 08:01 mphilipps

Can you check if systemd is starting apache with a private tmp dir?

CamJN avatar Feb 04 '20 01:02 CamJN

Is there any doubt?

$ ls -l /proc/{1,8735}/ns/mnt
lrwxrwxrwx 1 root root 0 Feb 10 09:11 /proc/1/ns/mnt -> mnt:[4026531840]
lrwxrwxrwx 1 root root 0 Feb 10 09:11 /proc/8735/ns/mnt -> mnt:[4026532229]

I would take that as yes it is starting apache with a private tmp dir.

mphilipps avatar Feb 10 '20 08:02 mphilipps

I'm also interested in this. We've had apache/passenger running in production with our current setup for years. Just today we had a decent outage on production because I think of this same error.

dmesg shows a segmentation fault at exactly the time we do our logrotate (unfortunately I couldn't find the core file, even though looking in /etc/apache2 where it said it might be found.

Interestingly, it was only the following day when logrotate happened again that the real trouble started and we ended up getting a bunch of these errors:

[ E 2020-03-04 06:25:21.0225 32376/T0 apa/Hooks.cpp:751 ]: Unexpected error in mod_passenger: Cannot connect to the Passenger core at unix:/var/passenger/passenger.0Oh8G1l/agents.s/core
  Backtrace:
     in 'Passenger::FileDescriptor Passenger::Apache2Module::Hooks::connectToCore()' (Hooks.cpp:343)
     in 'int Passenger::Apache2Module::Hooks::handleRequest(request_rec*)' (Hooks.cpp:622)

Anyway, maybe this is unrelated, but, maybe it's of help. We've had two of these segfaults, the first one on Feb 20th.

I've also been unable to find the spawn directories, we also set our own PASSENGER_INSTANCE_REGISTRY_DIR.

patrickdavey avatar Mar 04 '20 20:03 patrickdavey

would a possible fix/workaround (specifically for the logrotate issue) be instead of issuing a reload to issue a restart ?

default logrotate for apache

   postrotate
                if invoke-rc.d apache2 status > /dev/null 2>&1; then \
                    invoke-rc.d apache2 reload > /dev/null 2>&1; \
                fi;

change to

   postrotate
                if invoke-rc.d apache2 status > /dev/null 2>&1; then \
                    invoke-rc.d apache2 restart > /dev/null 2>&1; \
                fi;

patrickdavey avatar Mar 04 '20 21:03 patrickdavey

Afaik even on an apache2 reload the passenger process gets restarted. So it unlikely to help.

mphilipps avatar Mar 05 '20 07:03 mphilipps

Ok can someone test the fix here: https://github.com/phusion/passenger/tree/Fix_privatetmp_crash It seems to work for me, but i've had difficulty reproducing the original crash reliably so it might just be luck.

CamJN avatar Mar 05 '21 18:03 CamJN

Anybody?

CamJN avatar Mar 15 '21 16:03 CamJN

Sorry @CamJN, this happens quite irregularly for us (and only on prod) and I don't want to test the fix in production.

While this is absolutely not a fix for the underlying issue, I changed our /etc/logrotate.d/apache2 to use the copytruncate strategy instead of reloading apache itself (which was what triggered this issue). There is apparently some downsides to this, it's possible that you'll lose a few log entries, but for us it's far more important that we don't have this issue meaning we're serving 500s for a period of time.

You can read more about the copytruncate strategy here

patrickdavey avatar Apr 08 '21 21:04 patrickdavey

[ E 2020-03-04 06:25:21.0225 32376/T0 apa/Hooks.cpp:751 ]: Unexpected error in mod_passenger: Cannot connect to the Passenger core at unix:/var/passenger/passenger.0Oh8G1l/agents.s/core
  Backtrace:
     in 'Passenger::FileDescriptor Passenger::Apache2Module::Hooks::connectToCore()' (Hooks.cpp:343)
     in 'int Passenger::Apache2Module::Hooks::handleRequest(request_rec*)' (Hooks.cpp:622)

Very much the same problem here

claasz avatar Jun 10 '21 08:06 claasz

@claasz are you using 6.0.8 or later? Are you using systemd's PRIVATE_TMP?

CamJN avatar Jun 12 '21 04:06 CamJN

@claasz are you using 6.0.8 or later? Are you using systemd's PRIVATE_TMP?

@CamJN We are using 6.0.7. I have no idea what systemd's PRIVATE_TMP is. How would I check it?

claasz avatar Jun 14 '21 07:06 claasz

@claasz This might help you determine if you are using systemd private tmp: https://access.redhat.com/discussions/3027351

CamJN avatar Jun 14 '21 15:06 CamJN

@claasz This might help you determine if you are using systemd private tmp: https://access.redhat.com/discussions/3027351

@CamJN I see. So yes, we are using systemd private tmp.

claasz avatar Jun 14 '21 15:06 claasz

Ok, well I added a possible fix in 6.0.8, if you try that or 6.0.9 let me know if it works for you.

CamJN avatar Jun 14 '21 15:06 CamJN

@CamJN Thanks. However, we've already applied @patrickdavey 's copytruncate workaround. It's a production system which showed the problem quite infrequently, so it's not possible for us to test the fix.

claasz avatar Jun 14 '21 16:06 claasz

I have the same problem for over a very long time and are not able to solve it. Anyone with a solution for this?

royjoling avatar Aug 27 '22 13:08 royjoling

@patrickdavey Do you have an example of your copytruncate workaround? Or do you already have another fix in the meantime?

royjoling avatar Aug 27 '22 13:08 royjoling

This is the contents of our /etc/logrotate.d/apache2 file

/var/log/apache2/*.log {
        daily
        missingok
        rotate 20
        compress
        delaycompress
        notifempty
        create 640 root adm
        copytruncate
        sharedscripts
}

Yours may be slightly different, but, it's the copytruncate line you need @royjoling. And no, we didn't find a different fix, once the copytruncate worked that was enough (and it was impossible to reproduce anyway, and I'm afraid we don't have the resources to play around with this stuff in production).

patrickdavey avatar Aug 28 '22 07:08 patrickdavey

I finally found a ticket about this issue that has been plaguing me for half a year 😭. This happens exactly 24 hours after the server was booted up, and it didn't always happen.

Been trying a few things, including changing the PassengerInstanceRegistryDir which didn't fix the issue. Also updated to 6.0.14 and it still happens.

I'll be trying the copytruncate fix next.

Update: the fix doesn't work.

joelngwt avatar Sep 20 '22 07:09 joelngwt