noobaa-core
MongoDB Seg Faults
Environment info
- NooBaa Version:
root@ait-kube-1:~/noobaa/2.1.0# ./noobaa-linux-v2.1.0 version
INFO[0000] CLI version: 2.1.0
INFO[0000] noobaa-image: noobaa/noobaa-core:5.3.0
INFO[0000] operator-image: noobaa/noobaa-operator:2.1.0
Platform: Kubernetes 1.14.1 with VMware CSI drivers
Actual behavior
- MongoDB segfaults with "Invalid Access at address 0", which leaves the system unusable.
Expected behavior
- The DB should remain up, or on restart it should attempt to repair itself so that the system stays usable (a sketch of a manual repair attempt follows below).
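A minimal sketch of a manual repair attempt, assuming the DB runs in a pod named noobaa-db-0 in the noobaa namespace and the data directory is /var/lib/mongodb/data (all three names are assumptions for this environment; note that mongod --repair needs exclusive access to the dbpath, so the regular mongod process must be stopped first):

# Pod name, namespace, and dbpath below are illustrative only.
kubectl -n noobaa exec -it noobaa-db-0 -- mongod --repair --dbpath /var/lib/mongodb/data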
Steps to reproduce
- The trigger for the problem is not known at this time.
More information - Screenshots / Logs / Other output
Might be related to #5666
@nimrod-becker: Thanks! Looks like there is no workaround at this time - correct?
Hi @gbadanahatti, thanks.
We were trying to catch this case in order to replace the db container image with a debug version of it.
@jackyalbo can provide the info - I think it was this image: jalbo/mongodbg3.6.3:1 from Docker Hub.
Indeed, this is the version with debug symbols:
jalbo/mongodbg3.6.3:1
@gbadanahatti if you can try to reproduce with this version, it would be a huge help.
Sadly, we failed to reproduce it while working with this version.
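If it helps, a sketch of how the image could be swapped in, assuming the DB runs in a StatefulSet named noobaa-db with a container named db (both names are assumptions; list the real ones with the first command, and note the operator may reconcile the image back, in which case it would need adjusting as well):

# Find the actual StatefulSet and container names first:
kubectl -n noobaa get statefulsets
# Point the DB container at the debug image:
kubectl -n noobaa set image statefulset/noobaa-db db=jalbo/mongodbg3.6.3:1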
@jackyalbo, will try with this image and let you know.
@jackyalbo, I have been running load with this image for the past 18 hours and have not seen it. Is this debug version based on the same version that exists in the build? Looking at the forums, it seems that this issue in MongoDB was fixed in 3.4.6. Will we be able to reproduce this issue with this image?
@gbadanahatti, yes, this is the image we are using (for upstream): centos/mongodb-36-centos7. The image we gave you is based on the same 3.6 version, just with debug symbols; both are later than 3.4.6.
https://hub.docker.com/r/centos/mongodb-36-centos7
FWIW I think every version of mongo probably fixed multiple segv issues...
@jackyalbo, last night the mongo db pod restarted, although there is nothing obvious in the logs that indicates a failure. So I am not sure why it restarted, and as a result the endpoint restarted as well. Does the endpoint's self-signed certificate change after the restart? This interrupts the client traffic. I have attached the logs of the container that restarted: mongo.log.gz
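One way to check whether the certificate actually changes across restarts (a sketch; the endpoint host and port are placeholders for this environment):

# Capture the endpoint certificate's fingerprint before and after a restart, then compare:
openssl s_client -connect <endpoint-host>:443 </dev/null 2>/dev/null | openssl x509 -noout -fingerprint -sha256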
@jackyalbo: Under load, the DB has crashed 50 times over the past 24 hours with the debug image. This results in endpoint restarts as well.
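For reference, the restart counts come straight from the pod status (the namespace is an assumption):

kubectl -n noobaa get pods -o custom-columns=NAME:.metadata.name,RESTARTS:.status.containerStatuses[0].restartCount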
@gbadanahatti Can you send me/attach the whole logs?
@jackyalbo, these are probably not logs that will help because of the number of restarts that have happened. Should we change the restart policy to not restart, so we can catch the logs?
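Before changing the restart policy, note that Kubernetes keeps the log of the previous container instance, so the crash output may still be retrievable (the pod name here is an assumption):

# Logs from the instance that crashed, not the currently running one:
kubectl -n noobaa logs noobaa-db-0 --previous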
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.