noobaa-core
MongoDB Seg Faults
Environment info
- NooBaa Version:
root@ait-kube-1:~/noobaa/2.1.0# ./noobaa-linux-v2.1.0 version
INFO[0000] CLI version: 2.1.0
INFO[0000] noobaa-image: noobaa/noobaa-core:5.3.0
INFO[0000] operator-image: noobaa/noobaa-operator:2.1.0
Platform: Kubernetes 1.14.1 with VMware CSI drivers
Actual behavior
- MongoDB segfaults with "Invalid Access at address 0", which leaves the system unusable.
Expected behavior
- The DB should remain up, or on restart it should attempt to repair itself so that the system stays usable (a sketch of a manual repair attempt follows below).
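A minimal sketch of a manual repair attempt, assuming the DB runs in a pod named noobaa-db-0 in the noobaa namespace and the data directory is /var/lib/mongodb/data (all three names are assumptions for this environment; note that mongod --repair needs exclusive access to the dbpath, so the regular mongod process must be stopped first):

# Pod name, namespace, and dbpath below are illustrative only.
kubectl -n noobaa exec -it noobaa-db-0 -- mongod --repair --dbpath /var/lib/mongodb/data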
Steps to reproduce
- The trigger for the problem is not known at this time.
More information - Screenshots / Logs / Other output
Might be related to #5666
@nimrod-becker: Thanks! Looks like there is no workaround at this time - correct?
Hi @gbadanahatti, thanks.
We were trying to catch this case in order to replace the db container image with a debug version of it.
@jackyalbo can provide the info - I think it was this image: jalbo/mongodbg3.6.3:1 from Docker Hub.
Indeed, this is the version with debug symbols:
jalbo/mongodbg3.6.3:1
@gbadanahatti if you can try to reproduce with this version, it would be a huge help.
Sadly, we failed to reproduce it while working with this version.
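If it helps, a sketch of how the image could be swapped in, assuming the DB runs in a StatefulSet named noobaa-db with a container named db (both names are assumptions; list the real ones with the first command, and note the operator may reconcile the image back, in which case it would need adjusting as well):

# Find the actual StatefulSet and container names first:
kubectl -n noobaa get statefulsets
# Point the DB container at the debug image:
kubectl -n noobaa set image statefulset/noobaa-db db=jalbo/mongodbg3.6.3:1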
@jackyalbo, will try with this image and let you know.
@jackyalbo, I have been running load with this image for the past 18 hours and have not seen it. Is this debug version based on the same version that exists in the build? Looking at the forums, it seems that this issue in MongoDB was fixed in 3.4.6. Will we be able to reproduce this issue with this image?
@gbadanahatti, yes, this is the image we are using (for upstream): centos/mongodb-36-centos7. The image we gave you is based on the same 3.6 version, just with debug symbols; both are later than 3.4.6.
https://hub.docker.com/r/centos/mongodb-36-centos7
FWIW I think every version of mongo probably fixed multiple segv issues...
@jackyalbo, last night the mongo db pod restarted, although there is nothing obvious in the logs that indicates a failure. So I am not sure why it restarted, and as a result the endpoint restarted as well. Does the endpoint's self-signed certificate change after the restart? This interrupts the client traffic. I have attached the logs of the container that restarted: mongo.log.gz
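One way to check whether the certificate actually changes across restarts (a sketch; the endpoint host and port are placeholders for this environment):

# Capture the endpoint certificate's fingerprint before and after a restart, then compare:
openssl s_client -connect <endpoint-host>:443 </dev/null 2>/dev/null | openssl x509 -noout -fingerprint -sha256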
@jackyalbo: Under load, the DB has crashed 50 times over the past 24 hours with the debug image. This results in endpoint restarts as well.
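For reference, the restart counts come straight from the pod status (the namespace is an assumption):

kubectl -n noobaa get pods -o custom-columns=NAME:.metadata.name,RESTARTS:.status.containerStatuses[0].restartCount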
@gbadanahatti Can you send me/attach the whole logs?
@jackyalbo, these are probably not logs that will help because of the number of restarts that have happened. Should we change the restart policy to not restart, so we can catch the logs?
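Before changing the restart policy, note that Kubernetes keeps the log of the previous container instance, so the crash output may still be retrievable (the pod name here is an assumption):

# Logs from the instance that crashed, not the currently running one:
kubectl -n noobaa logs noobaa-db-0 --previous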
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.