
MongoDB Seg Faults

Open gbadanahatti opened this issue 4 years ago • 12 comments

Environment info

  • NooBaa Version:
    root@ait-kube-1:~/noobaa/2.1.0# ./noobaa-linux-v2.1.0 version
    INFO[0000] CLI version: 2.1.0
    INFO[0000] noobaa-image: noobaa/noobaa-core:5.3.0
    INFO[0000] operator-image: noobaa/noobaa-operator:2.1.0

  • Platform: Kubernetes 1.14.1 with VMware CSI drivers

Actual behavior

  1. MongoDB segfaults with "Invalid Access at address 0", which leaves the system unusable.

Expected behavior

  1. The DB should remain up, or on restart it should try to repair itself so that the system stays usable.

Steps to reproduce

  1. The trigger for the problem is not known at this time.

More information - Screenshots / Logs / Other output

[Screenshot: Screen Shot 2020-03-31 at 9 37 39 AM]

gbadanahatti, Mar 31 '20 15:03

Might be related to #5666

nimrod-becker, Mar 31 '20 15:03

@nimrod-becker: Thanks! Looks like there is no workaround at this time - correct?

gbadanahatti, Mar 31 '20 16:03

Hi @gbadanahatti, thanks. We were trying to catch this case so that we could replace the DB container image with a debug version of it. @jackyalbo can provide the info. I think it was this image from Docker Hub: jalbo/mongodbg3.6.3:1

guymguym, Apr 01 '20 00:04

Indeed, this is the version with debug symbols: jalbo/mongodbg3.6.3:1. @gbadanahatti, if you can try to reproduce with this version it will be a huge help. Sadly, we failed to reproduce the crash while working with this version.

jackyalbo, Apr 01 '20 17:04

@jackyalbo, I will try with this image and let you know.

gbadanahatti, Apr 01 '20 18:04

@jackyalbo, I have been running load with this image for the past 18 hours and have not seen the crash. Is this debug version based on the same version that exists in the build? Looking at the forums, it seems that this issue in MongoDB was fixed in 3.4.6. Will we be able to reproduce this issue with this image?

gbadanahatti, Apr 02 '20 13:04

@gbadanahatti, yes. The image we use (for upstream) is centos/mongodb-36-centos7. The image we gave you is based on the same 3.6 version, just with debug symbols. Both are later than 3.4.6.

https://hub.docker.com/r/centos/mongodb-36-centos7

jackyalbo, Apr 02 '20 14:04

FWIW I think every version of mongo probably fixed multiple segv issues...

guymguym, Apr 02 '20 16:04

@jackyalbo, last night the MongoDB pod restarted, although nothing obvious in the logs indicates a failure. So I am not sure why it restarted, and as a result the endpoint also restarted. Does the endpoint's self-signed certificate change after a restart? This interrupts client traffic. I have attached the logs of the container that restarted: mongo.log.gz

gbadanahatti, Apr 03 '20 15:04

@jackyalbo: Under load, the DB has crashed 50 times over the past 24 hours with the debug image. This results in endpoint restarts as well. [Screenshot: Screen Shot 2020-04-07 at 11 49 40 AM]

gbadanahatti, Apr 07 '20 16:04
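
For readers tracking a crash loop like the one described above, the restart count and the last termination details can be read from the pod status rather than from a screenshot. The following is a minimal sketch, assuming the input is the JSON printed by `kubectl get pod <pod> -o json`. The sample document, the container name `db`, and all values in it are made up for illustration and are not taken from this issue. Decoding an exit code above 128 as "killed by signal (code - 128)" is a common convention, not a guarantee.

```python
import json
import signal

# Hypothetical sample shaped like `kubectl get pod <pod> -o json` output.
# The container name and all values are made up for illustration.
SAMPLE_POD_JSON = """
{
  "status": {
    "containerStatuses": [
      {
        "name": "db",
        "restartCount": 3,
        "lastState": {
          "terminated": {
            "exitCode": 139,
            "reason": "Error",
            "finishedAt": "2020-04-07T16:00:00Z"
          }
        }
      }
    ]
  }
}
"""


def signal_from_exit_code(exit_code):
    """Return a signal name if the exit code follows the 128 + N convention."""
    if not isinstance(exit_code, int) or exit_code <= 128:
        return None
    try:
        return signal.Signals(exit_code - 128).name
    except ValueError:
        return None  # no known signal with that number on this platform


def summarize_restarts(pod_json):
    """List restart count and last termination details for each container."""
    pod = json.loads(pod_json)
    summaries = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        # lastState is empty for a container that has never restarted.
        terminated = status.get("lastState", {}).get("terminated") or {}
        exit_code = terminated.get("exitCode")
        summaries.append({
            "container": status.get("name"),
            "restarts": status.get("restartCount", 0),
            "exit_code": exit_code,
            "signal": signal_from_exit_code(exit_code),
            "reason": terminated.get("reason"),
            "finished_at": terminated.get("finishedAt"),
        })
    return summaries


if __name__ == "__main__":
    for row in summarize_restarts(SAMPLE_POD_JSON):
        print(row)
```

An exit code of 139 corresponds to 128 + 11, and signal 11 is SIGSEGV on Linux, which would be consistent with a segfault in the database process.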

@gbadanahatti, can you send me or attach the full logs?

jackyalbo, Apr 07 '20 16:04

@jackyalbo, these logs probably will not help because of the number of restarts that have happened. Should we change the restart policy so the pod does not restart, and then capture the logs?

mongo.tar.gz

gbadanahatti, Apr 07 '20 17:04
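
As an alternative to changing the restart policy, `kubectl logs` accepts a `--previous` flag that prints the log of the previous instance of a container, if one still exists. Only the instance immediately before the current one is available, so the log has to be saved soon after a crash. The sketch below wraps that command; the function names and the output path are illustrative, and the thread does not say whether this was tried.

```python
import subprocess


def previous_logs_command(pod, container=None, namespace=None):
    """Build the kubectl argv that prints logs of the previous container instance."""
    argv = ["kubectl", "logs", pod, "--previous", "--timestamps"]
    if container:
        argv += ["-c", container]
    if namespace:
        argv += ["-n", namespace]
    return argv


def save_previous_logs(pod, path, container=None, namespace=None):
    """Run kubectl and write the previous instance's log to `path`."""
    argv = previous_logs_command(pod, container, namespace)
    with open(path, "wb") as out:
        # check=True raises CalledProcessError if kubectl exits non-zero,
        # for example when the container has no previous instance.
        subprocess.run(argv, stdout=out, check=True)
```

A call might look like `save_previous_logs("noobaa-db-0", "mongo-previous.log", container="db", namespace="noobaa")`, where the pod, container, and namespace names are assumptions to be replaced with the values from your own cluster.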

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot], Apr 25 '23 07:04