
Manticore server crashes with signal 11

Open elbek opened this issue 8 months ago • 7 comments

Bug Description:

Hi Manticore folks, we run Manticore in a k8s pod with about 2.5M records, on version 6.3.8. The server has been crashing, and k8s restarting it, roughly once a week. The setup is not a cluster: it is a single pod with generous resources that usually uses only about 5-10% CPU and less than 10% of RAM.

cat /etc/manticoresearch/manticore.conf
searchd {
    listen = 127.0.0.1:9312
    listen = 127.0.0.1:9306:mysql
    listen = 127.0.0.1:9308:http
    log = /var/log/manticore/searchd.log
    query_log = /var/log/manticore/query.log
    pid_file = /var/run/manticore/searchd.pid
    data_dir = /var/lib/manticore
}

rt: table qmdocs: ramchunk saved ok (mode=periodic, last TID=6288347, current TID=6311108, ram=50.427 Mb, time delta=36000 sec, took=0.036 sec)
[BUDDY] Fatal error: Uncaught Manticoresearch\Buddy\Core\Error\ManticoreSearchClientError: Error while async request: 104: Connection reset by peer in /usr/share/manticore/modules/manticore-buddy/vendor/manticoresoftware/buddy-core/src/ManticoreSearch/Client.php:175
[BUDDY] Stack trace:
[BUDDY] #0 /usr/share/manticore/modules/manticore-buddy/vendor/manticoresoftware/buddy-core/src/ManticoreSearch/Client.php(142): Manticoresearch\Buddy\Core\ManticoreSearch\Client->runAsyncRequest('sql?mode=raw', 'query=SHOW+VARI...', Array)
[BUDDY] #1 /usr/share/manticore/modules/manticore-buddy/src/Lib/Metric.php(551): Manticoresearch\Buddy\Core\ManticoreSearch\Client->sendRequest('query=SHOW+VARI...')
[BUDDY] #2 /usr/share/manticore/modules/manticore-buddy/src/Lib/Metric.php(488): Manticoresearch\Buddy\Base\Lib\Metric->sendManticoreRequest('SHOW VARIABLES')
[BUDDY] #3 /usr/share/manticore/modules/manticore-buddy/src/Lib/Metric.php(250): Manticoresearch\Buddy\Base\Lib\Metric->getVariableLabels()
[BUDDY] #4 /usr/share/manticore/modules/manticore-buddy/src/Lib/Metric.php(203): Manticoresearch\Buddy\Base\Lib\Metric->snapshot()
[BUDDY] #5 /usr/share/manticore/modules/manticore-buddy/src/Lib/MetricThread.php(103): Manticoresearch\Buddy\Base\Lib\Metric->checkAndSnapshot(300)
[BUDDY] #6 [internal function]: Manticoresearch\Buddy\Base\Lib\MetricThread::Manticoresearch\Buddy\Base\Lib\{closure}(Object(Swoole\Process))
[BUDDY] #7 {main}
[BUDDY]   thrown in /usr/share/manticore/modules/manticore-buddy/vendor/manticoresoftware/buddy-core/src/ManticoreSearch/Client.php on line 175
[BUDDY] [2025-04-07 03:18:07 $142.0]	WARNING	Server::check_worker_exit_status(): worker(pid=228, id=4) abnormal exit, status=255, signal=0
rt: table qmdocs: ramchunk saved ok (mode=periodic, last TID=6311108, current TID=6326095, ram=53.151 Mb, time delta=36000 sec, took=0.036 sec)
[BUDDY] Fatal error: Uncaught Manticoresearch\Buddy\Core\Error\ManticoreSearchClientError: Error while async request: 104: Connection reset by peer in /usr/share/manticore/modules/manticore-buddy/vendor/manticoresoftware/buddy-core/src/ManticoreSearch/Client.php:175
[BUDDY] Stack trace:
[BUDDY] #0 /usr/share/manticore/modules/manticore-buddy/vendor/manticoresoftware/buddy-core/src/ManticoreSearch/Client.php(142): Manticoresearch\Buddy\Core\ManticoreSearch\Client->runAsyncRequest('sql?mode=raw', 'query=SHOW+VARI...', Array)
[BUDDY] #1 /usr/share/manticore/modules/manticore-buddy/src/Lib/Metric.php(551): Manticoresearch\Buddy\Core\ManticoreSearch\Client->sendRequest('query=SHOW+VARI...')
[BUDDY] #2 /usr/share/manticore/modules/manticore-buddy/src/Lib/Metric.php(488): Manticoresearch\Buddy\Base\Lib\Metric->sendManticoreRequest('SHOW VARIABLES')
[BUDDY] #3 /usr/share/manticore/modules/manticore-buddy/src/Lib/Metric.php(250): Manticoresearch\Buddy\Base\Lib\Metric->getVariableLabels()
[BUDDY] #4 /usr/share/manticore/modules/manticore-buddy/src/Lib/Metric.php(203): Manticoresearch\Buddy\Base\Lib\Metric->snapshot()
[BUDDY] #5 /usr/share/manticore/modules/manticore-buddy/src/Lib/MetricThread.php(103): Manticoresearch\Buddy\Base\Lib\Metric->checkAndSnapshot(300)
[BUDDY] #6 [internal function]: Manticoresearch\Buddy\Base\Lib\MetricThread::Manticoresearch\Buddy\Base\Lib\{closure}(Object(Swoole\Process))
[BUDDY] #7 {main}
[BUDDY]   thrown in /usr/share/manticore/modules/manticore-buddy/vendor/manticoresoftware/buddy-core/src/ManticoreSearch/Client.php on line 175
[BUDDY] [2025-04-07 12:23:17 $142.0]	WARNING	Server::check_worker_exit_status(): worker(pid=235, id=4) abnormal exit, status=255, signal=0
Crash!!! Handling signal 11

I see in the log:

Connection reset by peer in /usr/share/manticore/modules/manticore-buddy/vendor/manticoresoftware/buddy-core/src/ManticoreSearch/Client.php:175

We also get this error when using the search engine through an HTTP client:

Post "http://qm-search-service.default.svc.cluster.local:9308/delete": read tcp 192.168.108.11:58500->10.100.87.79:9308: read: connection reset by peer

We get it once in a while with 6.3. When I upgraded to 7.xxx because of the signal 11 crash in 6.3 shown above, these connection resets were everywhere, so I was forced to downgrade. The crash, though, is proving really hard for us to diagnose and fix.

Attached the server log: manticore-log.txt

Manticore Search Version:

6.3.8

Operating System Version:

Linux amz2

Have you tried the latest development version?

No

Internal Checklist:

To be completed by the assignee. Check off tasks that have been completed or are not applicable.

  • [ ] Implementation completed
  • [ ] Tests developed
  • [ ] Documentation updated
  • [ ] Documentation reviewed
  • [ ] Changelog updated

elbek avatar Apr 07 '25 13:04 elbek

It is hard to figure out the root cause of the crash from this line alone:

Crash!!! Handling signal 11

It would be better to reproduce the issue outside Kubernetes, or to let the daemon save its crash log before the container is restarted.

tomatolog avatar Apr 07 '25 14:04 tomatolog

This is the crash I got, taken from the stdout log.

elbek avatar Apr 07 '25 15:04 elbek

Could you start the daemon with the --coredump CLI option, as described in the manual https://manual.manticoresearch.com/dev/Reporting_bugs#How-to-enable-saving-coredumps-on-crash?

tomatolog avatar Apr 07 '25 15:04 tomatolog

I am not sure I can do that: this runs in a k8s environment and is started as a pod. Maybe I can pass a CLI option when I start Manticore from its Docker image? What would the command look like?

elbek avatar Apr 08 '25 17:04 elbek

You could set an environment variable in your pod, as the manual shows: https://manual.manticoresearch.com/dev/Reporting_bugs#What-do-I-do-when-Manticore-Search-hangs?

[root@srv lib]# systemctl set-environment _ADDITIONAL_SEARCHD_PARAMS='--coredump'
[root@srv lib]# systemctl restart manticore
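
A pod has no systemd unit to restart, so one possible equivalent is to pass the flag through the container arguments. This is only a sketch: it assumes the image's default command is searchd --nodetach and that the image entrypoint passes its arguments through unchanged, both of which should be checked against the image actually in use. The pod and container names are placeholders.

```yaml
# Hypothetical pod spec fragment; metadata.name and the container name are placeholders.
# Assumption: the image's default command is `searchd --nodetach`, so `args`
# restates it and appends the --coredump flag from the manual.
apiVersion: v1
kind: Pod
metadata:
  name: manticore
spec:
  containers:
    - name: manticore
      image: manticoresearch/manticore:6.3.8
      # In Kubernetes, `args` replaces the image's default command arguments
      # while leaving its entrypoint in place.
      args: ["searchd", "--nodetach", "--coredump"]
```

Where the core file ends up depends on the node's kernel core_pattern setting and on the container's core size limit, so it may need to be written to a persistent volume to survive the restart.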

tomatolog avatar Apr 08 '25 18:04 tomatolog

The dev team also said that a container should be left behind after the crash; you could get into it and provide searchd.log from the /var/log/manticore path.

tomatolog avatar Apr 10 '25 09:04 tomatolog