Exception caused shutdown
Gorse version
Using the latest gorse-in-one Docker image (0.4.14).
Describe the bug
I don't know how it occurred. I sent a request for neighbouring items and the load balancer returned a 503 Service Unavailable. When I checked the container, it had shut down, and the last logs contained the following:
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| timestamp | message |
|---------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 1695724922516 | {"level":"info","ts":1695724922.5168717,"msg":"load config","config":"/etc/gorse/config.toml"} |
| 1695724922518 | {"level":"info","ts":1695724922.5180194,"msg":"load cache","path":"/var/lib/gorse/master_cache.data"} |
| 1695724922518 | {"level":"info","ts":1695724922.5180697,"msg":"no local cache found, create a new one","path":"/var/lib/gorse/master_cache.data"} |
| 1695724922615 | {"level":"info","ts":1695724922.6155522,"msg":"start model fit","period":3600} |
| 1695724922615 | {"level":"info","ts":1695724922.6157806,"msg":"start model searcher","period":21600} |
| 1695724922616 | {"level":"info","ts":1695724922.6160376,"msg":"load dataset","positive_feedback_types":["place_order","add_to_cart","save_listing"],"read_feedback_types":["open_product_page","see_product_card"],"item_ttl":0,"feedback_ttl":0} |
| 1695724922616 | {"level":"info","ts":1695724922.6163213,"msg":"start rpc server","host":"0.0.0.0","port":8086} |
| 1695724922621 | {"level":"info","ts":1695724922.6212022,"msg":"start http server","url":"http://0.0.0.0:8088","cors_methods":[],"cors_doamins":[]} |
| 1695724922666 | {"level":"info","ts":1695724922.6667936,"msg":"prepare to fit click model","n_jobs":1} |
| 1695724922666 | {"level":"warn","ts":1695724922.666884,"msg":"empty ranking dataset","positive_feedback_type":["place_order","add_to_cart","save_listing"]} |
| 1695724922666 | {"level":"info","ts":1695724922.666905,"msg":"start searching neighbors of users","n_cache":100} |
| 1695724922670 | panic: runtime error: index out of range [-1] |
| 1695724922670 | goroutine 89 [running]: |
| 1695724922670 | github.com/zhenghaoz/gorse/base/search.(*IVF).Build.func1(0x50?, 0x0) |
| 1695724922670 | /go/gorse/base/search/ivf.go:222 +0x38d |
| 1695724922670 | github.com/zhenghaoz/gorse/base/parallel.Parallel(0x3, 0xc00038e360?, 0xc00002c050) |
| 1695724922670 | /go/gorse/base/parallel/parallel.go:39 +0xf7 |
| 1695724922670 | github.com/zhenghaoz/gorse/base/search.(*IVF).Build(0xc0004108a0) |
| 1695724922670 | /go/gorse/base/search/ivf.go:209 +0x525 |
| 1695724922670 | github.com/zhenghaoz/gorse/base/search.(*IVFBuilder).Build(0xc00002c000, 0x3f4ccccd, 0x3, 0x64?, 0xc000160070) |
| 1695724922670 | /go/gorse/base/search/ivf.go:294 +0x175 |
| 1695724922670 | github.com/zhenghaoz/gorse/master.(*Master).findUserNeighborsIVF(0xc000245c00, 0xc0009a5d40, {0x3095650, 0x0, 0x0}, {0xc000100a80, 0x18, 0x18}, 0xc000100900, 0xc0000a2e40) |
| 1695724922670 | /go/gorse/master/tasks.go:780 +0x478 |
| 1695724922670 | github.com/zhenghaoz/gorse/master.(*FindUserNeighborsTask).run(0xc000010f00, 0x1814fb0?) |
| 1695724922670 | /go/gorse/master/tasks.go:651 +0x606 |
| 1695724922670 | github.com/zhenghaoz/gorse/master.(*Master).RunPrivilegedTasksLoop.func2({0x1e141a0, 0xc000010f00}) |
| 1695724922670 | /go/gorse/master/master.go:330 +0x18c |
| 1695724922670 | created by github.com/zhenghaoz/gorse/master.(*Master).RunPrivilegedTasksLoop |
| 1695724922670 | /go/gorse/master/master.go:326 +0x74e |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
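As you can see, the failure is a Go runtime panic, index out of range [-1], raised in (*IVF).Build (ivf.go:222) right after the master started searching for user neighbors.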
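For readers unfamiliar with this class of panic, here is a minimal sketch of how an index of -1 can appear in a nearest-cluster assignment and how a guard avoids it. This is not the code from ivf.go and not necessarily the actual cause here: nearestCentroid, assign, and distance are hypothetical names, and the assumption is that the search starts from a "not found" index of -1 and only updates it when a distance compares as smaller, which never happens if every distance is NaN (for example with zero or empty vectors).

```go
package main

import (
	"fmt"
	"math"
)

// distance is a hypothetical cosine distance. It assumes len(a) == len(b).
// For a zero or empty vector the division is 0/0, so the result is NaN.
func distance(a, b []float32) float32 {
	var dot, na, nb float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		na += float64(a[i]) * float64(a[i])
		nb += float64(b[i]) * float64(b[i])
	}
	return float32(1 - dot/math.Sqrt(na*nb))
}

// nearestCentroid returns the index of the closest centroid, or -1 when no
// centroid produces a comparable distance (no centroids, or all NaN/+Inf).
func nearestCentroid(v []float32, centroids [][]float32) int {
	best, bestDist := -1, float32(math.Inf(1))
	for i, c := range centroids {
		// Any comparison with NaN is false, so best stays -1 in that case.
		if d := distance(v, c); d < bestDist {
			best, bestDist = i, d
		}
	}
	return best
}

// assign groups vector indices by nearest centroid. Vectors that cannot be
// placed are reported in skipped instead of being used as clusters[-1],
// which is the access that would panic.
func assign(vectors, centroids [][]float32) (clusters [][]int, skipped []int) {
	clusters = make([][]int, len(centroids))
	for i, v := range vectors {
		c := nearestCentroid(v, centroids)
		if c < 0 {
			skipped = append(skipped, i)
			continue
		}
		clusters[c] = append(clusters[c], i)
	}
	return clusters, skipped
}

func main() {
	centroids := [][]float32{{1, 0}, {0, 1}}
	vectors := [][]float32{{0.9, 0.1}, {0, 0}, {0.2, 0.8}}
	clusters, skipped := assign(vectors, centroids)
	fmt.Println(clusters, skipped) // [[0] [2]] [1]
}
```

Without the c < 0 check, the all-zero vector at index 1 would trigger the same kind of runtime error shown in the log above.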
Additional context
It was working fine for about a week.
Using Postgres as the data store and Redis as the cache store.
I made a PR to fix this a few weeks ago. Since no version has been cut with the fix yet, try using this Docker image hash:
zhenghaoz/gorse-master@sha256:026d1bd4ad3f861bb45be0d5f141bfe90ce244dfbefaf85a343dc0276a8100b0
That's the hash for gorse-master; if you're using gorse-in-one, you'll have to find the corresponding one.
cc @zhenghaoz Would it be a good idea to cut 0.4.15 with this fix?