lichess-bot icon indicating copy to clipboard operation
lichess-bot copied to clipboard

bot appears offline and doesn't accept challenges

Open vondele opened this issue 3 years ago • 17 comments

After having been online and accepting changes for a while, my bot (https://lichess.org/@/simpleEval) doesn't accept challenges and appears offline (empty dot, instead of green dot next to the name). (The webpage mentions the bot was active 1h ago, about 3h after the last game). This has been happening always. The verbose log doesn't indicate anything that strikes me as special:

2020-09-25 18:18:07,163: <PopenProcess at 0xb5aeab50 (pid=19875)> >> info depth 18 seldepth 19 multipv 1 score cp 599 nodes 233216 nps 656946 tbhits 0 time 355 pv b7b5 a2a3 d8b6 g1f2 b6b7 f1g1 e7g8 g1h1 h8h1 g2h1 b7c8 d2f1 e8f7 f1g3 g4f3 h1f3 a5a4 f3h5 f7e7
2020-09-25 18:18:07,443: <PopenProcess at 0xb5aeab50 (pid=19875)> >> info depth 19 seldepth 23 multipv 1 score cp 599 nodes 419584 nps 660762 tbhits 0 time 635 pv b7b5 a2a3 d8b6 g1f2 b6b7 f1g1 e7g8 g1h1 h8h1 g2h1 b7c8 d2f1 e8f7 f1g3 a5a4 d1d3 f7g7 d4e5 d6e5 h1g2 a4b3
2020-09-25 18:18:07,889: <PopenProcess at 0xb5aeab50 (pid=19875)> >> info depth 20 seldepth 22 multipv 1 score cp 599 nodes 717037 nps 663923 hashfull 20 tbhits 0 time 1080 pv b7b5 a2a3 d8b6 g1f2 b6b7 f1h1 h8h1 d1h1 e7g6 h1h6 b7g7 h6h1 c6c5 d4d5 a5a4 b3b4 g4f3 d2f3 g7f7 b4c5 d6c5
2020-09-25 18:18:08,866: Ignoring illegal move h5g4 on board rn1qk2r/1p2n3/2pp1p2/p3p3/4P1b1/1P1PPN2/PBPN2B1/3R1RK1 w kq - 0 18
2020-09-25 18:18:08,871: --- https://lichess.org/pU512AqJ/black Game over
2020-09-25 18:18:08,872: <PopenProcess at 0xb5aeab50 (pid=19875)> << stop
2020-09-25 18:18:08,873: <PopenProcess at 0xb5aeab50 (pid=19875)> >> info depth 21 seldepth 22 multipv 1 score cp 599 nodes 1370551 nps 664026 hashfull 41 tbhits 0 time 2064 pv b7b5 a2a3 d8b6 g1f2 b6b7 f1h1 h8h1 d1h1 e7g6 h1h6 b7g7 h6h1 c6c5 d4d5 a5a4 b3b4 g4f3 d2f3 g7f7 b4c5 d6c5
2020-09-25 18:18:08,875: <PopenProcess at 0xb5aeab50 (pid=19875)> >> bestmove b7b5 ponder a2a3
2020-09-25 18:18:08,876: <PopenProcess at 0xb5aeab50 (pid=19875)> << quit
2020-09-25 18:18:08,915: +++ Process Free. Total Queued: 0. Total Used: 0

systemd seems to suggest all is up and running

$ sudo systemctl status lichessbot
● lichessbot.service - lichess-bot
   Loaded: loaded (/etc/systemd/system/lichessbot.service; enabled; vendor preset: enabled)
   Active: active (running) since Fri 2020-09-25 15:59:08 CEST; 6h ago
 Main PID: 18564 (bash)
    Tasks: 17 (limit: 4915)
   CGroup: /system.slice/lichessbot.service
           ├─18564 /bin/bash /home/pi/chess/lichess-bot/start_bot.sh
           ├─18567 python3 lichess-bot.py -v -l out
           ├─18572 python3 lichess-bot.py -v -l out
           ├─18580 python3 lichess-bot.py -v -l out
           ├─18583 python3 lichess-bot.py -v -l out
           ├─18584 python3 lichess-bot.py -v -l out
           ├─18586 python3 lichess-bot.py -v -l out
           └─18588 python3 lichess-bot.py -v -l out

Sep 25 18:18:01 raspberrypi bash[18564]:     nodes: 336566
Sep 25 18:18:01 raspberrypi bash[18564]:     score: {1: Score(cp=301, mate=None)}
Sep 25 18:18:04 raspberrypi bash[18564]:     depth: 19
Sep 25 18:18:04 raspberrypi bash[18564]:     nps: 665028
Sep 25 18:18:04 raspberrypi bash[18564]:     nodes: 505422
Sep 25 18:18:04 raspberrypi bash[18564]:     score: {1: Score(cp=600, mate=None)}
Sep 25 18:18:06 raspberrypi bash[18564]:     depth: 21
Sep 25 18:18:06 raspberrypi bash[18564]:     nps: 637030
Sep 25 18:18:06 raspberrypi bash[18564]:     nodes: 1481096
Sep 25 18:18:06 raspberrypi bash[18564]:     score: {1: Score(cp=600, mate=None)}

and I can ping lichess.org.

Any ideas what I might have done wrong. The code is at the current master with minimal changes to config.yml.default:

---
> token: "xxxxxxxxxxxxxxxx"    # lichess OAuth2 Token
6c6
<   name: "stockfish.simpleEval"        # binary name of the engine to use
---
>   name: "engine_name"        # binary name of the engine to use
28c28
<     Threads: 1               # max CPU threads the engine can use
---
>     Threads: 2               # max CPU threads the engine can use
44c44
< abort_time: 60               # time to abort a game in seconds when there is no activity
---
> abort_time: 20               # time to abort a game in seconds when there is no activity
46c46
< move_overhead: 200           # increase if your bot flags games too often
---
> move_overhead: 2000          # increase if your bot flags games too often
49c49
<   concurrency: 3             # number of games to play simultaneously
---
>   concurrency: 1             # number of games to play simultaneously
51c51
<   accept_bot: true           # accepts challenges coming from other bots
---
>   accept_bot: false          # accepts challenges coming from other bots
66d65
<     - ultraBullet
70c69
<     - classical
---
> #   - classical

vondele avatar Sep 25 '20 20:09 vondele

last things in the logs now:

2020-09-26 14:14:47,591: <PopenProcess at 0xb49512f0 (pid=26287)> >> bestmove d4f3
2020-09-26 14:14:47,619: https://lichess.org:443 "POST /api/bot/game/nL8SXN2r/move/d4f3 HTTP/1.1" 200 11
2020-09-26 14:14:47,627: --- https://lichess.org/nL8SXN2r/black Game over
2020-09-26 14:14:47,629: <PopenProcess at 0xb49512f0 (pid=26287)> << quit
2020-09-26 14:14:47,670: +++ Process Free. Total Queued: 0. Total Used: 0
2020-09-26 14:56:45,348:     Accept Blitz rated challenge from BOT oof231(2534)
2020-09-26 14:56:45,352: Resetting dropped connection: lichess.org
2020-09-26 14:56:50,495: https://lichess.org:443 "POST /api/challenge/Y0OJT6Sk/accept HTTP/1.1" 404 None
2020-09-26 14:56:50,497: Giving up api_post(...) after 1 tries (requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://lichess.org/api/challenge/Y0OJT6Sk/accept)
2020-09-26 14:56:50,497:     Skip missing Blitz rated challenge from BOT oof231(2534)

and the bot appears offline.

vondele avatar Sep 26 '20 17:09 vondele

I have similar issues, sometimes the bot just stops responding to anything - without any error in the console. A restart helps, but it would be good to add some internal check.

Tearth avatar Dec 03 '20 08:12 Tearth

yes, restart helps. I have a cron job in place that restarts the bot if there has been no ouput for 30min or so.

vondele avatar Dec 03 '20 11:12 vondele

Good idea, I will probably do something similar.

Tearth avatar Dec 03 '20 11:12 Tearth

This problem occurs mostly when lichess is restarting itself, and apparently, the client loses connection with API without any warning + hangs on. Were there any tries to resolve this issue in the latest commits?

Tearth avatar Mar 18 '21 13:03 Tearth

Perhaps some features added lately might help resolve this (if anyone is still watching this old issue).

Before the matchmaking feature was added, which lets bots issue challenges instead of waiting for them, lichess-bot passively listened for messages from lichess.org. Maybe, once lichess resets itself, it simply stops sending messages to old connections. Now, with matchmaking sending messages to lichess, this should keep the connection active, or cause an error in communication that would cause lichess-bot to establish a new connection.

Can the reporters here test the latest version of lichess-bot with matchmaking enabled?

MarkZH avatar Aug 01 '22 11:08 MarkZH

Hi, thanks for the update. When I had this issue over a year ago, I ended up with writing my own little program which was handling both matchmaking and resetting lichess-bot (every few hours). I have a plan to update my oldish lichess-bot instance somewhere in the future, but not sure when it will actually happen - I will let you know eventually.

Tearth avatar Aug 01 '22 17:08 Tearth

I worked around this issue by checking online status from the api, and if it's false and doesn't throw http exception restarts the service.

theoden8 avatar Aug 01 '22 20:08 theoden8

With the merge of PR #525, this should be fixed. We would appreciate it if the latest version could be tested by those who have had this problem. Feel free to reopen this issue if problems persist.

MarkZH avatar Aug 07 '22 03:08 MarkZH

...
02:29:54 lichessbot[218390]: [08/13/22 02:29:54] INFO     +++ Process Freed. Total Queued: lichess-bot.py:158
02:29:54 lichessbot[218390]:                              0. Total Used: 0
04:02:14 lichessbot[218390]: [08/13/22 04:02:14] INFO     Will reset connection with       lichess-bot.py:276
04:02:14 lichessbot[218390]:                              lichess
05:02:15 lichessbot[218390]: [08/13/22 05:02:15] INFO     Will reset connection with       lichess-bot.py:276
05:02:15 lichessbot[218390]:                              lichess
06:02:15 lichessbot[218390]: [08/13/22 06:02:15] INFO     Will reset connection with       lichess-bot.py:276
06:02:15 lichessbot[218390]:                              lichess
07:02:15 lichessbot[218390]: [08/13/22 07:02:15] INFO     Will reset connection with       lichess-bot.py:276
07:02:15 lichessbot[218390]:                              lichess
08:02:16 lichessbot[218390]: [08/13/22 08:02:16] INFO     Will reset connection with       lichess-bot.py:276
08:02:16 lichessbot[218390]:                              lichess
09:02:16 lichessbot[218390]: [08/13/22 09:02:16] INFO     Will reset connection with       lichess-bot.py:276
09:02:16 lichessbot[218390]:                              lichess
10:02:17 lichessbot[218390]: [08/13/22 10:02:17] INFO     Will reset connection with       lichess-bot.py:276
10:02:17 lichessbot[218390]:                              lichess
11:02:17 lichessbot[218390]: [08/13/22 11:02:17] INFO     Will reset connection with       lichess-bot.py:276
11:02:17 lichessbot[218390]:                              lichess
12:02:18 lichessbot[218390]: [08/13/22 12:02:18] INFO     Will reset connection with       lichess-bot.py:276
12:02:18 lichessbot[218390]:                              lichess
13:02:18 lichessbot[218390]: [08/13/22 13:02:18] INFO     Will reset connection with       lichess-bot.py:276
13:02:18 lichessbot[218390]:                              lichess
14:02:19 lichessbot[218390]: [08/13/22 14:02:19] INFO     Will reset connection with       lichess-bot.py:276
14:02:19 lichessbot[218390]:                              lichess
15:02:19 lichessbot[218390]: [08/13/22 15:02:19] INFO     Will reset connection with       lichess-bot.py:276
15:02:19 lichessbot[218390]:                              lichess
16:02:20 lichessbot[218390]: [08/13/22 16:02:20] INFO     Will reset connection with       lichess-bot.py:276
16:02:20 lichessbot[218390]:                              lichess

@MarkZH don't see issue reopening button, but seems that li.reset_connection is bust. DNS and pinging appear to be working fine, local time is in sync with ubuntu servers.

theoden8 avatar Aug 13 '22 13:08 theoden8

Can you run lichess-bot with verbose mode (python lichess-bot.py -v) to get more info? While making #525, I made a version that restarts lichess-bot (https://github.com/AttackingOrDefending/lichess-bot/tree/restart). You can try it out, although I don't know if it fixes the problem.

AttackingOrDefending avatar Aug 13 '22 18:08 AttackingOrDefending

@theoden8 When lichess was restarting about 25 minutes ago, I opened my bot to test if it works. The check for if the bot was online /api/users/status?ids=attackingordefending caused a Resetting dropped connection: lichess.org which reconnected the bot to lichess and was able to receive challenges and play games. MarkZH was correct in saying that matchmaking would cause the bot to reconnect, but #525 made this run for every bot even if matchmaking is disabled. I don't know why this doesn't work for your bot but hopefully verbose mode will help. Can you run it with verbose mode?

AttackingOrDefending avatar Aug 15 '22 06:08 AttackingOrDefending

I am, but it takes a few days usually for the bot to appear offline. My matchmaking is turned off.

theoden8 avatar Aug 15 '22 12:08 theoden8

Verbose output:

...
lichessbot: [08/20/22 18:59:15] DEBUG    Event: {'type':                  lichess-bot.py:176
lichessbot:                              'correspondence_ping'}
lichessbot: [08/20/22 19:09:15] DEBUG    Event: {'type':                  lichess-bot.py:176
lichessbot:                              'correspondence_ping'}
lichessbot: [08/20/22 19:19:15] DEBUG    Event: {'type':                  lichess-bot.py:176
lichessbot:                              'correspondence_ping'}
lichessbot: [08/20/22 19:29:15] DEBUG    Event: {'type':                  lichess-bot.py:176
lichessbot:                              'correspondence_ping'}
lichessbot: [08/20/22 19:39:15] DEBUG    Event: {'type':                  lichess-bot.py:176
lichessbot:                              'correspondence_ping'}
lichessbot: [08/20/22 19:49:15] DEBUG    Event: {'type':                  lichess-bot.py:176
lichessbot:                              'correspondence_ping'}
lichessbot:                     DEBUG    Starting new HTTPS           connectionpool.py:1001
lichessbot:                              connection (1):
lichessbot:                              lichess.org:443
lichessbot:                     DEBUG    https://lichess.org:443 "GET  connectionpool.py:456
lichessbot:                              /api/users/status?ids=theoden
lichessbot:                              8_uci HTTP/1.1" 200 None
lichessbot: [08/20/22 19:49:16] INFO     Will reset connection with       lichess-bot.py:276
lichessbot:                              lichess
lichessbot: [08/20/22 19:59:15] DEBUG    Event: {'type':                  lichess-bot.py:176
lichessbot:                              'correspondence_ping'}
lichessbot: [08/20/22 20:09:15] DEBUG    Event: {'type':                  lichess-bot.py:176
lichessbot:                              'correspondence_ping'}
lichessbot: [08/20/22 20:19:15] DEBUG    Event: {'type':                  lichess-bot.py:176
lichessbot:                              'correspondence_ping'}
lichessbot: [08/20/22 20:29:15] DEBUG    Event: {'type':                  lichess-bot.py:176
lichessbot:                              'correspondence_ping'}
lichessbot: [08/20/22 20:39:16] DEBUG    Event: {'type':                  lichess-bot.py:176
lichessbot:                              'correspondence_ping'}
lichessbot: [08/20/22 20:49:16] DEBUG    Event: {'type':                  lichess-bot.py:176
lichessbot:                              'correspondence_ping'}
lichessbot:                     DEBUG    Starting new HTTPS           connectionpool.py:1001
lichessbot:                              connection (1):
lichessbot:                              lichess.org:443
lichessbot:                     DEBUG    https://lichess.org:443 "GET  connectionpool.py:456
lichessbot:                              /api/users/status?ids=theoden
lichessbot:                              8_uci HTTP/1.1" 200 None
lichessbot:                     INFO     Will reset connection with       lichess-bot.py:276
lichessbot:                              lichess
lichessbot: [08/20/22 20:59:16] DEBUG    Event: {'type':                  lichess-bot.py:176
lichessbot:                              'correspondence_ping'}
...

It's on repeat, with very little variation.

theoden8 avatar Aug 21 '22 13:08 theoden8

I'm experiencing something similar (not sure if exactly the same), but the bot stops accepting challenges.

From about line 4300 onwards challenges are no longer accepted.

Logs: https://gist.github.com/ianagbip1oti/fade2db9c5e296f1748c49641716192e

ianagbip1oti avatar Aug 21 '22 20:08 ianagbip1oti

@theoden8 Can you test the version that restarts lichess-bot with verbose mode?

AttackingOrDefending avatar Sep 02 '22 08:09 AttackingOrDefending

@AttackingOrDefending switched to your branch for now Note that lichess-bot disconnected exactly once in the past two weeks so will take some time.

theoden8 avatar Sep 04 '22 23:09 theoden8

@theoden8 Any updates on whether it is working or not?

AttackingOrDefending avatar Oct 15 '22 08:10 AttackingOrDefending

It hasn't failed yet, I do wonder if there's a way to tell if it's an improvement or not.

theoden8 avatar Oct 16 '22 12:10 theoden8

If you see Will restart lichess-bot followed by

    .   _/|
    .  // o\
    .  || ._)  lichess-bot 1.2.0
    .  //__\
    .  )___(   Play on Lichess with a bot

in the logs, then your bot restarted successfully. Otherwise, your bot just didn't disconnect.

AttackingOrDefending avatar Oct 16 '22 13:10 AttackingOrDefending

I was able to find the following in the logs:

...
Oct 01 04:26:24 lichessbot[1236937]:                     DEBUG    Resetting dropped connection: connectionpool.py:273
Oct 01 04:26:24 lichessbot[1236937]:                              lichess.org
Oct 01 04:26:24 lichessbot[1236937]:                     DEBUG    https://lichess.org:443 "GET  connectionpool.py:456
Oct 01 04:26:24 lichessbot[1236937]:                              /api/users/status?ids=theoden
Oct 01 04:26:24 lichessbot[1236937]:                              8_uci HTTP/1.1" 200 None
Oct 01 04:26:24 lichessbot[1236937]:                     INFO     Will restart lichess-bot         lichess-bot.py:315
Oct 01 04:26:24 lichessbot[1236937]:                     INFO     Terminated                       lichess-bot.py:322
Oct 01 04:26:35 lichessbot[1236937]: [10/01/22 04:26:35] INFO                                     lichess-bot.py:1104
Oct 01 04:26:35 lichessbot[1236937]:                                  .   _/|
Oct 01 04:26:35 lichessbot[1236937]:                                  .  // o\
Oct 01 04:26:35 lichessbot[1236937]:                                  .  || ._)  lichess-bot
Oct 01 04:26:35 lichessbot[1236937]:                              1.2.0
Oct 01 04:26:35 lichessbot[1236937]:                                  .  //__\
Oct 01 04:26:35 lichessbot[1236937]:                                  .  )___(   Play on Lichess
Oct 01 04:26:35 lichessbot[1236937]:                              with a bot
Oct 01 04:26:35 lichessbot[1236937]:
Oct 01 04:26:35 lichessbot[1236937]:                     DEBUG    Starting new HTTPS           connectionpool.py:1001
Oct 01 04:26:35 lichessbot[1236937]:                              connection (1):
Oct 01 04:26:35 lichessbot[1236937]:                              lichess.org:443
Oct 01 04:26:35 lichessbot[1236937]:                     DEBUG    https://lichess.org:443 "GET  connectionpool.py:456
Oct 01 04:26:35 lichessbot[1236937]:                              /api/account HTTP/1.1" 200
Oct 01 04:26:35 lichessbot[1236937]:                              None
Oct 01 04:26:35 lichessbot[1236937]:                     INFO     Welcome theoden8_uci!           lichess-bot.py:1111
Oct 01 04:26:35 lichessbot[1236937]:                     INFO     You're now connected to          lichess-bot.py:125
Oct 01 04:26:35 lichessbot[1236937]:                              https://lichess.org/ and
Oct 01 04:26:35 lichessbot[1236937]:                              awaiting challenges.
Oct 01 04:26:35 lichessbot[3647389]:                     DEBUG    Starting new HTTPS           connectionpool.py:1001
Oct 01 04:26:35 lichessbot[3647389]:                              connection (1):
Oct 01 04:26:35 lichessbot[3647389]:                              lichess.org:443
Oct 01 04:26:35 lichessbot[1236937]:                     DEBUG    https://lichess.org:443 "GET  connectionpool.py:456
Oct 01 04:26:35 lichessbot[1236937]:                              /api/account/playing
Oct 01 04:26:35 lichessbot[1236937]:                              HTTP/1.1" 200 17
Oct 01 04:26:35 lichessbot[3647389]:                     DEBUG    https://lichess.org:443 "GET  connectionpool.py:456
Oct 01 04:26:35 lichessbot[3647389]:                              /api/stream/event HTTP/1.1"
Oct 01 04:26:35 lichessbot[3647389]:                              200 None
...

Which suggests that connection was successfully re-established. Should I provide any additional information?

theoden8 avatar Oct 16 '22 16:10 theoden8

If that was the only occurrence, then I don't think that we need anything else.

AttackingOrDefending avatar Oct 16 '22 16:10 AttackingOrDefending

As far as I can read the logs, it appears so.

theoden8 avatar Oct 16 '22 22:10 theoden8

I assume that this behavior occurs when Lichess deploys changes... but I don't know how this client can automatically reconnect.

ddugovic avatar Oct 19 '22 01:10 ddugovic

The idea of this change is that if the bot is offline according to lichess, we restart lichess-bot. I am not 100% sure it works, but it is really hard to test a change if the problem happens once every 2 weeks.

AttackingOrDefending avatar Oct 19 '22 19:10 AttackingOrDefending

The bot did restart itself successfully, as per the most recent logs. It used to happen much more often before, every 2-3 days or so. This is what I had previously: https://gist.github.com/theoden8/2ff9dbe3957b105cedf421d5d960cd34

theoden8 avatar Oct 19 '22 19:10 theoden8

Awesome! I'm predicting a lila-ws restart sometime tomorrow, but I fully trust that your code works and would like to install it on my machine too...

ddugovic avatar Oct 19 '22 23:10 ddugovic

@AttackingOrDefending I changed the way I filter logs, and discovered a drop on 20th, 24th, 27th, 1st, 8th, and 15th. All restarted after a dropped connection similar to log above.

theoden8 avatar Oct 20 '22 17:10 theoden8

That's good. So the change is working.

AttackingOrDefending avatar Oct 20 '22 17:10 AttackingOrDefending

The changes from @AttackingOrDefending (PR #584) have been merged, so I'm closing this issue as successful.

MarkZH avatar Oct 21 '22 01:10 MarkZH