[Bug]: "Forbidden: Copilot session has been deleted. Identify: COPILOT_SESSION_DELETED"

Open Tedpac opened this issue 2 months ago • 7 comments

What happened?

When sending a message in the chat that opens when you press the space bar in a clean block, the error shown in the title appears:

Image

Distribution version

Web (https://app.affine.pro)

App Version

0.25.4

What browsers are you seeing the problem on if you're using web version?

Chrome

Are you self-hosting?

  • [x] Yes

Self-hosting Version

0.25.4

Relevant log output

[Nest] 1  - 11/15/2025, 5:18:14 AM VERBOSE [Locker] <selfhosted:graphql:c0c1966e-66a3-4069-abdf-69685f42bf8a> Client d0282cea-007f-402a-9c2f-4e3d91a83e3c is trying to lock resource copilot:session:fb204fc7-cfeb-4252-80ab-bc9254d9df28:bc996d7c-fc8f-41ff-92d3-7a462c09666e

Anything else?

No response

Tedpac avatar Nov 15 '25 05:11 Tedpac

Issue Status: 🆕 Untriaged

The team has not yet reviewed the issue. We usually do it within one business day. Docs: https://github.com/toeverything/AFFiNE/blob/canary/docs/issue-triaging.md

This is an automatic reply by the bot.

affine-issue-bot[bot] avatar Nov 15 '25 05:11 affine-issue-bot[bot]

Are you using an old desktop client?

darkskygit avatar Nov 15 '25 06:11 darkskygit

No, I'm using my self-hosted AFFiNE from Google Chrome.

Tedpac avatar Nov 15 '25 21:11 Tedpac

Hi @Tedpac 👋

This error usually appears when the copilot/chat session fails to acquire a lock in the backend.
Since you're self-hosting, it can happen due to:

  • multiple browser tabs trying to open the same chat session
  • a stale/unfinished copilot session stored in Redis or the database
  • WebSocket connection drops
  • reverse proxy buffering or timeout issues (Nginx/Traefik/Caddy)

To help narrow down the cause, could you share:

  1. The logs from the server container around the time the error occurs
  2. Whether you are using a reverse proxy (Nginx, Caddy, Cloudflare Tunnel, etc.)
  3. Whether Redis is enabled in your self-host setup

You can get the server logs with:

docker logs affine-server --tail=200

Himanshu2459 avatar Nov 20 '25 12:11 Himanshu2459

Hi, @Himanshu2459! 👋

First, thank you so much for your detailed response.

Let me comment on the possible causes you mentioned:

  • I'm completely sure the error isn't caused by multiple tabs, since I only ever have one AFFiNE tab open.
  • One thing I've tried is restarting the entire AFFiNE stack on my server, clearing the browser cache, and retrying the same action that causes the error. Even with this, the error still occurs, so I don't think it's due to a stale/unfinished Copilot session (or at least not one stored in Redis).
  • Yes, I am using a reverse proxy. Still, I'm sure that's not the problem, because I accessed the AFFiNE server directly (on the container's external port) without going through my reverse proxy, and the same error occurs.

Here's what you asked me to share:

  1. These are the logs after restarting the entire AFFiNE stack, clearing my browser cache, and repeating the action three times in a row (by the way, sorry, but 200 lines of logs is a lot):
[Nest] 1  - 11/25/2025, 5:07:54 PM   DEBUG [CopilotProviderFactory] <selfhosted:http:e0dfb21b-bf2e-40f1-8947-fdb65278131f> Resolving copilot provider for model: claude-sonnet-4-5@20250929
[Nest] 1  - 11/25/2025, 5:08:00 PM     LOG [MonitorService] memory usage: rss: 499527680, heapTotal: 159920128, heapUsed: 142041456, external: 16059269, arrayBuffers: 10925466
[Nest] 1  - 11/25/2025, 5:08:00 PM   DEBUG [JobQueue] Job [doc.findEmptySummaryDocs] added; id=findEmptySummaryDocs
[Nest] 1  - 11/25/2025, 5:08:00 PM   DEBUG [JobQueue] Job [doc.recordPendingDocUpdatesCount] added; id=doc:record-pending-updates-count
[Nest] 1  - 11/25/2025, 5:08:02 PM   DEBUG [SpaceSyncGateway] Connection disconnected, total: 0
[Nest] 1  - 11/25/2025, 5:08:02 PM VERBOSE [Locker] <selfhosted:http:fcdc9ab1-12f1-4b34-b289-ef81d261c1a2> Client cluster:WVcdWsR5c4pN91JBjPtqP:NFO8Cfhuj6oXjFHAKqmPA is trying to lock resource doc:update:bc996d7c-fc8f-41ff-92d3-7a462c09666e:bc996d7c-fc8f-41ff-92d3-7a462c09666e
[Nest] 1  - 11/25/2025, 5:08:02 PM   DEBUG [SpaceSyncGateway] New connection, total: 1
[Nest] 1  - 11/25/2025, 5:08:03 PM   DEBUG [EventBus] <selfhosted:ws:e382fca0-97ab-4228-8d6b-2643f1c7c243> Dispatch event: workspace.embedding
[Nest] 1  - 11/25/2025, 5:08:03 PM VERBOSE [EventBus] <selfhosted:ws:e382fca0-97ab-4228-8d6b-2643f1c7c243> Handle event [workspace.embedding] (CopilotEmbeddingJob.addWorkspaceEmbeddingQueue)
[Nest] 1  - 11/25/2025, 5:08:03 PM VERBOSE [Locker] <selfhosted:ws:150d2403-3c72-4485-b050-420f3e4e32d2> Client cluster:WVcdWsR5c4pN91JBjPtqP:OXDXJdsvm_UYJvSx5AwpY is trying to lock resource doc:update:bc996d7c-fc8f-41ff-92d3-7a462c09666e:bc996d7c-fc8f-41ff-92d3-7a462c09666e
[Nest] 1  - 11/25/2025, 5:08:04 PM   DEBUG [CopilotProviderFactory] <selfhosted:http:5fe8999c-fb60-42e3-8253-7d63d984abf2> Resolving copilot provider for model: gemini-2.5-flash
[Nest] 1  - 11/25/2025, 5:08:04 PM   DEBUG [CopilotProviderFactory] <selfhosted:http:5fe8999c-fb60-42e3-8253-7d63d984abf2> Copilot provider candidate found: gemini
[Nest] 1  - 11/25/2025, 5:08:04 PM   DEBUG [CopilotProviderFactory] <selfhosted:http:5fe8999c-fb60-42e3-8253-7d63d984abf2> Resolving copilot provider for model: claude-sonnet-4-5@20250929
[Nest] 1  - 11/25/2025, 5:08:04 PM VERBOSE [Locker] <selfhosted:http:22aa62f1-d2f8-484c-ac2e-f142b140bb24> Client cluster:WVcdWsR5c4pN91JBjPtqP:iTJJibyQk4wAsY6rc-zIM is trying to lock resource doc:update:bc996d7c-fc8f-41ff-92d3-7a462c09666e:bc996d7c-fc8f-41ff-92d3-7a462c09666e
[Nest] 1  - 11/25/2025, 5:08:10 PM VERBOSE [Locker] <selfhosted:graphql:d59abcd4-bd95-4bcf-bd1d-2e9da54cbfe4> Client 19338e07-6219-4982-adbd-c908805e586d is trying to lock resource copilot:session:fb204fc7-cfeb-4252-80ab-bc9254d9df28:bc996d7c-fc8f-41ff-92d3-7a462c09666e
[Nest] 1  - 11/25/2025, 5:08:14 PM VERBOSE [Locker] <selfhosted:graphql:18025ab1-c3a1-4183-aefe-c112ba1a5686> Client 460094ad-77a4-4f77-9df4-978e87e5b3ae is trying to lock resource copilot:session:fb204fc7-cfeb-4252-80ab-bc9254d9df28:bc996d7c-fc8f-41ff-92d3-7a462c09666e
[Nest] 1  - 11/25/2025, 5:08:17 PM     LOG [job] <selfhosted:job:9009275f-d703-481b-92ac-4bb2265b37b8> Job started: [doc.findEmptySummaryDocs] (DocServiceCronJob.findEmptySummaryDocs, id=findEmptySummaryDocs)
[Nest] 1  - 11/25/2025, 5:08:17 PM     LOG [job] <selfhosted:job:9009275f-d703-481b-92ac-4bb2265b37b8> Job finished: [doc.findEmptySummaryDocs] (DocServiceCronJob.findEmptySummaryDocs, id=findEmptySummaryDocs), signal=repeat
[Nest] 1  - 11/25/2025, 5:08:17 PM   DEBUG [job] <selfhosted:event:cc6a688b-9cea-4353-8ecc-3fc981835dbd> Added job [doc.findEmptySummaryDocs] to queue, signal=repeat
[Nest] 1  - 11/25/2025, 5:08:17 PM VERBOSE [Locker] <selfhosted:graphql:b8d3aed0-6723-4092-b908-c1dba80fd9d7> Client 6e3b4290-4e48-44b7-b7c8-b785e048dd1b is trying to lock resource copilot:session:fb204fc7-cfeb-4252-80ab-bc9254d9df28:bc996d7c-fc8f-41ff-92d3-7a462c09666e
  2. Yes, I am using a reverse proxy, but as I mentioned earlier, the test I did indicates that it is not the problem.
  3. Yes, Redis is enabled on my installation.

Tedpac avatar Nov 25 '25 17:11 Tedpac

Hi @Tedpac,

Based on your logs, AFFiNE is repeatedly trying to lock the same Copilot session:

copilot:session:fb204fc7-cfeb-4252-80ab-bc9254d9df28:bc996d7c-fc8f-41ff-92d3-7a462c09666e

This usually happens when a stale Copilot session lock gets stuck in Redis. The lock sometimes never expires, and the server keeps failing to acquire it.
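
The repeated attempts can be confirmed by grouping the [Locker] lines by resource. The following is a minimal sketch, assuming the log line format shown in the logs above; the function names are hypothetical and not part of AFFiNE:

```python
import re
from collections import Counter

# Matches "[Locker]" lines in the format shown in the logs above, e.g.
# "... [Locker] <...> Client <id> is trying to lock resource <resource>".
LOCK_LINE = re.compile(
    r"\[Locker\].*?Client (?P<client>\S+) is trying to lock resource (?P<resource>\S+)"
)


def count_lock_attempts(log_text: str) -> Counter:
    """Count lock attempts per resource across all [Locker] lines."""
    attempts = Counter()
    for line in log_text.splitlines():
        match = LOCK_LINE.search(line)
        if match:
            attempts[match.group("resource")] += 1
    return attempts


def copilot_session_attempts(log_text: str) -> dict:
    """Keep only the resources under the copilot:session: prefix."""
    return {
        resource: n
        for resource, n in count_lock_attempts(log_text).items()
        if resource.startswith("copilot:session:")
    }
```

Passing the text of the server logs to copilot_session_attempts gives one entry per Copilot session resource with its attempt count. A count above one only shows that the same resource was requested repeatedly; it does not by itself show that a lock is stuck.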

How to fix it:

  1. Check if Redis has any stuck Copilot session keys:
docker exec -it affine-redis redis-cli KEYS "copilot:session:*"
  2. Delete the stale locks. For each key listed in step 1, run DEL on it, for example:
docker exec -it affine-redis redis-cli DEL copilot:session:fb204fc7-cfeb-4252-80ab-bc9254d9df28:bc996d7c-fc8f-41ff-92d3-7a462c09666e
  3. Restart your AFFiNE server:
docker restart affine-server

Why this happens:

In version 0.25.4, Redis sometimes keeps Copilot locks permanently (TTL not set correctly), causing the AI/chat to break even after restarts.
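
Whether a given key really lacks an expiry can be checked with the Redis TTL command, which replies -1 for a key that exists without an expiry and -2 for a key that does not exist. The sketch below assumes the affine-redis container name used in the commands above and assumes the lock is stored under exactly the key name shown in the logs, which this thread does not confirm; the helper names are hypothetical:

```python
import subprocess


def describe_ttl(ttl: int) -> str:
    """Interpret the integer reply of the Redis TTL command."""
    if ttl == -2:
        return "key does not exist"
    if ttl == -1:
        return "key exists but has no expiry"
    return f"key expires in {ttl} s"


def key_ttl(key: str, container: str = "affine-redis") -> int:
    """Run `redis-cli TTL <key>` inside the Redis container and return the reply.

    The container name is an assumption taken from the commands above.
    """
    result = subprocess.run(
        ["docker", "exec", container, "redis-cli", "TTL", key],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())
```

For example, describe_ttl(key_ttl("copilot:session:...")) with a full key from step 1 reports whether that key carries an expiry. A reply of "key does not exist" would suggest the lock is either not held or stored under a different name.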

Long-term solution:

Consider updating to AFFiNE 0.26.0 or a recent canary build, which includes fixes for the lock-expiration bug.

Let me know if this resolves the issue.

If the fix works for you, a 👍 or short reply would mean a lot. I’m building my open-source contributions, so your feedback will help me grow.

Himanshu2459 avatar Nov 25 '25 22:11 Himanshu2459

Here's what I tried:

  1. Stopped the entire AFFiNE stack.
  2. Started only the Redis container and ran a FLUSHALL.
  3. Stopped the Redis container.
  4. Started the entire AFFiNE stack.

The problem persists.

Tedpac avatar Nov 28 '25 15:11 Tedpac