fix: handle internal notifications during session cleanup

Addresses an issue where internal notifications (e.g. 'cancelled') during session cleanup would trigger validation errors and crash the server.

Changes

Add session state tracking via _closed flag
Skip validation for internal notifications during cleanup phase
Improve progress context cleanup with proper async context manager
Add final_progress notification to ensure completion state is reported

Testing

Successfully tested with multiple concurrent requests with progress tracking:

Cleanup proceeds without validation errors
Progress notifications complete properly
No server crashes during shutdown

Dec 02 '24 17:12 donghao1393

Hi @dsp-ant , thank you for your advice. I have made commit. Would you help to check again?

Dec 17 '24 12:12 donghao1393

Technical Details

The issue occurred when internal notifications were received during cleanup
Added _closed flag to track session state accurately
Modified notification handling to be cleanup-aware
Ensures graceful shutdown even with pending notifications

Impact

Prevents validation errors during shutdown
Maintains session stability during cleanup
No interference with normal operation
Compatible with existing notification patterns

This approach keeps the fix focused on the core issue while maintaining backward compatibility.

Jan 03 '25 16:01 donghao1393

@dsp-ant @donghao1393 - if I may suggest a more straight-forward fix for this inline with codebase patterns? - https://github.com/txbm/python-sdk/commit/19185626d57fdd670219784d10859c7d0a302c2e

Jan 06 '25 06:01 txbm

@dsp-ant @donghao1393 - if I may suggest a more straight-forward fix for this inline with codebase patterns? - txbm@1918562

I have tested this commit on my local and it's working. I think this is a better solution. It fixed type definition for cancelled notifications to match protocol implementation. This allows proper handling of cancellation messages without requiring changes to the notification processing logic. What do you think @dsp-ant ?

Jan 06 '25 15:01 donghao1393

Please can you merge this fix in, it's breaking my server on claude desktop app - thank you!

Jan 07 '25 07:01 orliesaurus

Hey @donghao1393 - It seems to me that this can be simplified a lot now that there is the canceled notification types.

I'm going to be doing a loop on notifications in this sdk soon, and for my understanding (and to write some tests) I'd love it if you could also provide an example that reproduces this issue? would it just be a server with a long running task and the client sending of a cancel notification?

Yes, you're right. A reproduction case would be a server with a long-running task (like image generation) that typically takes more than a few seconds, and the client sending a cancel notification during the process. I can provide a minimal example:

add this of my mcp service
A client that initiates the task and sends a cancel notification, like:
- 1st prompt: "create an image using dall-e model to depict a dog standing on the moon watching the earth"
- 2nd prompt (in the same conversation): "create another image that the dog was running"
- expected: the conversation should return both 2 images
- problem: if the cancel notification made the service down in 1st prompt, the 2nd prompt would be failed to send request
inspect the log file mcp-server-openai.log so that we can observe the cancellation behavior

This would help validate both the current behavior and any proposed changes.

Jan 18 '25 16:01 donghao1393

Fixed on my side using this patch + extra glue on MCP 1.2.0. We need the cancellation notification. Because some time Claude is a bit wild and I noticed if I press cancel and a tool is triggered, it not working. We need to implement properly cancellation that would stop the tool in a graceful way and don't crash server.

Jan 20 '25 21:01 Mehdi-Bl

@dsp-ant @donghao1393 - if I may suggest a more straight-forward fix for this inline with codebase patterns? - txbm@1918562

I'm going to roll with this until the issue is closed, thanks @txbm!

Jan 29 '25 15:01 EduPonz

Hi @jerome3o-anthropic , I have updated accordingly and passed my test. Would you help to take a review for the next?

Feb 05 '25 17:02 donghao1393

I am curious if you could merge this? @dsp-ant Maybe it will fix my issues with The read operation timed out

Feb 11 '25 18:02 orliesaurus

It seem the issue is fixed, either Claude Desktop update or using the latest main branch. As already I see cancelled got added as valid type, so should no more throw panic error and crash.

Feb 11 '25 19:02 Mehdi-Bl

It seem the issue is fixed, either Claude Desktop update or using the latest main branch. As already I see cancelled got added as valid type, so should no more throw panic error and crash.

I have tested the latest SDK without this PR, but it failed in my case.

Feb 11 '25 19:02 donghao1393

I have used the main branch. It's not in 1.2.1. For me this seem solving it. https://github.com/modelcontextprotocol/python-sdk/blob/f10665db4c2f676da1131617ad67715952258712/src/mcp/types.py#L995 I testing 1 min task withtout issue when I moved using the new main branch. Previously would crash. I had even similar patch as your and removed it. Can you test the main branch not 1.2.1.

Feb 11 '25 20:02 Mehdi-Bl

I have used the main branch. It's not in 1.2.1. For me this seem solving it.

https://github.com/modelcontextprotocol/python-sdk/blob/f10665db4c2f676da1131617ad67715952258712/src/mcp/types.py#L995

I testing 1 min task withtout issue when I moved using the new main branch. Previously would crash. I had even similar patch as your and removed it. Can you test the main branch not 1.2.1.

Thank you for your advice. I tried with

# disable old mcp sdk
uv add git+https://github.com/modelcontextprotocol/python-sdk --branch main
# restart claude desktop

And then it works.

Feb 11 '25 20:02 donghao1393

I have used the main branch. It's not in 1.2.1. For me this seem solving it.

https://github.com/modelcontextprotocol/python-sdk/blob/f10665db4c2f676da1131617ad67715952258712/src/mcp/types.py#L995

I testing 1 min task withtout issue when I moved using the new main branch. Previously would crash. I had even similar patch as your and removed it. Can you test the main branch not 1.2.1.

if it's not in 1.2.1 is it going to be out in 1.2.2 or what's the logic you think? EDIT: Are they going straight to 1.3?

Feb 12 '25 00:02 orliesaurus

I'm not in the team. So @dsp-ant or @jerome3o-anthropic may help here. But this issue was a bigfpain for using MCP. When MCP crash mid conversation, you have to close Claude Desktop. And restart that step, or wait for it to timeout to let it see the issues. Some time it in the middle of modifications. Aside from the fact that some commands are impossible to run or blow out randomly when they reach the 10s. I rewrote my MCP more than once thinking the issue on my side. So it should had been higher priority and released in 1.2. or 1.2.1. I feel MCP SDK will remain not mature in a while as it's an ambitious angoing work.

Feb 12 '25 16:02 Mehdi-Bl