fix: handle internal notifications during session cleanup
fix: handle internal notifications during session cleanup
Addresses an issue where internal notifications (e.g. 'cancelled') during session cleanup would trigger validation errors and crash the server.
Changes
- Add session state tracking via
_closedflag - Skip validation for internal notifications during cleanup phase
- Improve progress context cleanup with proper async context manager
- Add
final_progressnotification to ensure completion state is reported
Testing
Successfully tested with multiple concurrent requests with progress tracking:
- Cleanup proceeds without validation errors
- Progress notifications complete properly
- No server crashes during shutdown
Hi @dsp-ant , thank you for your advice. I have made commit. Would you help to check again?
Technical Details
- The issue occurred when internal notifications were received during cleanup
- Added _closed flag to track session state accurately
- Modified notification handling to be cleanup-aware
- Ensures graceful shutdown even with pending notifications
Impact
- Prevents validation errors during shutdown
- Maintains session stability during cleanup
- No interference with normal operation
- Compatible with existing notification patterns
This approach keeps the fix focused on the core issue while maintaining backward compatibility.
@dsp-ant @donghao1393 - if I may suggest a more straight-forward fix for this inline with codebase patterns? - https://github.com/txbm/python-sdk/commit/19185626d57fdd670219784d10859c7d0a302c2e
@dsp-ant @donghao1393 - if I may suggest a more straight-forward fix for this inline with codebase patterns? - txbm@1918562
I have tested this commit on my local and it's working. I think this is a better solution. It fixed type definition for cancelled notifications to match protocol implementation. This allows proper handling of cancellation messages without requiring changes to the notification processing logic. What do you think @dsp-ant ?
Please can you merge this fix in, it's breaking my server on claude desktop app - thank you!
Hey @donghao1393 - It seems to me that this can be simplified a lot now that there is the canceled notification types.
I'm going to be doing a loop on notifications in this sdk soon, and for my understanding (and to write some tests) I'd love it if you could also provide an example that reproduces this issue? would it just be a server with a long running task and the client sending of a cancel notification?
Yes, you're right. A reproduction case would be a server with a long-running task (like image generation) that typically takes more than a few seconds, and the client sending a cancel notification during the process. I can provide a minimal example:
- add this of my mcp service
- A client that initiates the task and sends a cancel notification, like:
- 1st prompt: "create an image using dall-e model to depict a dog standing on the moon watching the earth"
- 2nd prompt (in the same conversation): "create another image that the dog was running"
- expected: the conversation should return both 2 images
- problem: if the cancel notification made the service down in 1st prompt, the 2nd prompt would be failed to send request
- inspect the log file
mcp-server-openai.logso that we can observe the cancellation behavior
This would help validate both the current behavior and any proposed changes.
Fixed on my side using this patch + extra glue on MCP 1.2.0. We need the cancellation notification. Because some time Claude is a bit wild and I noticed if I press cancel and a tool is triggered, it not working. We need to implement properly cancellation that would stop the tool in a graceful way and don't crash server.
@dsp-ant @donghao1393 - if I may suggest a more straight-forward fix for this inline with codebase patterns? - txbm@1918562
I'm going to roll with this until the issue is closed, thanks @txbm!
Hi @jerome3o-anthropic , I have updated accordingly and passed my test. Would you help to take a review for the next?
I am curious if you could merge this? @dsp-ant
Maybe it will fix my issues with The read operation timed out
It seem the issue is fixed, either Claude Desktop update or using the latest main branch. As already I see cancelled got added as valid type, so should no more throw panic error and crash.
It seem the issue is fixed, either Claude Desktop update or using the latest main branch. As already I see cancelled got added as valid type, so should no more throw panic error and crash.
I have tested the latest SDK without this PR, but it failed in my case.
I have used the main branch. It's not in 1.2.1. For me this seem solving it. https://github.com/modelcontextprotocol/python-sdk/blob/f10665db4c2f676da1131617ad67715952258712/src/mcp/types.py#L995 I testing 1 min task withtout issue when I moved using the new main branch. Previously would crash. I had even similar patch as your and removed it. Can you test the main branch not 1.2.1.
I have used the main branch. It's not in 1.2.1. For me this seem solving it.
https://github.com/modelcontextprotocol/python-sdk/blob/f10665db4c2f676da1131617ad67715952258712/src/mcp/types.py#L995
I testing 1 min task withtout issue when I moved using the new main branch. Previously would crash. I had even similar patch as your and removed it. Can you test the main branch not 1.2.1.
Thank you for your advice. I tried with
# disable old mcp sdk
uv add git+https://github.com/modelcontextprotocol/python-sdk --branch main
# restart claude desktop
And then it works.
I have used the main branch. It's not in 1.2.1. For me this seem solving it.
https://github.com/modelcontextprotocol/python-sdk/blob/f10665db4c2f676da1131617ad67715952258712/src/mcp/types.py#L995
I testing 1 min task withtout issue when I moved using the new main branch. Previously would crash. I had even similar patch as your and removed it. Can you test the main branch not 1.2.1.
if it's not in 1.2.1 is it going to be out in 1.2.2 or what's the logic you think? EDIT: Are they going straight to 1.3?
I'm not in the team. So @dsp-ant or @jerome3o-anthropic may help here. But this issue was a bigfpain for using MCP. When MCP crash mid conversation, you have to close Claude Desktop. And restart that step, or wait for it to timeout to let it see the issues. Some time it in the middle of modifications. Aside from the fact that some commands are impossible to run or blow out randomly when they reach the 10s. I rewrote my MCP more than once thinking the issue on my side. So it should had been higher priority and released in 1.2. or 1.2.1. I feel MCP SDK will remain not mature in a while as it's an ambitious angoing work.