Variables shared across separate sessions in local mode
We've observed that, when working in local mode, variables defined in one session can persist and become accessible from another session — even though each session has its own workspace (cwd) and a unique data.xlsx file.
For example:
- Session A reads data.xlsx from its own workspace → works as expected.
- Session B, in a different cwd, attempts to read its own data.xlsx, but ends up reusing the data from Session A instead of loading fresh data from its file.
This suggests that data may be cached or that the interpreter is sharing state across sessions. Could the local mode be responsible for this unexpected persistence of variables or objects in memory?
We expect each session to be isolated, especially regarding loaded data and memory objects.
I'm wondering how you tell that the data in Session B is reused from Session A. In our design, different sessions are isolated in different processes. Could you share more details?
cc: @Jack-Q
Thanks for your reply! Here are a few more details:
- We're using TaskWeaver as a library, not through the CLI.
- When we create Session A, it has its own workspace directory with a unique data.xlsx file in it (in workspace/session_id/cwd). When we ask a question that leads the code interpreter to execute the following code, it returns the correct result:
import pandas as pd
# Load the dataset
data = pd.read_excel('data.xlsx')
# Identify unique tags
unique_tags = data['tags'].dropna().str.split(', ').explode().unique().tolist()
unique_tags
The issue appears when a new Session B is created, also with a different data.xlsx file in its own isolated workspace. When we ask the same question, the code interpreter re-executes the same code, but instead of loading and returning values from Session B’s file, it returns the values from Session A’s file.
This suggests that the "data" variable may be cached rather than reloaded from the file in the correct working directory.
We also tested this in container mode. We confirmed that separate containers are created for each session, but the same behavior persists: the data.xlsx from Session A appears to be used in Session B.
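One way to check which file a session actually reads is to run a small diagnostic inside that session's kernel and compare the output between Session A and Session B. The following is a minimal sketch using only the standard library; the helper name describe_input is hypothetical and not part of TaskWeaver.

```python
import hashlib
import os
from pathlib import Path


def describe_input(filename: str = "data.xlsx") -> dict:
    """Report which file a relative path resolves to in the current process."""
    path = Path(filename).resolve()  # resolved against the process cwd
    info = {
        "cwd": os.getcwd(),
        "resolved_path": str(path),
        "exists": path.exists(),
    }
    if path.exists():
        # A content hash makes it obvious when two sessions read the same bytes.
        info["sha256"] = hashlib.sha256(path.read_bytes()).hexdigest()
    return info


print(describe_input())
```

If both sessions report the same cwd or the same hash even though their data.xlsx files differ on disk, the code is most likely running in the same kernel process rather than in separate ones.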
It depends on how you integrate TaskWeaver with your code. With the default settings, if you are using the CLI, every session has a unique session ID, so the folders for different sessions are different. I'm not sure how the 'data.xlsx' file is placed into the folder, though. If you are using the CLI or Web, TaskWeaver places the file in the right location for you. In your case, I cannot fully reconstruct the exact process, so I'm unable to explain why this happens.
Thank you for your feedback.
To clarify, for each session, we copy the data.xlsx file directly into the session’s cwd at initialization.
We also assign a custom session ID that matches the conversation ID.
For example, the workspace path looks like this: project\workspace\sessions\20250425-113301-f9f493eac31f400a9205cbabf48afc54\cwd\data.xlsx
Regarding code execution: in code_interpreter.py, when we run the same code across two different sessions, the output from Session B is exactly the same as the first execution from Session A, despite the underlying data.xlsx files being different.
To investigate further, we added logging in taskweaver/ces/environment.py, specifically inside the execute_code method.
We confirmed that:
- Each execution uses a different session_id, as expected.
- Each session has its own folder inside the workspace.
- In each cwd folder, there is a distinct data.xlsx file, with different contents, even though the filename remains the same.
- The execution result is nevertheless the same in both sessions.
If you need to look at specific parts of the code, we can provide them.
Thanks!
Hi,
We’ve investigated the issue further and were able to identify the root cause and implement a fix.
In the Environment class, we noticed that the _get_client method was using a shared cache for the kernel client:
if self._client is not None:
    return self._client
This design meant that all sessions instantiated through the same Environment object could end up reusing the same kernel client, regardless of the session ID. As a result, file operations and variable persistence were leaking across sessions, which explained why Session B was accessing data from Session A.
So, we modified the code to use a per-session client cache, like so:
self._client_dict: Dict[str, BlockingKernelClient] = {}

def _get_client(
    self,
    session_id: str,
) -> BlockingKernelClient:
    if session_id in self._client_dict:
        return self._client_dict[session_id]
    ...
    self._client_dict[session_id] = client
    return client

def _clean_session_client(self, session_id: str):
    if session_id in self._client_dict:
        self._client_dict[session_id].stop_channels()
        del self._client_dict[session_id]
This worked for us :)
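To illustrate the difference outside TaskWeaver, here is a self-contained sketch of the two caching strategies. All class names (FakeKernelClient, SharedCacheEnvironment, PerSessionEnvironment) are hypothetical stand-ins and do not reproduce the real Environment class or the Jupyter client; the sketch only shows why a single cached client ignores the session ID while a dictionary keyed by session ID does not.

```python
from typing import Dict, Optional


class FakeKernelClient:
    """Hypothetical stand-in for a kernel client; it only remembers its session."""

    def __init__(self, session_id: str) -> None:
        self.session_id = session_id
        self.stopped = False

    def stop_channels(self) -> None:
        self.stopped = True


class SharedCacheEnvironment:
    """Mimics the reported behaviour: one cached client serves every session."""

    def __init__(self) -> None:
        self._client: Optional[FakeKernelClient] = None

    def get_client(self, session_id: str) -> FakeKernelClient:
        if self._client is not None:
            return self._client  # session_id is ignored on a cache hit
        self._client = FakeKernelClient(session_id)
        return self._client


class PerSessionEnvironment:
    """Mimics the fix: clients are cached per session ID."""

    def __init__(self) -> None:
        self._client_dict: Dict[str, FakeKernelClient] = {}

    def get_client(self, session_id: str) -> FakeKernelClient:
        if session_id not in self._client_dict:
            self._client_dict[session_id] = FakeKernelClient(session_id)
        return self._client_dict[session_id]

    def clean_session_client(self, session_id: str) -> None:
        # Stop and forget the client when its session ends.
        client = self._client_dict.pop(session_id, None)
        if client is not None:
            client.stop_channels()


shared_env = SharedCacheEnvironment()
leaked = shared_env.get_client("A") is shared_env.get_client("B")

isolated_env = PerSessionEnvironment()
isolated = isolated_env.get_client("A") is not isolated_env.get_client("B")

print(f"shared cache reuses client across sessions: {leaked}")
print(f"per-session cache keeps clients separate: {isolated}")
```

With the shared cache, Session B receives the client created for Session A, which matches the symptom described above. The per-session cache also needs the cleanup step, otherwise clients for finished sessions would accumulate in the dictionary.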
Thanks for the investigation. I've confirmed the issue, and your fix looks good to me. We will fix it.
Great thanks @lolapirespintopolarys. This has been fixed in our latest PR.