TaskWeaver Variables shared across separate sessions in local mode

We've observed that, when working in local mode, variables defined in one session can persist and become accessible from another session — even though each session has its own workspace (cwd) and a unique data.xlsx file.

For example:

Session A reads data.xlsx from its own workspace → works as expected.
Session B, in a different cwd, attempts to read its own data.xlsx, but ends up reusing the data from Session A instead of loading fresh data from its file.

This suggests that data may be cached or that the interpreter is sharing state across sessions. Could the local mode be responsible for this unexpected persistence of variables or objects in memory?

We expect each session to be isolated, especially regarding loaded data and memory objects.

Apr 23 '25 09:04 lolapirespintopolarys

How did you determine that the data in Session B is reused from Session A? In our design, different sessions are isolated in separate processes. Could you share more details?

cc: @Jack-Q

Apr 23 '25 10:04 liqul

Thanks for your reply! Here are a few more details:

We're using TaskWeaver as a library, not through the CLI.
When we create Session A, it has its own workspace directory with a unique data.xlsx file in it (in workspace/session_id/cwd). When we ask a question that leads the code interpreter to execute the following code, it returns the correct result:

import pandas as pd

# Load the dataset
data = pd.read_excel('data.xlsx')

# Identify unique tags
unique_tags = data['tags'].dropna().str.split(', ').explode().unique().tolist()

unique_tags

The issue appears when a new Session B is created, also with a different data.xlsx file in its own isolated workspace. When we ask the same question, the code interpreter re-executes the same code, but instead of loading and returning values from Session B’s file, it returns the values from Session A’s file.

This suggests that the "data" variable may be cached rather than being reloaded from the correct working directory.

We also tested this in container mode. We confirmed that separate containers are created for each session, but the same behavior persists: the data.xlsx from Session A appears to be used in Session B.

Apr 24 '25 13:04 lolapirespintopolarys

It depends on how you integrate TaskWeaver with your code. With the default settings, if you are using the CLI, every session has a unique session ID, so each session gets its own folder. What I'm not sure about is how the 'data.xlsx' file is placed into that folder. If you are using the CLI or Web, TaskWeaver places the file in the right location for you. In your case, I can't fully reconstruct the exact process, so I'm unable to explain why this happens.

Apr 25 '25 01:04 liqul

Thank you for your feedback.

To clarify, for each session, we copy the data.xlsx file directly into the session’s cwd at initialization.

We also assign a custom session ID that matches the conversation ID.

For example, the workspace path looks like this: project\workspace\sessions\20250425-113301-f9f493eac31f400a9205cbabf48afc54\cwd\data.xlsx

Regarding code execution: in code_interpreter.py, when we run the same code in two different sessions, the output from Session B is exactly the same as the first execution from Session A, despite the underlying data.xlsx files being different.

To investigate further, we added logging in taskweaver/ces/environment.py, specifically inside the execute_code method.

We confirmed that:

Each execution uses a different session_id, as expected.
Each session has its own folder inside the workspace.
In each cwd folder, there is a distinct data.xlsx file, with different contents, even though the filename remains the same.
The execution result is nevertheless the same in both sessions.
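
One way to check whether two sessions actually reach different interpreters would be to have each session run a small fingerprint snippet before it loads any file. The following is only a sketch: kernel_fingerprint is a hypothetical helper written for illustration, not part of TaskWeaver.

```python
import os


def kernel_fingerprint(namespace):
    """Collect details that identify the interpreter executing this code.

    `namespace` is the dict of variables to inspect, typically globals().
    """
    return {
        "pid": os.getpid(),              # process running the code
        "cwd": os.getcwd(),              # working directory seen by that process
        "has_data": "data" in namespace, # True if `data` already exists
    }


# Run this in each session before any pd.read_excel call.
print(kernel_fingerprint(globals()))
```

If Session B reports the same pid or cwd as Session A, or has_data is already True before any file is read, the code is being routed to Session A's kernel rather than to a fresh one.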

If you need to look at specific parts of the code, we can provide them.

Thanks!

Apr 25 '25 11:04 lolapirespintopolarys

Hi,

We’ve investigated the issue further and were able to identify the root cause and implement a fix.

In the Environment class, we noticed that the _get_client method was using a shared cache for the kernel client:

if self._client is not None:
    return self._client

This design meant that all sessions instantiated through the same Environment object could end up reusing the same kernel client, regardless of the session ID. As a result, file operations and variable persistence were leaking across sessions, which explained why Session B was accessing data from Session A.

So, we modified the code to use a per-session client cache, like so:

# Attribute on Environment: one kernel client per session ID
self._client_dict: Dict[str, BlockingKernelClient] = {}

def _get_client(
    self,
    session_id: str,
) -> BlockingKernelClient:
    # Reuse a cached client only if it belongs to this session
    if session_id in self._client_dict:
        return self._client_dict[session_id]
    ...
    self._client_dict[session_id] = client
    return client

def _clean_session_client(self, session_id: str):
    # Stop the channels and drop this session's cached client
    if session_id in self._client_dict:
        self._client_dict[session_id].stop_channels()
        del self._client_dict[session_id]

This worked for us :)
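
To illustrate the difference between the two caching strategies, here is a minimal, self-contained sketch. It does not use TaskWeaver or Jupyter: FakeKernelClient, SharedCacheEnv and PerSessionCacheEnv are hypothetical stand-ins that only mimic the caching logic described above.

```python
from typing import Dict, Optional


class FakeKernelClient:
    """Stand-in for a kernel client; remembers which session created it."""

    def __init__(self, session_id: str) -> None:
        self.session_id = session_id


class SharedCacheEnv:
    """Mimics the original behaviour: a single cached client for all sessions."""

    def __init__(self) -> None:
        self._client: Optional[FakeKernelClient] = None

    def get_client(self, session_id: str) -> FakeKernelClient:
        # The session ID is ignored once a client has been cached.
        if self._client is not None:
            return self._client
        self._client = FakeKernelClient(session_id)
        return self._client


class PerSessionCacheEnv:
    """Mimics the fix: one cached client per session ID."""

    def __init__(self) -> None:
        self._client_dict: Dict[str, FakeKernelClient] = {}

    def get_client(self, session_id: str) -> FakeKernelClient:
        if session_id in self._client_dict:
            return self._client_dict[session_id]
        client = FakeKernelClient(session_id)
        self._client_dict[session_id] = client
        return client


shared = SharedCacheEnv()
shared.get_client("A")
print(shared.get_client("B").session_id)  # prints "A": Session B gets A's client

per_session = PerSessionCacheEnv()
per_session.get_client("A")
print(per_session.get_client("B").session_id)  # prints "B": each session has its own
```

With the single cached client, whichever session connects first determines the kernel that every later session talks to, which matches the symptom of Session B seeing Session A's data.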

May 02 '25 07:05 lolapirespintopolarys

Thanks for the investigation. I've confirmed the issue, and your fix looks good to me. Will fix it.

May 08 '25 10:05 liqul

Many thanks, @lolapirespintopolarys. This has been fixed in our latest PR.

May 09 '25 03:05 liqul