
Memory Leak Using ChatConversationalAgent

Open noseworthy opened this issue 1 year ago • 2 comments

Hello! I'm trying to use the chat conversational agent, and after roughly 3 calls to my API (each request constructs a new agent), my backend server crashes because it runs out of memory. It looks like the FinalizationRegistry isn't freeing the @dqbd/tiktoken encoders properly.
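To make the shape of the failure concrete, here's a stripped-down, hypothetical version of what my backend does per request (not my actual code; the import path and the getNumTokens call may differ slightly depending on your langchain version):

```js
import { ChatOpenAI } from "langchain/chat_models";

// Every request constructs a fresh model (and agent), so every request
// lazily creates and caches its own tiktoken encoder. The only thing that
// ever frees that encoder is the FinalizationRegistry, which doesn't seem
// to fire here, so the WASM memory just accumulates.
async function handleRequest(prompt) {
  const model = new ChatOpenAI({ temperature: 0 });
  // getNumTokens() is the BaseLanguageModel method patched below.
  return model.getNumTokens(prompt);
}

for (let i = 0; i < 1000; i += 1) {
  await handleRequest("some user input");
}
```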

To test this, I used yarn patch on my local copy of langchain to remove the registry and instead create and free a new tokenizer/encoder every time one is used. See below:

diff --git a/dist/base_language/index.cjs b/dist/base_language/index.cjs
index bb159aa2a7b3e6400d6d8d323150d07c49bf07fa..6ef29537e3ce36634ec6ccfd8cfba52cdbde0c6f 100644
--- a/dist/base_language/index.cjs
+++ b/dist/base_language/index.cjs
@@ -35,18 +35,6 @@ class BaseLanguageModel {
             writable: true,
             value: void 0
         });
-        Object.defineProperty(this, "_encoding", {
-            enumerable: true,
-            configurable: true,
-            writable: true,
-            value: void 0
-        });
-        Object.defineProperty(this, "_registry", {
-            enumerable: true,
-            configurable: true,
-            writable: true,
-            value: void 0
-        });
         this.verbose =
             params.verbose ?? (params.callbackManager ? true : getVerbosity());
         this.callbackManager = params.callbackManager ?? (0, index_js_1.getCallbackManager)();
@@ -56,23 +44,12 @@ class BaseLanguageModel {
         // fallback to approximate calculation if tiktoken is not available
         let numTokens = Math.ceil(text.length / 4);
         try {
-            if (!this._encoding) {
-                const { encoding_for_model } = await (0, count_tokens_js_1.importTiktoken)();
-                // modelName only exists in openai subclasses, but tiktoken only supports
-                // openai tokenisers anyway, so for other subclasses we default to gpt2
-                if (encoding_for_model) {
-                    this._encoding = encoding_for_model("modelName" in this
-                        ? (0, count_tokens_js_1.getModelNameForTiktoken)(this.modelName)
-                        : "gpt2");
-                    // We need to register a finalizer to free the tokenizer when the
-                    // model is garbage collected.
-                    this._registry = new FinalizationRegistry((t) => t.free());
-                    this._registry.register(this, this._encoding);
-                }
-            }
-            if (this._encoding) {
-                numTokens = this._encoding.encode(text).length;
-            }
+            const { encoding_for_model } = await (0, count_tokens_js_1.importTiktoken)();
+            const encoding = encoding_for_model("modelName" in this
+                    ? (0, count_tokens_js_1.getModelNameForTiktoken)(this.modelName)
+                    : "gpt2");
+            numTokens = encoding.encode(text).length;
+            encoding.free();
         }
         catch (error) {
             console.warn("Failed to calculate number of tokens with tiktoken, falling back to approximate count", error);
diff --git a/dist/base_language/index.js b/dist/base_language/index.js
index e7f422f0ab5f3438b2eb99b2b1431f482cb34ed2..680ebe81ca28c20b1a9465932674b813cc55d46d 100644
--- a/dist/base_language/index.js
+++ b/dist/base_language/index.js
@@ -32,18 +32,6 @@ export class BaseLanguageModel {
             writable: true,
             value: void 0
         });
-        Object.defineProperty(this, "_encoding", {
-            enumerable: true,
-            configurable: true,
-            writable: true,
-            value: void 0
-        });
-        Object.defineProperty(this, "_registry", {
-            enumerable: true,
-            configurable: true,
-            writable: true,
-            value: void 0
-        });
         this.verbose =
             params.verbose ?? (params.callbackManager ? true : getVerbosity());
         this.callbackManager = params.callbackManager ?? getCallbackManager();
@@ -53,22 +41,16 @@ export class BaseLanguageModel {
         // fallback to approximate calculation if tiktoken is not available
         let numTokens = Math.ceil(text.length / 4);
         try {
-            if (!this._encoding) {
-                const { encoding_for_model } = await importTiktoken();
-                // modelName only exists in openai subclasses, but tiktoken only supports
-                // openai tokenisers anyway, so for other subclasses we default to gpt2
-                if (encoding_for_model) {
-                    this._encoding = encoding_for_model("modelName" in this
-                        ? getModelNameForTiktoken(this.modelName)
-                        : "gpt2");
-                    // We need to register a finalizer to free the tokenizer when the
-                    // model is garbage collected.
-                    this._registry = new FinalizationRegistry((t) => t.free());
-                    this._registry.register(this, this._encoding);
-                }
-            }
-            if (this._encoding) {
-                numTokens = this._encoding.encode(text).length;
+            const { encoding_for_model } = await importTiktoken();
+            // modelName only exists in openai subclasses, but tiktoken only supports
+            // openai tokenisers anyway, so for other subclasses we default to gpt2
+            if (encoding_for_model) {
+                const encoding = encoding_for_model("modelName" in this
+                    ? getModelNameForTiktoken(this.modelName)
+                    : "gpt2");
+
+                numTokens = encoding.encode(text).length;
+                encoding.free();
             }
         }
         catch (error) {
diff --git a/dist/text_splitter.cjs b/dist/text_splitter.cjs
index a8df204ff646b96486b346bf192f6d818a7c4d72..a2c3186bd898f23852d718616d8ff0012cbde656 100644
--- a/dist/text_splitter.cjs
+++ b/dist/text_splitter.cjs
@@ -22,7 +22,7 @@ class TextSplitter {
             throw new Error("Cannot have chunkOverlap >= chunkSize");
         }
     }
-    async createDocuments(texts, 
+    async createDocuments(texts,
     // eslint-disable-next-line @typescript-eslint/no-explicit-any
     metadatas = []) {
         const _metadatas = metadatas.length > 0 ? metadatas : new Array(texts.length).fill({});
@@ -215,31 +215,13 @@ class TokenTextSplitter extends TextSplitter {
             writable: true,
             value: void 0
         });
-        Object.defineProperty(this, "tokenizer", {
-            enumerable: true,
-            configurable: true,
-            writable: true,
-            value: void 0
-        });
-        Object.defineProperty(this, "registry", {
-            enumerable: true,
-            configurable: true,
-            writable: true,
-            value: void 0
-        });
         this.encodingName = fields?.encodingName ?? "gpt2";
         this.allowedSpecial = fields?.allowedSpecial ?? [];
         this.disallowedSpecial = fields?.disallowedSpecial ?? "all";
     }
     async splitText(text) {
-        if (!this.tokenizer) {
-            const tiktoken = await TokenTextSplitter.imports();
-            this.tokenizer = tiktoken.get_encoding(this.encodingName);
-            // We need to register a finalizer to free the tokenizer when the
-            // splitter is garbage collected.
-            this.registry = new FinalizationRegistry((t) => t.free());
-            this.registry.register(this, this.tokenizer);
-        }
+        const tiktoken = await TokenTextSplitter.imports();
+        const tokenizer = tiktoken.get_encoding(this.encodingName);
         const splits = [];
-        const input_ids = this.tokenizer.encode(text, this.allowedSpecial, this.disallowedSpecial);
+        const input_ids = tokenizer.encode(text, this.allowedSpecial, this.disallowedSpecial);
         let start_idx = 0;
@@ -252,6 +234,8 @@ class TokenTextSplitter extends TextSplitter {
             cur_idx = Math.min(start_idx + this.chunkSize, input_ids.length);
             chunk_ids = input_ids.slice(start_idx, cur_idx);
         }
+
+        tokenizer.free();
         return splits;
     }
     static async imports() {
diff --git a/dist/text_splitter.js b/dist/text_splitter.js
index 13b971d544d8012fa8f487556795bdd6424fe0ef..7215b0147b8b671c740d7f9a2e44e6c26557d9ab 100644
--- a/dist/text_splitter.js
+++ b/dist/text_splitter.js
@@ -19,7 +19,7 @@ export class TextSplitter {
             throw new Error("Cannot have chunkOverlap >= chunkSize");
         }
     }
-    async createDocuments(texts, 
+    async createDocuments(texts,
     // eslint-disable-next-line @typescript-eslint/no-explicit-any
     metadatas = []) {
         const _metadatas = metadatas.length > 0 ? metadatas : new Array(texts.length).fill({});
@@ -209,31 +209,13 @@ export class TokenTextSplitter extends TextSplitter {
             writable: true,
             value: void 0
         });
-        Object.defineProperty(this, "tokenizer", {
-            enumerable: true,
-            configurable: true,
-            writable: true,
-            value: void 0
-        });
-        Object.defineProperty(this, "registry", {
-            enumerable: true,
-            configurable: true,
-            writable: true,
-            value: void 0
-        });
         this.encodingName = fields?.encodingName ?? "gpt2";
         this.allowedSpecial = fields?.allowedSpecial ?? [];
         this.disallowedSpecial = fields?.disallowedSpecial ?? "all";
     }
     async splitText(text) {
-        if (!this.tokenizer) {
-            const tiktoken = await TokenTextSplitter.imports();
-            this.tokenizer = tiktoken.get_encoding(this.encodingName);
-            // We need to register a finalizer to free the tokenizer when the
-            // splitter is garbage collected.
-            this.registry = new FinalizationRegistry((t) => t.free());
-            this.registry.register(this, this.tokenizer);
-        }
+        const tiktoken = await TokenTextSplitter.imports();
+        const tokenizer = tiktoken.get_encoding(this.encodingName);
         const splits = [];
-        const input_ids = this.tokenizer.encode(text, this.allowedSpecial, this.disallowedSpecial);
+        const input_ids = tokenizer.encode(text, this.allowedSpecial, this.disallowedSpecial);
         let start_idx = 0;
@@ -246,6 +228,7 @@ export class TokenTextSplitter extends TextSplitter {
             cur_idx = Math.min(start_idx + this.chunkSize, input_ids.length);
             chunk_ids = input_ids.slice(start_idx, cur_idx);
         }
+        tokenizer.free();
         return splits;
     }
     static async imports() {

After this patch, memory is freed by the garbage collector as expected.
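For anyone who wants the gist of the patch without reading the whole diff: it replaces the cached-encoder-plus-finalizer approach with an explicit create/use/free on every call. Standalone, that pattern looks roughly like this with @dqbd/tiktoken (a minimal sketch; the model name is arbitrary):

```js
import { encoding_for_model } from "@dqbd/tiktoken";

function countTokens(text) {
  // Create a fresh encoder, use it, and free its WASM memory immediately,
  // rather than caching it and hoping a FinalizationRegistry frees it later.
  const encoding = encoding_for_model("gpt-3.5-turbo");
  try {
    return encoding.encode(text).length;
  } finally {
    encoding.free();
  }
}

console.log(countTokens("hello world"));
```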

In my code I'm using https://github.com/ceifa/tiktoken-node to interface with tiktoken as it doesn't require me to manage memory in my JavaScript code.
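For comparison, here's a rough sketch of what that looks like with tiktoken-node; it's a native binding, so there's no free() to call. (Function names are from its README as I remember them, so treat them as approximate.)

```js
import tiktoken from "tiktoken-node";

// The encoder lives in the native addon, so there's no WASM heap to manage
// and no manual free() step.
const enc = tiktoken.encodingForModel("gpt-3.5-turbo");
console.log(enc.encode("hello world").length);
```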

I'm not sure why the finalizer isn't running for me (I'm not super familiar with that API, but the MDN docs say it can be a bit flaky).
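For anyone unfamiliar with it, the registry pattern in question is essentially the illustration below (not langchain's exact code). The catch is that the callback only runs if and when the engine actually collects the owner object, and that is never guaranteed:

```js
// Illustration only: finalizers run, at best, sometime after the owner is
// garbage collected, and the spec doesn't guarantee they run at all.
const registry = new FinalizationRegistry((encoder) => {
  encoder.free(); // best-effort cleanup of the WASM-backed encoder
});

function trackEncoder(owner, encoder) {
  // When `owner` becomes unreachable, the engine *may* invoke the finalizer
  // with `encoder` as the held value.
  registry.register(owner, encoder);
}
```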

I'm not sure how you want to handle this, but I figured I'd let you know! I can throw together a PR applying the patch above to the actual source if you'd like, but I understand if it's just a "me" problem and you don't want to lose the caching of the encoders.

Thanks for all your work on this library!

noseworthy avatar Apr 24 '23 13:04 noseworthy

It would be helpful if you included the error message you're getting so that others with the same issue can find it. I'm getting an out-of-memory error too; is this what you're seeing?

node:internal/event_target:1036
  process.nextTick(() => { throw err; });
                           ^
RangeError [Error]: WebAssembly.instantiate(): Out of memory: Cannot allocate Wasm memory for new instance
Emitted 'error' event on InternalWorker instance at:
    at [kOnErrorMessage] (node:internal/worker:323:10)
    at [kOnMessage] (node:internal/worker:334:37)
    at MessagePort.<anonymous> (node:internal/worker:229:57)
    at [nodejs.internal.kHybridDispatch] (node:internal/event_target:761:20)
    at exports.emitMessage (node:internal/per_context/messageport:23:28)

eumemic avatar Apr 25 '23 14:04 eumemic

Thanks for the error message, @eumemic! I never saw one; we're only running in Docker containers, and no error message ever got logged before the container was killed and destroyed.

noseworthy avatar Apr 25 '23 17:04 noseworthy

Hi, @noseworthy! I'm here to help the LangChain team manage their backlog and I wanted to let you know that we are marking this issue as stale.

Based on my understanding, you reported an issue regarding a memory leak in the chat conversational agent. It seems that the backend server crashes after a few API calls due to running out of memory. Another user, @eumemic, suggested including the error message to help others facing the same issue. You mentioned that you never received an error message before the container was killed and destroyed.

Before we proceed, we would like to confirm if this issue is still relevant to the latest version of the LangChain repository. If it is, please let us know by commenting on this issue. Otherwise, feel free to close the issue yourself or it will be automatically closed in 7 days.

Thank you for your understanding and contribution to the LangChain project!

dosubot[bot] avatar Aug 19 '23 16:08 dosubot[bot]