
Infinite file size ballooning, memory leaks, and crashes caused by langflow.db

[Open] EDLLT opened this issue 1 year ago · 30 comments

Related separate issue: https://github.com/langflow-ai/langflow/issues/3970

Bug Description

Whenever I try to embed my 70MB text file, it gradually starts eating up all my RAM. Then even booting Langflow begins eating up RAM, I assume because of that 70MB file.

Initially, I was attempting to perform RAG and noticed my system lagging heavily; I then narrowed it down to the file component.

Reproduction

Before installation, I created a python venv using

python3 -m venv langflow
source langflow/bin/activate

Then I installed langflow while in that environment

pip install langflow -U

Steps to reproduce

Please see both of these links:

https://github.com/langflow-ai/langflow/issues/4008#issuecomment-2399700910
https://github.com/langflow-ai/langflow/issues/4008#issuecomment-2407550796

Who can help?

No response

Operating System

OS: Ubuntu 24.04.1 LTS x86_64

Langflow Version

Langflow v1.0.18

Python Version

3.12

Screenshot

(screenshot)

Flow File

No response

EDLLT · Oct 03 '24 16:10

Hey, @EDLLT! I'm here to help you with bugs, questions, and becoming a contributor. Let's tackle this issue together while we wait for a human maintainer.

The issue you're experiencing with memory consumption when repeatedly building the "file" component in Langflow could be related to the caching mechanism. Langflow uses various caching services, such as AsyncInMemoryCache, ThreadingInMemoryCache, and RedisCache, which might not be properly releasing memory when handling large files like your 70MB text file. This can lead to increased memory usage each time the component is rebuilt [1][2].

To mitigate this issue, you might want to check the configuration of the caching services to ensure they are set up to efficiently manage memory. Additionally, consider clearing the cache manually or adjusting the cache settings to prevent excessive memory consumption. If the problem persists, it might be necessary to look into the specific implementation of the FileComponent to identify any potential memory leaks or inefficiencies in memory management.

To continue talking to Dosu, mention @dosu.

dosubot[bot] · Oct 03 '24 16:10

Through further experimentation, I discovered that even uploading is causing a memory leak. If you simply upload the same file multiple times, you'll notice the memory usage increasing

EDLLT · Oct 07 '24 07:10

I've tested running this multiple times while the other parts are commented out

data = parse_text_file_to_data(resolved_path, silent_errors)

It wasn't causing any memory leaks (after loading, memory returned to exactly the same level).

However, this was problematic

self.status = data if data else "No data"

It seems that this line is causing a memory leak: every time I rebuild the component with that line, memory doesn't get released but instead increases on every build.

Seems like this is also problematic as it's increasing memory usage on rebuild

return data or Data()

EDLLT · Oct 07 '24 08:10

Figured out something very problematic. The langflow.db file is storing every component's output, it seems.

(screenshot)

EDLLT · Oct 07 '24 16:10

Seems like this file plays a huge role:

langflow_source-code/src/backend/base/langflow/services/database/models/vertex_builds/crud.py

It commits the component's output results to the db, I am assuming for caching upon rebuilds. The problem is that it doesn't really cache properly: it caches the same file's content over and over again, committing it to the db each time and making it balloon.
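To see the accumulation directly, a quick look with Python's stdlib sqlite3 works. This is just a sketch: the table and column names (vertex_build, id, flow_id, data) come from the db itself, while the db path is an assumption based on the ~/.cache/langflow/ paths elsewhere in this thread.

import sqlite3
from pathlib import Path

# Sketch: list the largest cached vertex builds in langflow.db.
# The path below is an assumption; adjust to wherever your langflow.db lives.
db_path = Path.home() / ".cache" / "langflow" / "langflow.db"
conn = sqlite3.connect(db_path)
rows = conn.execute(
    "SELECT id, flow_id, LENGTH(data) FROM vertex_build "
    "ORDER BY LENGTH(data) DESC LIMIT 10"
).fetchall()
for vertex_id, flow_id, data_bytes in rows:
    print(f"{vertex_id} (flow {flow_id}): {data_bytes} bytes of cached output")
conn.close()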

Found more. What's the purpose of logging every vertex build? It seems to accumulate and take up tremendous amounts of storage space in the db file, causing Langflow to crash.

langflow_source-code/src/backend/base/langflow/graph/utils.py

def log_vertex_build(
    flow_id: str,
    vertex_id: str,
    valid: bool,
    params: Any,
    data: ResultDataResponse,
    artifacts: dict | None = None,
):
    try:
        if not get_settings_service().settings.vertex_builds_storage_enabled:
            return
        vertex_build = VertexBuildBase(
            flow_id=flow_id,
            id=vertex_id,
            valid=valid,
            params=str(params) if params else None,
            # ugly hack to get the model dump with weird datatypes
            data=json.loads(data.model_dump_json()),
            # ugly hack to get the model dump with weird datatypes
            artifacts=json.loads(json.dumps(artifacts, default=str)),
        )
        with session_getter(get_db_service()) as session:
            inserted = crud_log_vertex_build(session, vertex_build)
            logger.debug(f"Logged vertex build: {inserted.build_id}")
    except Exception as e:
        logger.exception(f"Error logging vertex build: {e}")

edit: Okay, it seems like its purpose is to cache results. The problem is that it doesn't clear up the previous vertex builds once the component has been rebuilt. There also needs to be a size limit deciding when to cache and when not to, as Langflow crashes with large ones.
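One more note: the guard at the top of log_vertex_build suggests this logging can be switched off entirely through the vertex_builds_storage_enabled setting. Assuming Langflow follows its usual LANGFLOW_-prefixed environment variables, setting LANGFLOW_VERTEX_BUILDS_STORAGE_ENABLED=false before starting the server should sidestep the ballooning, at the cost of whatever features rely on the stored builds (this is an inference from the code above, not something confirmed in this thread).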

EDLLT · Oct 07 '24 17:10

Found more. What's the purpose of logging every vertex build? It seems to accumulate and take up tremendous amounts of storage space in the db file, causing Langflow to crash.

@nicoloboschi

EDLLT · Oct 07 '24 18:10

Hey @EDLLT, thanks for providing this step-by-step to reproduce the issue! I observed the same issue when processing a huge volume of data. It looks like both the vertex_build and transaction tables log the input and output of each component, so if you are logging 70MB worth of text in one column, reading that table becomes impossible. If you delete those tables, everything resumes working, but you lose the metadata stored in the Langflow DB. Hey @Cristhianzl, can you please help with this issue?

codenprogressive · Oct 08 '24 00:10

hey @codenprogressive, @EDLLT how are you?

We have a fix for this error coming soon in the next release. The frontend can now handle this amount of data without breaking the way it did for you, @EDLLT.

The point is that we can't truncate the data before saving it to the database because that would break other features like freeze/freeze path. We need to save the runs in the vertex_build and transaction tables for these features to work properly.

So my advice is: when you're working with large files or large amounts of data, please try to use these features as well. This way, the file or large data won't be processed twice. :)

Cristhianzl · Oct 08 '24 01:10

I'm closing this issue because the error has been fixed on the MAIN branch and will be included in the next Langflow release!

Thank you!!

Cristhianzl · Oct 08 '24 01:10

@Cristhianzl Hey! Thanks for the prompt fix.

I have tested it out and unfortunately the issues I mentioned in previous comments still occur, and new issues arise from the frontend.

In these, I talk about the file component, but I think the same issue occurs with every component that deals with data.

Issues that still occur

Fixed issues
  • When processing a large amount of data using the file component
    • After it has built successfully, we are able to press on Data within langflow to view the contents of our file.
    • Then, when I refresh the page and try to view the data within my already built file component, it crashes my tab (tested on both Brave and Chrome)
    • (screenshots)

EDLLT · Oct 08 '24 12:10

@EDLLT hi,

Are you using the freeze/freeze path feature to prevent reprocessing files that have already been processed? Note that if you don't use these features, the database will grow each time you run the flow.

Are you on the main branch locally? Please note that we haven't released the fix for this yet.

Cristhianzl · Oct 08 '24 13:10

Are you using the freeze/freeze path feature to prevent reprocessing files that have already been processed? Note that if you don't use these features, the database will grow each time you run the flow.

Yes, I have tried using the freeze feature, which still resulted in the same problems.

Are you on the main branch locally? Please note that we haven't released the fix for this yet.

Yes, I am building langflow from the source code's main branch, commit bffb0f129bc61bacc57ec2591d3e6525e3088b93

EDLLT · Oct 08 '24 13:10

I'll try to get to the bottom of this. Meanwhile, could this issue be reopened?

Here's a useful patch I've written for debugging purposes. It helps show how many vertex builds there are, when they get returned, and when they get committed to the database.

Here's a patch to crud.py to make it more verbose for debugging.

verbose-vertex-crud-py.patch

Example of what it outputs when building components/refreshing the page

When building a component, the data gets committed into the db

/usr/lib/python3.12/asyncio/base_events.py:726: ResourceWarning: unclosed event loop <_UnixSelectorEventLoop running=False closed=False debug=False>
  _warn(f"unclosed event loop {self!r}", ResourceWarning, source=self)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
log_vertex_build called with vertex_build id: File-osnlJ
Vertex build data size: 1816 bytes
Created VertexBuildTable with id: File-osnlJ
Table contents:
  timestamp: 2024-10-08T15:05:12.996697+00:00
  id: File-osnlJ
  data: {"results": {}, "outputs": {"data": {"message": {"file_path": "/home/edllt/.cache/langflow/b2e77365-53e4-4f56-80a9-7d51a553913d/2024-10-08_18-01-39_random_text.txt"...
  artifacts: {"data": {"rep...
  params: None
  valid: True
  flow_id: b2e77365-53e4-4f56-80a9-7d51a553913d
  build_id: dc9903b9-30d9-4fb5-97fa-b681287b097f
Successfully committed VertexBuildTable with id: File-osnlJ
log_vertex_build finished

Upon refreshing, it seems like all vertex builds are being returned

get_vertex_builds_by_flow_id called with flow_id: b2e77365-53e4-4f56-80a9-7d51a553913d, limit: 1000
Returning 3 vertex builds

Using this patch, as well as SQL commands to view the db, I figured out that the page crash is probably caused by a huge amount of data being returned on refresh. That, along with the fact that previously built components' cached outputs in the db never get cleared up, makes it take up a lot of storage and RAM.
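The duplication is also easy to confirm straight from the db. Here's a sketch using Python's stdlib sqlite3, with the same names as in the output above and the db path assumed:

import sqlite3

# Sketch: count cached builds per vertex; more than one row per vertex id
# means stale duplicates that were never evicted on rebuild.
conn = sqlite3.connect("langflow.db")  # adjust to your langflow.db path
query = (
    "SELECT id, COUNT(*), SUM(LENGTH(data)) FROM vertex_build "
    "GROUP BY id HAVING COUNT(*) > 1"
)
for vertex_id, n_builds, total_bytes in conn.execute(query):
    print(f"{vertex_id}: {n_builds} cached builds, {total_bytes} bytes total")
conn.close()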

EDLLT · Oct 08 '24 15:10

@EDLLT

We are verifying whether there is any issue with the freeze/freeze path feature, and we are going to fix it before the next release. I will also look into the frontend crashing.

I'll reopen the issue. Thanks again!

Cristhianzl · Oct 08 '24 16:10

hi @codenprogressive @EDLLT

We have confirmed that the freeze feature is functioning as expected. I will implement a fix to optimize data storage in the database, which will reduce memory usage and prevent potential frontend crashes.

Thanks for your feedback and patience. Feel free to contact us anytime :)

#4078

Cristhianzl · Oct 09 '24 12:10

Issues that still occur

  • Deleting the entire flow clears up the relevant parts of the vertex_build cache; however, deleting the file component does not delete its data from the vertex_build cache. This causes langflow.db to accumulate data over time from components that no longer exist in the flow.

  • When rebuilding the file component explicitly (doesn't matter if it's frozen or not), the previously cached data does not get removed from vertex_build in langflow.db, and the data gets duplicated within vertex_build.

  • Uploading the same file multiple times using langflow's file component increases memory usage. In my case, uploading the random_ascii_70MB.txt 10 times increased memory usage by ~1GB

New issues

  • When processing a large amount of data using the file component

    • After it has built successfully, we are able to press on Data within langflow to view the contents of our file.
    • Then, when I refresh the page and try to view the data within my already built file component, it crashes my tab (tested on both Brave and Chrome)
    • (screenshots)
  • Taking data from the file component and then using Split Text processes successfully, but

    • Upon reloading the page (on the flow, not the main menu), Langflow starts taking up a significant amount of RAM (reached up to ~15GB) before it settles down; afterwards, the browser's memory spikes up to 2~3GB before crashing the tab (tested on both Brave and Chrome)

@Cristhianzl I haven't tested the fix yet, but looking at the PR, it seems it only addresses the frontend crashing issue by truncating long strings.

Other issues, like data duplication in the database from rebuilds, dead data in the database belonging to deleted components, uploads taking up memory on each re-upload, etc. (I have written all the issues in the previous comment), don't seem to have been addressed.

EDLLT · Oct 10 '24 04:10

@EDLLT, hi

We are concerned about this and are close to the point where we will no longer use the vertex_build table. I hope that within the next few weeks, we can disable this table and eliminate data duplication.

Truncating the data stored in the database (PR #4078) will significantly reduce memory usage, prevent frontend crashes, and lower storage requirements.

Cristhianzl · Oct 10 '24 13:10

@Cristhianzl Hey, apologies for continuing to bother you about this.

I have tested the main branch at the latest commit, which includes your recently merged PR. It no longer crashes on refresh after the first build of a component; however, the frontend still spikes memory and ultimately crashes when refreshing the page after building the component more than once, even if the components were frozen.

(This occurs because of the previously mentioned data duplication issue: vertex_build does not take freezing into account and does not clear up previously built components' data. Manually deleting all entries from the vertex_build table stops the crashing.)

Also, may I request that this issue stay open until all the database issues, including the ones I mentioned earlier, are fixed? They all seem to play a role in this.

edit:

How to reproduce the crash

For this example, we can generate a 70MB file containing random characters with newlines (in my case, I tried using my own real data, but this file should reproduce the same issue):

tr -cd '[:print:]\n' < /dev/urandom | head -c 70000000 > random_ascii_with_newlines.txt
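(For reference: tr -cd '[:print:]\n' strips /dev/urandom down to printable characters plus newlines, and head -c 70000000 caps the output at 70,000,000 bytes, so the result is a ~70MB plain-text file.)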

Upload it to the file component. Add the Split Text component and split the text using the default values (Chunk size 1000, Chunk overlap 200).

Build the Split Text component. Refresh the page. No crash.

Build the Split Text component again (doesn't matter if you freeze them or not). Refresh the page. Langflow crashes.

EDLLT · Oct 11 '24 14:10

I've edited this comment to include steps to reproduce the crash https://github.com/langflow-ai/langflow/issues/4008#issuecomment-2407550796

EDLLT · Oct 11 '24 17:10

Hey @EDLLT, you are correct. I found the problem. Somehow, after the first run, the params column was being stored untruncated, blowing up the memory heap and causing the cascading error on the frontend.

There’s no problem with the freeze feature. It operates using a cache table in memory, not the vertex_build table.

We are still working on the improvement to remove the vertex_build table. This could take a while, but in the next few weeks we are not going to have this table anymore.

After lots of runs of this flow (with a 50MB CSV file uploaded), the memory heap blowup is not happening anymore :) This field was not being truncated like the others.

(screenshots)

My suggestion for now is to clean up the vertex_build and transaction tables. Once you do that, the data will be truncated and displayed accordingly on the frontend, which should prevent any memory leaks or frontend crashes.
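For the default SQLite backend, a minimal cleanup sketch could look like this (stop Langflow and back up langflow.db first; the path is an assumption, and the VACUUM matters because deleting rows alone does not shrink a SQLite file):

import sqlite3

# Sketch: clear the two tables named in this thread, then compact the file.
conn = sqlite3.connect("langflow.db")  # adjust to your langflow.db path
conn.execute('DELETE FROM "transaction"')  # quoted: transaction is a SQL keyword
conn.execute("DELETE FROM vertex_build")
conn.commit()
conn.execute("VACUUM")  # reclaim freed pages so the file actually shrinks
conn.close()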

I would like to thank you for your patience and help! Really appreciated!

PR: #4118

Cristhianzl · Oct 11 '24 17:10

I will leave this issue open until you confirm everything is working for you.

Cristhianzl · Oct 11 '24 18:10

@Cristhianzl Unfortunately, Langflow still crashes on the latest commit d0fdc568902990142762a4ca2de3c50ca6976c28. (Btw, the first thing I did was clear my transaction and vertex_build tables, create a new flow, and make sure I was on the latest commit d0fdc568902990142762a4ca2de3c50ca6976c28.)

sqlite> delete from "transaction";
sqlite> delete from vertex_build;

The steps to reproduce it remain the same https://github.com/langflow-ai/langflow/issues/4008#issuecomment-2407550796

Also, here are the random ASCII file and the flow (hopefully this helps with reproducibility):

The flow: Langflow Crasher.json

The 70MB ASCII file: https://drive.google.com/file/d/1WBp0LoZiPqCBc4IhdabtV___7ZqPaV0a/view?usp=sharing

EDLLT · Oct 11 '24 19:10

@EDLLT how are you running Langflow?

Cristhianzl · Oct 11 '24 19:10

@EDLLT how are you running Langflow?

From Langflow's GitHub source code on the main branch:

source .venv/bin/activate on both tabs

make backend in one terminal tab, make frontend in another terminal tab

EDLLT · Oct 11 '24 19:10

hi @EDLLT,

You are absolutely correct about the error you're reporting. I'm following the steps you provided to reproduce it.

I have a solution that would temporarily fix this error. However, the real solution will be to remove the vertex_build table. So, we are going to wait until we reach the point where we can remove this table to fully resolve the problem.

For now, I ask for your patience. Thank you!

Cristhianzl · Oct 14 '24 13:10

Seems like I have a similar issue; my RAM usage is growing along with the SQLite file, which is now 7GB... My Langflow version is 1.0.18 and it's run with Python 3.10...

And when memory ran out:

--- End of logging error ---
ERROR 2024-11-13 18:09:20 - ERROR - utils utils.py:159 - Error logging transaction: (sqlite3.OperationalError) disk I/O error
(Background on this error at: https://sqlalche.me/e/20/e3q8)
--- Logging error in Loguru Handler #4 ---
Record was: {'elapsed': datetime.timedelta(seconds=23076, microseconds=901699), 'exception': None, 'extra': {}, 'file': (name='utils.py', path='/home/langflow/.local/lib/python3.10/site-packages/langflow/graph/utils.py'), 'function': 'log_transaction', 'level': (name='ERROR', no=40, icon='❌'), 'line': 159, 'message': 'Error logging transaction: (sqlite3.OperationalError) disk I/O error\n(Background on this error at: https://sqlalche.me/e/20/e3q8)', 'module': 'utils', 'name': 'langflow.graph.utils', 'process': (id=3284, name='MainProcess'), 'thread': (id=139889603801088, name='MainThread'), 'time': datetime(2024, 11, 13, 18, 9, 20, 204206, tzinfo=datetime.timezone(datetime.timedelta(seconds=3600), 'CET'))}
Traceback (most recent call last):
  File "/home/langflow/.local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1144, in _commit_impl
    self.engine.dialect.do_commit(self.connection)
  File "/home/langflow/.local/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 702, in do_commit
    dbapi_connection.commit()
sqlite3.OperationalError: disk I/O error

I am using the API to connect to Langflow; the flow is pretty large, yet things like that should not really occur.

I am also using 8 workers.

No file uploads, just simple input texts. Not large, but a lot of API queries.

severfire · Nov 13 '24 19:11

I have a solution that would temporarily fix this error. However, the real solution will be to remove the vertex_build table. So, we are going to wait until we reach the point where we can remove this table to fully resolve the problem.

If removing vertex_build is not possible or would take significant effort, another solution I can think of would be to add checks to prevent duplicates of the same data, and to remove old cached values when adding a new one (a rough sketch follows below). I think this solution is simpler; however, my understanding of vertex_build is not complete yet. So, did I misunderstand, or will this approach work? Also, if I submit a PR doing this, will it be merged?
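A rough sketch of the idea, assuming SQLAlchemy and the VertexBuildTable model named earlier in this thread (the size cap and function name are illustrative, not Langflow's actual code):

import json

from sqlalchemy import delete

# VertexBuildTable is the model from langflow's vertex_builds models, as named
# earlier in this thread; referenced here for illustration only.

MAX_DATA_BYTES = 1_000_000  # illustrative cap: skip caching anything larger

def log_vertex_build_bounded(session, vertex_build) -> None:
    # Size guard: don't cache huge outputs at all, since they crash the frontend.
    if len(json.dumps(vertex_build.data, default=str)) > MAX_DATA_BYTES:
        return
    # Eviction: drop any previous builds of this vertex so a rebuild replaces
    # the old row instead of appending a duplicate.
    session.execute(
        delete(VertexBuildTable).where(
            VertexBuildTable.flow_id == vertex_build.flow_id,
            VertexBuildTable.id == vertex_build.id,
        )
    )
    session.add(vertex_build)
    session.commit()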

EDLLT · Nov 26 '24 08:11

@EDLLT hi!

If it's a simple task, feel free to take it on—any help is greatly appreciated! We’re excited to collaborate with other engineers here to find the best possible solution.

Please make sure to tag this issue in the PR so we can provide context and keep everyone informed about the work we’re doing.

Thanks!

Cristhianzl · Dec 02 '24 12:12

@Cristhianzl I've noticed that this issue has been closed. Has it been fixed? If so, which PR fixes it?

EDLLT · Dec 17 '24 12:12

I'll keep it open. It was just closed due to lack of new messages.

Thanks!

Cristhianzl · Dec 17 '24 13:12