Builds with a lot of data can cause pages to fail to load
Bug Description
If I run a step that loads a massive amount of data (for example, the Directory loader) and then reload the page, it tries to re-fetch all of that data and almost always fails.
Reproduction
- Use the Directory loader on a directory with 500+ files
- Build the component successfully
- Reload the page
- The page crashes
Expected behavior
Page should load successfully
Who can help?
No response
Operating System
macOS 14
Langflow Version
1.0.18
Python Version
3.12
Screenshot
No response
Flow File
No response
Hey @ShitizZip! 👋 I'm here to help you with any bugs, questions, or contributions. Let's work together to solve this issue.
To address the issue of page failure when reloading after loading a large dataset with the Directory loader, consider the following approaches:
- **Batch loading**: Use the `batch_size` parameter to limit the amount of data processed at once. This can help manage memory usage and prevent crashes during reloads [1].
- **Caching**: Ensure that caching mechanisms like `AsyncInMemoryCache` or `ThreadingInMemoryCache` are properly configured to handle large data loads efficiently. This can reduce the need to re-fetch all data on reload [2].
- **Multithreading**: Verify that the `use_multithreading` option is enabled to improve performance by loading files concurrently [3].
These strategies can help mitigate the issue and improve page load stability.
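To make the batch-loading suggestion concrete, here is a minimal, hypothetical sketch (not Langflow's actual implementation) of iterating over a large directory in fixed-size batches so only one batch of files is held in memory at a time:

```python
# Illustrative sketch only -- not Langflow's or LangChain's actual code.
# Walks a directory and yields file paths in fixed-size batches so that
# memory use stays bounded regardless of how many files exist.
from pathlib import Path


def iter_file_batches(directory, batch_size=50, pattern="*"):
    """Yield lists of at most `batch_size` file paths from `directory`."""
    batch = []
    for path in sorted(Path(directory).rglob(pattern)):
        if path.is_file():
            batch.append(path)
            if len(batch) == batch_size:
                yield batch
                batch = []
    if batch:  # emit the final, possibly short, batch
        yield batch


# Example: process 500+ files one batch at a time instead of all at once.
# for batch in iter_file_batches("docs/", batch_size=50):
#     texts = [p.read_text(errors="ignore") for p in batch]
#     ...  # hand each batch to the pipeline, then let it be freed
```

The function name, `batch_size` default, and directory path here are illustrative assumptions; the point is only that bounding the working set avoids holding every file in memory during a build or reload.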
To continue talking to Dosu, mention @dosu.
In case someone else is also having this issue, a quick fix is to run `TRUNCATE vertex_build;` and reload.
If I understand correctly, the `vertex_build` table got corrupted when processing a huge volume of data. I observed a similar issue.
> In case someone else is also having this issue, a quick fix is to run `TRUNCATE vertex_build;` and reload.
Am I correct to assume that this is to be run from some SQL file? Where can I find that SQL file?
@dosu What did ShitizZip mean by this?
[ShitizZip](https://github.com/ShitizZip) commented [last week](https://github.com/langflow-ai/langflow/issues/3970#issuecomment-2384629841)
In case someone is also having the issue, a quick fix is to just TRUNCATE vertex_build; and reload.
When @ShitizZip mentioned "`TRUNCATE vertex_build;` and reload," they were suggesting a quick fix for the page failure that occurs when reloading after processing a large dataset. The `TRUNCATE` SQL command removes all rows from a table — in this case the `vertex_build` table, which logs information about the building of vertices within a flow. Truncating this table clears all of the logged data, which can resolve issues where large data volumes cause page crashes.
This command is typically run in a SQL environment connected to your database, not from a specific SQL file. You would execute it using a database management tool or a command-line interface that lets you run SQL statements directly against your database.
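One caveat worth noting: SQLite (Langflow's default backend) does not support the `TRUNCATE` statement, so on a SQLite database the equivalent is `DELETE FROM vertex_build;`. A minimal sketch of running that cleanup from Python's standard library, assuming a SQLite database file (the path is a placeholder you would adjust to your install):

```python
# Sketch: clearing the vertex_build table in a SQLite-backed Langflow
# install. The db_path default is a placeholder, not a guaranteed location.
# SQLite has no TRUNCATE statement, so DELETE FROM is used instead.
import sqlite3


def clear_vertex_build(db_path="langflow.db"):
    """Delete all rows from vertex_build; return the remaining row count."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute("DELETE FROM vertex_build;")
        conn.commit()
        # Verify the table is now empty (should return 0).
        return conn.execute("SELECT COUNT(*) FROM vertex_build;").fetchone()[0]
    finally:
        conn.close()
```

Back up the database file before running anything like this, since the deleted build history is not recoverable.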
@dosu Where does langflow store its configuration/flow/... user data?
Edit: Seems like I found it
langflow_source-code/src/backend/base/langflow/langflow.db
What I had to do to clear it up:

```
sqlite3 langflow_source-code/src/backend/base/langflow/langflow.db
DELETE FROM vertex_build;
DELETE FROM "transaction";
.exit
```

To check whether anything is left in it (should be nothing):

```
sqlite3 langflow_source-code/src/backend/base/langflow/langflow.db
SELECT * FROM vertex_build;
SELECT * FROM "transaction";
.exit
```
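For environments without the `sqlite3` command-line shell, the same cleanup can be sketched with Python's standard library. The database path is whatever your install uses (the path above is from a source checkout), and the helper name here is made up for illustration:

```python
# Stdlib-only version of the cleanup above: delete all rows from
# vertex_build and "transaction", then report both row counts.
# Point db_path at your own langflow.db file.
import sqlite3


def clear_build_history(db_path):
    conn = sqlite3.connect(db_path)
    try:
        conn.execute('DELETE FROM vertex_build;')
        # "transaction" is a SQL keyword, so the table name must be quoted.
        conn.execute('DELETE FROM "transaction";')
        conn.commit()
        # Same check as the SELECTs above: both tables should now be empty.
        return {
            t: conn.execute(f'SELECT COUNT(*) FROM "{t}";').fetchone()[0]
            for t in ("vertex_build", "transaction")
        }
    finally:
        conn.close()
```

As with the shell version, back up `langflow.db` first; this permanently drops the build and transaction history.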
@ShitizZip
Thank you for the feedback. Can you check whether the issue persists in the latest Langflow version, 1.1.1?
Happy to help if the issue still occurs.
Hey @EDLLT! 😊 I'm here to help with bugs, technical questions, and engineering knowledge. However, diving into where Langflow stores its configuration or user data isn't something I can assist with directly. If you think I should be able to help with this, feel free to reach out to the Dosu Team. They'll be happy to hear your thoughts!
Hi @carlosrcoelho, the issue still exists in 1.1.1.
Should be fixed by https://github.com/langflow-ai/langflow/pull/5658. @EDLLT @codenprogressive Can you check ?
I tested it; the Split Text component no longer builds successfully (stating "DataError: (sqlite3.DataError) string or blob too big") before timing out and ultimately failing.
This is, however, different from this issue (as it involves the Split Text component). I will shortly be testing the Directory component as well.
Since it's a different issue, I'm closing this one. Can you please open another issue?