Bug: Running incremental model fails on roughly half a million rows of data
My pipeline worked fine for a month, until the data volume reached roughly 50 million rows. It now throws an error like this:
what(): {"exception_type":"INTERNAL","exception_message":"Attempted to access index 9223372036854775807 within vector of size 3"}
At first I thought this error might be coming from the DuckDB side, but that doesn't seem to be the case.
There is no error when I run the model in full-refresh mode, but as soon as I run it as an incremental model I hit this error.
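For reference, the two invocations look roughly like this (the model name is a placeholder, not the actual one from my project):

```sh
# runs fine: rebuilds the target table from scratch
dbt run --select my_model --full-refresh

# fails with the INTERNAL error above: takes the incremental path
dbt run --select my_model
```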
@g-diwakar, do you still experience this error on the latest version of DuckDB?
Hi, I'm interested in taking this issue on as an open-source contribution. I've tested a similar pipeline and was able to reproduce the error when running incremental models with a large dataset.
I'd like to help by:
- Trying to create a minimal reproducible example
- Exploring possible causes in the incremental model or vector indexing code
- Testing and suggesting potential fixes
Please let me know if I can take this up or if there’s any guidance on where to start in the codebase. Thanks!
A minimal reproducible example is always the best place to start. Thanks @Dinesh-0813!
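As a rough sketch of what a repro could look like (assuming dbt-duckdb; the file name, columns, and row count below are illustrative, not taken from the original pipeline):

```sql
-- models/repro_incremental.sql (hypothetical model)
{{ config(materialized='incremental', unique_key='id') }}

-- build a large synthetic table with DuckDB's generate_series
select
    i as id,
    now() as loaded_at
from generate_series(1, 50000000) as t(i)

{% if is_incremental() %}
  -- on incremental runs, only pick up ids newer than what the target already has
  where i > (select coalesce(max(id), 0) from {{ this }})
{% endif %}
```

Running `dbt run` against this model twice (the second run takes the incremental path) should be enough to check whether the error reproduces.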
Can I work on this or not?
@Dinesh-0813 yes, you can! Please let us help you, though: if you can put together a minimal repro, we can point you in the right direction.