dbt-duckdb

Bug: Incremental model fails at large row counts

Open g-diwakar opened this issue 10 months ago • 5 comments

My pipeline was working fine for a month until the data volume reached around 50 million rows. It then throws an error like this:

what(): {"exception_type":"INTERNAL","exception_message":"Attempted to access index 9223372036854775807 within vector of size 3"}

At first I thought this error might be on the DuckDB side, but it isn't.

There is no error when I run the model in full-refresh mode. But as soon as I run it as an incremental model, I hit this error.
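For reference, here is a minimal sketch of the shape of the incremental model involved (table and column names are hypothetical, not from my actual pipeline):

```sql
-- models/events_incremental.sql (hypothetical minimal-repro sketch)
{{ config(
    materialized='incremental',
    unique_key='event_id'
) }}

select
    event_id,
    event_timestamp,
    payload
from {{ source('raw', 'events') }}

{% if is_incremental() %}
-- On incremental runs, only pick up rows newer than what is
-- already in the target table.
where event_timestamp > (select max(event_timestamp) from {{ this }})
{% endif %}
```

With `--full-refresh` the `is_incremental()` branch is skipped and the model runs clean; the error only appears on the incremental path.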

g-diwakar avatar Feb 12 '25 14:02 g-diwakar

@g-diwakar , do you still experience this error on the latest version of DuckDB?

guenp avatar Apr 19 '25 00:04 guenp

Hi, I'm interested in contributing to this issue as my open-source contribution. I’ve tested a similar pipeline and was able to reproduce the error when running incremental models with a large dataset.

I'd like to help by:

  • Trying to create a minimal reproducible example
  • Exploring possible causes in the incremental model or vector indexing code
  • Testing and suggesting potential fixes

Please let me know if I can take this up or if there’s any guidance on where to start in the codebase. Thanks!

dineshcsdev avatar May 06 '25 08:05 dineshcsdev

Minimal reproducible example is always the best place to start-- thanks @Dinesh-0813 !

jwills avatar May 06 '25 14:05 jwills

Can I work on this or not?

dineshcsdev avatar May 06 '25 14:05 dineshcsdev

@Dinesh-0813 yes, you can! Please let us help you, though: if you can make a minimal repro, we can point you in the right direction.

guenp avatar May 08 '25 16:05 guenp