Gaëtan de Menten
> how do you paginate the results? do you use `LazyFrame.__getitem__` multiple times and then `collect` each? if so i worry that that would involve doing repeated calculations I am...
> Now, what happens when you run `result[:2].collect()`, and then `result[2:4].collect()`? You may expect that Polars is running the UDF for the first two elements and then for the next...
> would an offset argument in LazyFrame.head suffice for you? Now that I think of it, I don't think it's a good idea because then the Narwhals API would no...
> That's right, it only happens if there are operations which block slice pushdown. But if you're displaying a LazyFrame provided by the user, then you have no control over...
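To make the repeated-computation concern concrete, here is a minimal toy model in plain Python. It is not the Polars engine or API: `ToyLazyQuery`, `collect_slice`, `collect_all` and both pagination helpers are hypothetical names. It assumes the case described above, where slice pushdown is blocked, so each per-page collect re-evaluates the UDF on every row before slicing. For comparison, it also shows collecting once and slicing the materialized result.

```python
from typing import Callable, List


class ToyLazyQuery:
    """Hypothetical stand-in for a lazy query (not the Polars API).

    Each collect_slice() call re-runs the whole pipeline from the source,
    modelling a plan where slice pushdown is blocked: the UDF is applied
    to every row and only then is the requested slice taken.
    """

    def __init__(self, source: List[int], udf: Callable[[int], int]) -> None:
        self.source = source
        self.udf = udf
        self.udf_calls = 0  # how many times the UDF has been evaluated

    def _apply_udf(self, value: int) -> int:
        self.udf_calls += 1
        return self.udf(value)

    def collect_slice(self, offset: int, length: int) -> List[int]:
        # Full evaluation first, slice afterwards (no pushdown).
        full = [self._apply_udf(v) for v in self.source]
        return full[offset:offset + length]

    def collect_all(self) -> List[int]:
        # One full evaluation of the pipeline.
        return [self._apply_udf(v) for v in self.source]


def paginate_by_recollecting(query: ToyLazyQuery, page_size: int) -> List[List[int]]:
    """Collect each page separately: the pipeline re-runs once per page."""
    n_rows = len(query.source)
    return [query.collect_slice(off, page_size) for off in range(0, n_rows, page_size)]


def paginate_after_single_collect(query: ToyLazyQuery, page_size: int) -> List[List[int]]:
    """Collect once, then slice the materialized result for each page."""
    full = query.collect_all()
    return [full[off:off + page_size] for off in range(0, len(full), page_size)]


if __name__ == "__main__":
    data = list(range(6))

    q1 = ToyLazyQuery(data, lambda x: x * 10)
    print(paginate_by_recollecting(q1, page_size=2), q1.udf_calls)  # prints [[0, 10], [20, 30], [40, 50]] 18

    q2 = ToyLazyQuery(data, lambda x: x * 10)
    print(paginate_after_single_collect(q2, page_size=2), q2.udf_calls)  # prints [[0, 10], [20, 30], [40, 50]] 6
```

With 6 rows and pages of 2, re-collecting per page evaluates the UDF 18 times instead of 6. The obvious trade-off of the second approach is that it holds the full result in memory.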
> i'm keen to understand the use-case more Thanks a lot for taking the time; it's really appreciated. > say you want to support duckdb. in that case, showing `from...
> it might help to speak about this over a call to understand what to do? If you feel that would help, I am available all day tomorrow.
> i think it breaks even if the database isn't updated? Indeed, but I assume what you are seeing happens because DuckDB is multithreaded by default. I suppose it evaluates different...
Haha! I did not realize multiprocessing was possible in combination with chunking. I thought it was an either/or thing. I was a bit misled by that sentence in the...
> Maybe you are right and there is still something to gain there, but it sounds complicated to me. As, right now, I do not have much time to dedicate...
> also, just out of curiosity, you said initially the file took 25 hours to complete. How long is it taking now? what is the chunksize and how many cores...