Clarify disk spilling for memory connections

Open soerenwolfers opened this issue 1 year ago • 4 comments

https://duckdb.org/docs/guides/performance/how_to_tune_workloads

says

> If DuckDB is running in in-memory mode, it cannot use disk to offload data if it does not fit into main memory. To enable offloading in the absence of a persistent database file, use the [SET temp_directory statement](https://duckdb.org/docs/configuration/pragmas#temp-directory-for-spilling-data-to-disk):

But according to https://duckdb.org/docs/configuration/overview.html

the temp_directory setting is set to a non-null ".tmp" value by default.

Could you clarify whether that means disk spilling is activated by default after all, and if not whether relative paths like ".tmp" are ignored in general?

soerenwolfers avatar May 12 '24 12:05 soerenwolfers

Related: https://github.com/duckdb/duckdb-web/issues/3058

soerenwolfers avatar Jun 15 '24 08:06 soerenwolfers

@szarnyasg What does the enhancement label mean here?

soerenwolfers avatar Jun 15 '24 08:06 soerenwolfers

I was looking for a "clarification" label but didn't find it, so I settled on enhancement. This is more of a bug in the documentation though, so I can bump its priority and take a look next week.

szarnyasg avatar Jun 15 '24 09:06 szarnyasg

A related issue that could be addressed in the documentation at the same time, but with the opposite problem: https://github.com/duckdb/duckdb-web/issues/3058 -- the docs say it defaults to 0, but zero is probably interpreted as "unlimited".

soerenwolfers avatar Jun 16 '24 18:06 soerenwolfers

@szarnyasg if you confirm how temp_directory and max_temp_directory_size behave, I'd be happy to make the doc PRs.

soerenwolfers avatar Aug 18 '24 11:08 soerenwolfers

Hi @soerenwolfers, in these configuration options, the value 0 means unlimited. I created a limits page where this is explicitly stated. Further PRs are welcome – thanks in advance!

szarnyasg avatar Sep 03 '24 12:09 szarnyasg
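For anyone who wants to check these settings on their own installation, here is a minimal SQL sketch. It assumes a DuckDB version that has the `max_temp_directory_size` option; the `'10GB'` cap is an arbitrary illustrative value, not a recommendation from this thread.

```sql
-- List the current values of the two settings discussed in this issue.
SELECT name, value, description
FROM duckdb_settings()
WHERE name IN ('temp_directory', 'max_temp_directory_size');

-- Optionally cap how much disk space spilling may use
-- ('10GB' is an arbitrary example value).
SET max_temp_directory_size = '10GB';
```

What the first query reports for an unset limit may differ between versions, so it is worth running it against the version you actually use.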

@szarnyasg thanks for the clarification. And does the default value '.tmp' mean that the temporary disk spilling directory is used even for in-memory connections, contrary to what the docs say?

soerenwolfers avatar Sep 03 '24 13:09 soerenwolfers

Right, we changed this recently (IIRC in 1.0). I added some clarifications via #3520. If there are more details to be clarified, please let me know or submit a PR. Thanks!

szarnyasg avatar Sep 03 '24 14:09 szarnyasg
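A short SQL sketch for verifying this behavior on an in-memory connection, assuming a DuckDB version where `temp_directory` has a default value as described above. The path `/path/to/spill_dir` is a hypothetical placeholder, not a value from the docs.

```sql
-- Inspect the spill directory this connection would use.
SELECT current_setting('temp_directory') AS temp_directory;

-- Point spilling at an explicit location instead of the default
-- (placeholder path; replace it with a writable directory).
SET temp_directory = '/path/to/spill_dir';
```

If the first query returns a non-empty path on an in-memory connection, that matches the confirmation above that spilling is available there by default.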

Opened #3522 #3523 #3524 #3525 -- sorry for the many PRs; did this in lazy mode using the web interface

soerenwolfers avatar Sep 03 '24 15:09 soerenwolfers

Slightly off-topic, but the limits page says

| Memory allocation | 128 GB | - |

Not to brag, but I'm writing this from a machine with 1 TB of memory. Am I misunderstanding that limit?

soerenwolfers avatar Sep 03 '24 15:09 soerenwolfers

> Opened https://github.com/duckdb/duckdb-web/pull/3522 https://github.com/duckdb/duckdb-web/pull/3523 https://github.com/duckdb/duckdb-web/pull/3524 https://github.com/duckdb/duckdb-web/pull/3525 -- sorry for the many PRs; did this in lazy mode using the web interface

Thanks, will review them now.

> Am I misunderstanding that limit?

Fair point, this should be clearer. This is the limit for a single vector size: https://github.com/duckdb/duckdb/blob/1e883cd4d87d812166d035e180145a85c608ad6f/src/include/duckdb/common/constants.hpp#L60-L61

This is not something that 99.9% of users should be concerned with, but it's good to have in the docs. I'll clarify it a bit.

szarnyasg avatar Sep 03 '24 18:09 szarnyasg