duck-read-cache-fs

unable to read duckdb file

Open obarisk opened this issue 8 months ago • 4 comments

Describe the bug

With cache_httpfs loaded, DuckDB is unable to read a DuckDB file over HTTP.

To Reproduce

FORCE INSTALL cache_httpfs FROM community;
LOAD cache_httpfs;

ATTACH 's3://duckdb-blobs/databases/stations.duckdb' AS stations_db;

Expected behavior

We should get an attached database called stations_db.

Screenshots

n/a

Desktop (please complete the following information):

  • OS: osx arm64 / linux amd64
  • duckdb: 1.2.2

obarisk avatar Apr 17 '25 06:04 obarisk

Thank you @obarisk for the report! Do you mind also pasting the error message to the issue description if it isn't too hard? I will take a look later.

dentiny avatar Apr 17 '25 07:04 dentiny

On Linux:

INTERNAL Error:
Attempted to dereference unique_ptr that is NULL!

Stack Trace:

/home/obarisk/.duckdb/extensions/v1.2.2/linux_amd64_gcc4/cache_httpfs.duckdb_extension(+0x58e0e3) [0x7fb0b478e0e3]
/home/obarisk/.duckdb/extensions/v1.2.2/linux_amd64_gcc4/cache_httpfs.duckdb_extension(+0x58e116) [0x7fb0b478e116]
/home/obarisk/.duckdb/extensions/v1.2.2/linux_amd64_gcc4/cache_httpfs.duckdb_extension(+0x58fdb1) [0x7fb0b478fdb1]
/home/obarisk/.duckdb/extensions/v1.2.2/linux_amd64_gcc4/cache_httpfs.duckdb_extension(+0x35033e) [0x7fb0b455033e]
/home/obarisk/.duckdb/extensions/v1.2.2/linux_amd64_gcc4/cache_httpfs.duckdb_extension(+0x35656f) [0x7fb0b455656f]
/home/obarisk/.duckdb/extensions/v1.2.2/linux_amd64_gcc4/cache_httpfs.duckdb_extension(+0x359b41) [0x7fb0b4559b41]
/home/obarisk/.duckdb/extensions/v1.2.2/linux_amd64_gcc4/cache_httpfs.duckdb_extension(+0x359d58) [0x7fb0b4559d58]
duckdb() [0x92fc74]
duckdb() [0xb82146]
duckdb() [0xd31f63]
duckdb() [0xd326a4]
duckdb() [0x14acb85]
duckdb() [0xbb8914]
duckdb() [0xbc3dbb]
duckdb() [0xbc40c8]
duckdb() [0xbba6be]
duckdb() [0xbbde84]
duckdb() [0xb78463]
duckdb() [0xb786b8]
duckdb() [0xb78845]
duckdb() [0x73dfbf]
duckdb() [0x72305d]
duckdb() [0x723606]
duckdb() [0x72411c]
duckdb() [0x724797]
duckdb() [0x716a13]
/lib/x86_64-linux-gnu/libc.so.6(+0x29ca8) [0x7fb0f4743ca8]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x85) [0x7fb0f4743d65]
duckdb() [0x71a2d7]

This error signals an assertion failure within DuckDB. This usually occurs due to unexpected conditions or errors in the program's logic.
For more information, see https://duckdb.org/docs/dev/internal_errors

obarisk avatar Apr 17 '25 07:04 obarisk

On macOS, it's stranger.

We need

load httpfs;
load cache_httpfs;

ATTACH 's3://duckdb-blobs/databases/stations.duckdb' AS stations_db;

to get the following error

INTERNAL Error:
Attempted to dereference unique_ptr that is NULL!

Stack Trace:

0        _ZN6duckdb9ExceptionC2ENS_13ExceptionTypeERKNSt3__112basic_stringIcNS2_11char_traitsIcEENS2_9allocatorIcEEEE + 64
1        _ZN6duckdb17InternalExceptionC1ERKNSt3__112basic_stringIcNS1_11char_traitsIcEENS1_9allocatorIcEEEE + 20
2        _ZNK6duckdb10unique_ptrINS_10FileHandleENSt3__114default_deleteIS1_EELb1EEptEv + 132
3        _ZN6duckdb21CacheFileSystemHandleC2ENS_10unique_ptrINS_10FileHandleENSt3__114default_deleteIS2_EELb1EEERNS_15CacheFileSystemE + 44
4        _ZN6duckdb15CacheFileSystem28GetOrCreateFileHandleForReadERKNSt3__112basic_stringIcNS1_11char_traitsIcEENS1_9allocatorIcEEEENS_13FileOpenFlagsENS_12optional_ptrINS_10FileOpenerELb1EEE + 668
5        duckdb::VirtualFileSystem::OpenFile(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, duckdb::FileOpenFlags, duckdb::optional_ptr<duckdb::FileOpener, true>) + 452
6        duckdb::WriteAheadLog::Replay(duckdb::FileSystem&, duckdb::AttachedDatabase&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&) + 72
7        duckdb::SingleFileStorageManager::LoadDatabase(duckdb::StorageOptions) + 636
8        duckdb::StorageManager::Initialize(duckdb::StorageOptions) + 80
9        duckdb::AttachedDatabase::Initialize(duckdb::StorageOptions) + 104
10       duckdb::PhysicalAttach::GetData(duckdb::ExecutionContext&, duckdb::DataChunk&, duckdb::OperatorSourceInput&) const + 600
11       duckdb::PipelineExecutor::FetchFromSource(duckdb::DataChunk&) + 124
12       duckdb::PipelineExecutor::Execute(unsigned long long) + 236
13       duckdb::PipelineTask::ExecuteTask(duckdb::TaskExecutionMode) + 236
14       duckdb::ExecutorTask::Execute(duckdb::TaskExecutionMode) + 192
15       duckdb::Executor::ExecuteTask(bool) + 252
16       duckdb::ClientContext::ExecuteTaskInternal(duckdb::ClientContextLock&, duckdb::BaseQueryResult&, bool) + 64
17       duckdb::PendingQueryResult::ExecuteInternal(duckdb::ClientContextLock&) + 60
18       duckdb::PendingQueryResult::Execute() + 56
19       duckdb_shell_sqlite3_print_duckbox + 368
20       duckdb_shell::ShellState::ExecutePreparedStatement(sqlite3_stmt*) + 932
21       duckdb_shell::ShellState::ExecuteSQL(char const*, char**) + 452
22       duckdb_shell::ShellState::RunOneSqlLine(char*) + 104
23       duckdb_shell::ShellState::ProcessInput() + 916
24       main + 3140
25       start + 6000

This error signals an assertion failure within DuckDB. This usually occurs due to unexpected conditions or errors in the program's logic.
For more information, see https://duckdb.org/docs/dev/internal_errors

obarisk avatar Apr 17 '25 08:04 obarisk

Hi @obarisk , unfortunately I cannot reproduce the nullptr dereference issue; let's solve the other attachment issue first :)

dentiny avatar Apr 18 '25 10:04 dentiny

If I attach my database (a remote DuckDB file on S3) before loading cache_httpfs, I'm able to make some progress, but I soon get a segfault:

Fatal Python error: Segmentation fault

Thread 0x00000001f4c04800 (most recent call first):
  File "/Users/erik/DropboxMaestral/home/git/duckdb-test/ducklake.py", line 635 in test_s3_ducklake_metadata
  File "/Users/erik/DropboxMaestral/home/git/duckdb-test/.venv/lib/python3.11/site-packages/_pytest/python.py", line 157 in pytest_pyfunc_call
  File "/Users/erik/DropboxMaestral/home/git/duckdb-test/.venv/lib/python3.11/site-packages/pluggy/_callers.py", line 121 in _multicall
  File "/Users/erik/DropboxMaestral/home/git/duckdb-test/.venv/lib/python3.11/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/Users/erik/DropboxMaestral/home/git/duckdb-test/.venv/lib/python3.11/site-packages/pluggy/_hooks.py", line 512 in __call__
  File "/Users/erik/DropboxMaestral/home/git/duckdb-test/.venv/lib/python3.11/site-packages/_pytest/python.py", line 1671 in runtest
  File "/Users/erik/DropboxMaestral/home/git/duckdb-test/.venv/lib/python3.11/site-packages/_pytest/runner.py", line 178 in pytest_runtest_call
  File "/Users/erik/DropboxMaestral/home/git/duckdb-test/.venv/lib/python3.11/site-packages/pluggy/_callers.py", line 121 in _multicall
  File "/Users/erik/DropboxMaestral/home/git/duckdb-test/.venv/lib/python3.11/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/Users/erik/DropboxMaestral/home/git/duckdb-test/.venv/lib/python3.11/site-packages/pluggy/_hooks.py", line 512 in __call__
  File "/Users/erik/DropboxMaestral/home/git/duckdb-test/.venv/lib/python3.11/site-packages/_pytest/runner.py", line 246 in <lambda>
  File "/Users/erik/DropboxMaestral/home/git/duckdb-test/.venv/lib/python3.11/site-packages/_pytest/runner.py", line 344 in from_call
  File "/Users/erik/DropboxMaestral/home/git/duckdb-test/.venv/lib/python3.11/site-packages/_pytest/runner.py", line 245 in call_and_report
  File "/Users/erik/DropboxMaestral/home/git/duckdb-test/.venv/lib/python3.11/site-packages/_pytest/runner.py", line 136 in runtestprotocol
  File "/Users/erik/DropboxMaestral/home/git/duckdb-test/.venv/lib/python3.11/site-packages/_pytest/runner.py", line 117 in pytest_runtest_protocol
  File "/Users/erik/DropboxMaestral/home/git/duckdb-test/.venv/lib/python3.11/site-packages/pluggy/_callers.py", line 121 in _multicall
  File "/Users/erik/DropboxMaestral/home/git/duckdb-test/.venv/lib/python3.11/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/Users/erik/DropboxMaestral/home/git/duckdb-test/.venv/lib/python3.11/site-packages/pluggy/_hooks.py", line 512 in __call__
  File "/Users/erik/DropboxMaestral/home/git/duckdb-test/.venv/lib/python3.11/site-packages/_pytest/main.py", line 367 in pytest_runtestloop
  File "/Users/erik/DropboxMaestral/home/git/duckdb-test/.venv/lib/python3.11/site-packages/pluggy/_callers.py", line 121 in _multicall
  File "/Users/erik/DropboxMaestral/home/git/duckdb-test/.venv/lib/python3.11/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/Users/erik/DropboxMaestral/home/git/duckdb-test/.venv/lib/python3.11/site-packages/pluggy/_hooks.py", line 512 in __call__
  File "/Users/erik/DropboxMaestral/home/git/duckdb-test/.venv/lib/python3.11/site-packages/_pytest/main.py", line 343 in _main
  File "/Users/erik/DropboxMaestral/home/git/duckdb-test/.venv/lib/python3.11/site-packages/_pytest/main.py", line 289 in wrap_session
  File "/Users/erik/DropboxMaestral/home/git/duckdb-test/.venv/lib/python3.11/site-packages/_pytest/main.py", line 336 in pytest_cmdline_main
  File "/Users/erik/DropboxMaestral/home/git/duckdb-test/.venv/lib/python3.11/site-packages/pluggy/_callers.py", line 121 in _multicall
  File "/Users/erik/DropboxMaestral/home/git/duckdb-test/.venv/lib/python3.11/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/Users/erik/DropboxMaestral/home/git/duckdb-test/.venv/lib/python3.11/site-packages/pluggy/_hooks.py", line 512 in __call__
  File "/Users/erik/DropboxMaestral/home/git/duckdb-test/.venv/lib/python3.11/site-packages/_pytest/config/__init__.py", line 175 in main
  File "/Users/erik/DropboxMaestral/home/git/duckdb-test/.venv/lib/python3.11/site-packages/_pytest/config/__init__.py", line 201 in console_main
  File "/Users/erik/DropboxMaestral/home/git/duckdb-test/.venv/bin/pytest", line 10 in <module>

Extension modules: psutil._psutil_osx, psutil._psutil_posix, charset_normalizer.md, google._upb._message (total: 4)

macOS 26.0.1, DuckDB 1.4.1

erikcw avatar Oct 15 '25 17:10 erikcw

Hi @erikcw thanks for the report!

  • I'm wondering if you could file a separate issue? It's different from the initial one.
  • Could you please also provide a (minimal) script to reproduce it? That would help me debug, thank you!

dentiny avatar Oct 15 '25 17:10 dentiny

@dentiny

Hello, I encountered the same problem: after LOAD cache_httpfs, ATTACH-ing a remote .duckdb file over S3 (approximately 3.8 GB, with multiple tables) crashes with the "unique_ptr that is NULL" error.

Summary

When cache_httpfs is loaded (on-disk mode), ATTACH-ing a remote single-file DuckDB database over S3 crashes with: INTERNAL Error: Attempted to dereference unique_ptr that is NULL!

This happens at ATTACH-time (before any SELECT), only if we ATTACH s3://…/*.duckdb after LOAD cache_httpfs.
If we (a) ATTACH a local path, or (b) DO NOT load cache_httpfs, the same code works.

Minimal Repro (Python)

import duckdb

con = duckdb.connect(database=":memory:")
con.execute("INSTALL httpfs; LOAD httpfs;")
# crash only when cache_httpfs is loaded before ATTACH
con.execute("INSTALL cache_httpfs FROM community; LOAD cache_httpfs;")

# on-disk cache config
con.execute("SET cache_httpfs_type='on_disk';")
con.execute("SET cache_httpfs_cache_directory='/tmp/duck_cache';")

# S3 auth (works, HEAD returns 200)
con.execute("SET s3_region='ap-northeast-1';")
con.execute("SET s3_url_style='path';")  # also tried 'vhost'
con.execute("SET s3_use_ssl=true;")
con.execute("SET enable_http_logging=true;")

# Repro: ATTACH a .duckdb file on S3 (HEAD 200, GET range returns 206)
con.execute("ATTACH 's3://<bucket>/<prefix>/2025/10/xxxx_2025-10-15.duckdb' AS d20251015 (READ_ONLY)")

Expected

ATTACH succeeds; later queries fetch needed blocks via HTTP range requests and (optionally) cache on disk.

Actual

Crash with internal error from cache_httpfs extension:

INTERNAL Error: Attempted to dereference unique_ptr that is NULL!

Stack Trace: .../cache_httpfs.duckdb_extension FileSystem::OpenFile -> WriteAheadLog::Replay -> SingleFileStorageManager::LoadDatabase -> PhysicalAttach::GetData ...

HTTP log (abridged)

We see ATTACH triggers:

HEAD https://s3.ap-northeast-1.amazonaws.com//.duckdb → 200 OK

GET Range: bytes=0-524287 → 206 Partial Content

Then the crash.

Notes & Workarounds

If we ATTACH a local path, there is no crash.

If we do not LOAD cache_httpfs, ATTACH s3://...duckdb also works.

If we ATTACH s3://...duckdb before loading cache_httpfs, it avoids the crash but then .duckdb file handles won’t benefit from cache_httpfs.
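The ordering in the last workaround can be written down as a small helper. This is only a sketch: `attach_before_cache_statements` is a hypothetical name, the URL and alias are placeholders, and the statements are the ones already used in this thread, just reordered so that ATTACH runs while only httpfs is loaded. Note that erikcw reported above that a segfault still showed up later with this ordering, so it may not be a complete workaround.

```python
def attach_before_cache_statements(db_url: str, alias: str) -> list[str]:
    """Return SQL statements in the order that avoided the ATTACH-time crash here."""
    return [
        "INSTALL httpfs;",
        "LOAD httpfs;",
        # ATTACH first, while only httpfs is loaded.
        f"ATTACH '{db_url}' AS {alias} (READ_ONLY);",
        # Load the cache extension only after the database is attached.
        "INSTALL cache_httpfs FROM community;",
        "LOAD cache_httpfs;",
    ]


# Placeholder URL and alias, for illustration only.
statements = attach_before_cache_statements(
    "s3://example-bucket/example.duckdb", "remote_db"
)
for stmt in statements:
    print(stmt)
```

Each statement can then be passed to `con.execute(stmt)` on a connection from `duckdb.connect()`, as in the repro above.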

Version:

DuckDB core: 1.4.1 (linux_amd64) via Python 3.11

cache_httpfs: matching 1.4.1 (community extension)

S3: Amazon S3, region ap-northeast-1, server-side encryption AES256 enabled.

Environment

Docker linux/amd64, Python 3.11

DuckDB 1.4.1

cache_httpfs from community extensions 1.4.1

OS: Debian-based container

Reproducible consistently

Ask

Confirm whether cache_httpfs currently supports ATTACH-ing remote .duckdb files.

If yes, this crash looks like a null pointer deref in the extension; pointers on a fix or a known-good version would be appreciated.

If not supported yet, can we get a guard/clear error message instead of a crash?

Thanks!

RyuuSetsuhi avatar Oct 20 '25 05:10 RyuuSetsuhi

@RyuuSetsuhi / @erikcw I put a fix here: https://github.com/dentiny/duck-read-cache-fs/pull/291 Thanks for reporting, and for your patience!

dentiny avatar Oct 20 '25 08:10 dentiny

@erikcw FYI, the segfault comes from the following:

  • When you load a database by attaching a remote database file, DuckDB reads not only the database file but also attempts to read other files, such as the WAL, which might not exist.
  • An implementation detail: the cache filesystem logs and caches the file handle returned by the HTTP filesystem, which is NULL in this case, hence the segfault.
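To make the failure mode concrete, here is a toy Python model of it. This is not the extension's C++ code, and whether the linked PR fixes it this way is not stated in this thread. `InnerFileSystem` and `CachingFileSystem` are hypothetical stand-ins for the HTTP filesystem and the cache filesystem, and `None` plays the role of the NULL file handle returned for a missing WAL file.

```python
from typing import Optional


class InnerFileSystem:
    """Stand-in for the HTTP filesystem: returns None when a file is absent."""

    def __init__(self, files: dict[str, bytes]) -> None:
        self._files = files

    def open_file(self, path: str) -> Optional[bytes]:
        return self._files.get(path)


class CachingFileSystem:
    """Toy wrapper that caches handles returned by the inner filesystem."""

    def __init__(self, inner: InnerFileSystem) -> None:
        self._inner = inner
        self._handles: dict[str, bytes] = {}

    def open_file(self, path: str) -> Optional[bytes]:
        if path in self._handles:
            return self._handles[path]
        handle = self._inner.open_file(path)
        if handle is None:
            # Guard: pass "no such file" through instead of wrapping a null handle.
            return None
        self._handles[path] = handle
        return handle


fs = CachingFileSystem(InnerFileSystem({"stations.duckdb": b"DUCK"}))
print(fs.open_file("stations.duckdb"))      # the database file exists
print(fs.open_file("stations.duckdb.wal"))  # the WAL is absent, nothing is cached
```

Without the `None` check, the wrapper would go on to use a handle that does not exist, which is the analogue of dereferencing the NULL `unique_ptr` in the stack traces above.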

dentiny avatar Oct 20 '25 08:10 dentiny

Hi @RyuuSetsuhi, also a tip for the extension usage.

The extension supports an exclusion list, which lets users and applications skip caching for certain files. I think a DuckDB database file might be a good example: (1) it's likely loaded only once; (2) it's large and takes up a lot of memory or disk space.

For example usage, see https://github.com/dentiny/duck-read-cache-fs/blob/615f4f597a2285b87b5d7c1420a750c89d9a4c42/test/sql/cache_exclusion.test#L15-L16
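As a rough illustration of the idea only (not the extension's API; the real setting names are in the linked test), an exclusion list can be thought of as a set of patterns checked before a file is cached. The helper `should_cache`, the use of regular expressions, and the example paths below are all assumptions made for this sketch.

```python
import re


def should_cache(path: str, exclusion_patterns: list[str]) -> bool:
    """Return False when the path fully matches any exclusion regex."""
    return not any(re.fullmatch(p, path) for p in exclusion_patterns)


# Hypothetical pattern: skip caching for DuckDB database files.
patterns = [r".*\.duckdb"]

print(should_cache("s3://example-bucket/stations.duckdb", patterns))
print(should_cache("s3://example-bucket/data.parquet", patterns))
```

With a pattern like this, the large, read-once database file bypasses the cache while other remote files are still cached.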

dentiny avatar Oct 20 '25 08:10 dentiny

I will close the issue for now; feel free to re-open it or create a new one if the problem persists. Thank you all!

dentiny avatar Oct 24 '25 09:10 dentiny