duckdb_engine
[Feature Request] Blacklist certain table functions to reduce security risks
We would like to run DuckDB in a context where users can run their own SQL queries, but we don't want them to be able to read arbitrary data from the file system, since that risks leaking credentials and secrets.
Our specific context is Apache Superset. Preset (the commercial Superset offering) has an article in which they mention:
While Apache Superset has supported DuckDB since last year, the embedded database was still not available on Preset for security considerations, since it could potentially have access to the filesystem of the workspace. This changed with the release of MotherDuck, though, which allows users to run queries on the cloud.
Unfortunately MotherDuck is not an option for us so we are considering alternatives.
Would you be interested in supporting the ability to filter statements that contain blacklisted "table_functions"?
Hey there!
I don't think this is a feature that can realistically live outside of DuckDB's core unfortunately. This query explains why pretty well:
SELECT * FROM 'passwords.csv'
With the "replacement scans" that DuckDB supports, this is translated at runtime into a call to read_csv_auto, and that is done after the parser as well (otherwise you could use the AST load/dump support). I imagine MotherDuck avoids most of the issues here by plugging into DuckDB's C++ core, which isn't really something suitable for every use case. That being said, there is an enable_external_access configuration parameter, but obviously that's a pretty coarse control.
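To make the point above concrete, here is a rough sketch of what a text-level filter might look like and why it would miss the replacement-scan case. The blocklist contents and the function name are made up for illustration; this is not something duckdb_engine provides.

```python
import re

# Hypothetical blocklist, chosen only for this illustration.
BLOCKED_TABLE_FUNCTIONS = {"read_csv", "read_csv_auto", "read_parquet", "read_json"}


def mentions_blocked_function(sql: str) -> bool:
    """Return True if the SQL text calls a blocked table function by name."""
    # Collect every identifier that is directly followed by an opening parenthesis.
    called = {name.lower() for name in re.findall(r"\b(\w+)\s*\(", sql)}
    return bool(called & BLOCKED_TABLE_FUNCTIONS)


# An explicit call is easy to spot in the query text.
explicit_call_caught = mentions_blocked_function(
    "SELECT * FROM read_csv_auto('passwords.csv')"
)

# The replacement-scan form never names the function, so the filter misses it.
implicit_scan_caught = mentions_blocked_function("SELECT * FROM 'passwords.csv'")

print(explicit_call_caught, implicit_scan_caught)
```

Both queries would end up reading the same file, but only the first one is visible to a filter that inspects the statement text.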
Hope this helps!
Thanks for your quick and clear response!
If we find a nice solution I will update this issue with our findings.
We don't have a really nice solution, but I thought I'd share where we have ended up... We think the following may be "good enough" for now.
WARNING: We haven't done an extensive security review and can't really be sure this is sufficient, use at your own risk!
Add a custom Superset SQL_QUERY_MUTATOR which locks down database connections:
from duckdb import InvalidInputException, PermissionException


def SQL_QUERY_MUTATOR(sql: str, database, **kwargs) -> str:
    with database.get_raw_connection() as conn:
        cursor = conn.cursor()
        try:
            database.db_engine_spec.execute(
                cursor,
                """
                -- Load any extensions you need here, for example:
                INSTALL postgres_scanner;
                LOAD postgres_scanner;
                -- Disable filesystem, external access, etc., for example:
                SET disabled_filesystems='LocalFileSystem';
                SET enable_external_access=false;
                -- Lock the configuration so that the settings above can't be reset
                SET lock_configuration=true;
                """
            )
            cursor.commit()
        # Catch the exceptions you expect to occur when you attempt to
        # configure a connection that is already locked down:
        except (PermissionException, InvalidInputException):
            # The connection is already locked and therefore configured
            pass
        except Exception:
            # Something unexpected happened, don't run the SQL
            raise
    return sql
TLDR: The Superset SQL_QUERY_MUTATOR configuration gives you an opportunity to lock down database connections.
Edit: feel free to close!