cl-duckdb icon indicating copy to clipboard operation
cl-duckdb copied to clipboard

Common Lisp CFFI wrapper around the DuckDB C API

  • cl-duckdb

#+begin_html

Build Status

#+end_html

Common Lisp [[https://cffi.common-lisp.dev/][CFFI]] wrapper around the [[https://duckdb.org/][DuckDB]] C API

** Dependencies

Currently the following Common Lisp implementations and operating systems are tested via [[https://github.com/ak-coram/cl-duckdb/blob/main/.github/workflows/CI.yml][CI]]. Android via Termux (ECL & SBCL) and some other BSDs are also known to work.

  • [[https://sbcl.org/][SBCL]] (Linux, FreeBSD, Windows, macOS, macOS on AArch64)
  • [[https://ccl.clozure.com/][CCL]] (Linux, macOS)
  • [[https://ecl.common-lisp.dev/][ECL]] (Linux, FreeBSD, macOS, macOS on AArch64), see the [[./ECL.org][ECL.org]] document for specifics

The following native libraries need to be installed in a location where CFFI can find them:

  • [[https://sourceware.org/libffi/][libffi]]
  • [[https://duckdb.org/][DuckDB]]

For example on Ubuntu or Debian (amd64):

#+begin_src sh sudo apt-get install libffi-dev unzip

Download libduckdb-linux-amd64.zip from the C/C++ section of https://duckdb.org/docs/installation/

sudo unzip ~/Downloads/libduckdb-linux-amd64.zip libduckdb.so -d /usr/lib/ #+end_src

** Installation

cl-duckdb can now be installed via [[https://www.quicklisp.org/][Quicklisp]] from the [[https://ultralisp.org/][Ultralisp]] distribution:

#+begin_src lisp ;; Install the ultralisp distribution if you don't have it already (ql-dist:install-dist "http://dist.ultralisp.org/" :prompt nil) ;; Load cl-duckdb (ql:quickload :duckdb) #+end_src

** Packages

  • DUCKDB (nicknamed DDB): provides the high-level API.
  • DUCKDB-API: contains the low-level bindings to the DuckDB C API.

** Usage

*** Connecting to a database

This library relies on the special variable DUCKDB:CONNECTION for a default database connection. Setting up a global default connection is recommmended for interactive REPL sessions:

#+begin_src lisp ;; Use an in-memory database as the default connection (ddb:initialize-default-connection) #+end_src

#+begin_src lisp ;; Use a persistent database as the default connection (ddb:initialize-default-connection :path "my_database.ddb") #+end_src

#+begin_src lisp ;; Clean up the default connection at the end of the session (duckdb:disconnect-default-connection) #+end_src

For manual connection management most functions requiring a database connection also accept a connection object as a keyword argument (see DUCKDB:OPEN-DATABASE and DUCKDB:CONNECT for creating one).

To dynamically bind and automatically clean up a default connection, refer to DUCKDB:WITH-DEFAULT-CONNECTION and DUCKDB:WITH-TRANSIENT-CONNECTION instead.

*** Basic example

#+begin_src lisp ;; Use an in-memory transient database (ddb:with-transient-connection ;; Create a new range table containing integers (ddb:run "CREATE TABLE range (i INTEGER PRIMARY KEY)" "CREATE SEQUENCE seq_range_i START 1")

;; Use a prepared statement to populate the table with a 1000 values
(ddb:with-statement (statement "INSERT INTO range VALUES (nextval('seq_range_i'))")
  (dotimes (_ 1000) (ddb:perform statement)))

;; Solve Project Euler Problem 9
(let* ((euler9-query (ddb:concat "SELECT a.i * b.i * c.i AS solution "
                                 "FROM range AS c "
                                 "JOIN range AS b ON b.i < c.i "
                                 "JOIN range AS a ON a.i < b.i "
                                 "WHERE a.i + b.i + c.i = ? "
                                 "AND a.i * a.i + b.i * b.i = c.i * c.i"))
       (parameters '(1000))
       (results (ddb:query euler9-query parameters)))
  (format t "PE9 Solution: ~a~%" (ddb:get-result results 'solution 0))))

#+end_src

*** Interactive example: query remote Parquet data

The DUCKDB:Q (short for QUERY) and DUCKDB:FQ (short for FORMAT-QUERY) functions are provided as shorthands for interactive REPL use:

#+begin_src lisp (ddb:initialize-default-connection) (ddb:q "INSTALL httpfs") ; => (("Success" . #())) (let ((url "https://github.com/apache/parquet-mr/raw/master/parquet-hadoop/src/test/resources/test-file-with-no-column-indexes-1.parquet")) (ddb:fq "SELECT * FROM read_parquet(?) WHERE id < 10" url)) ;; +----+------+--------------------------------+----------------------------------------+ ;; | id | name | location | phoneNumbers | ;; +----+------+--------------------------------+----------------------------------------+ ;; | 0 | p0 | NIL | ((phone ((number . 0) (kind . cell)))) | ;; | 1 | p1 | ((lon . 1.0d0) (lat . 2.0d0)) | ((phone ((number . 1) (kind . cell)))) | ;; | 2 | p2 | ((lon . 2.0d0) (lat)) | ((phone ((number . 2) (kind . cell)))) | ;; | 3 | p3 | NIL | ((phone ((number . 3) (kind . cell)))) | ;; | 4 | p4 | ((lon . 4.0d0) (lat . 8.0d0)) | ((phone ((number . 4) (kind . cell)))) | ;; | 5 | p5 | ((lon . 5.0d0) (lat)) | ((phone ((number . 5) (kind . cell)))) | ;; | 6 | p6 | NIL | ((phone ((number . 6) (kind . cell)))) | ;; | 7 | p7 | ((lon . 7.0d0) (lat . 14.0d0)) | ((phone ((number . 7) (kind . cell)))) | ;; | 8 | p8 | ((lon . 8.0d0) (lat)) | ((phone ((number . 8) (kind . cell)))) | ;; | 9 | p9 | NIL | ((phone ((number . 9) (kind . cell)))) | ;; +----+------+--------------------------------+----------------------------------------+ ;; => NIL #+end_src

*** Sparks

There's some support for plotting query results directly in the REPL via [[https://github.com/tkych/cl-spark][cl-spark]]:

#+begin_src lisp (ddb:initialize-default-connection) ; => #<DUCKDB::CONNECTION {1014081EF3}>

(ddb:bind-static-table 'numbers `(("x" . (,(loop :for i :from 0d0 :by 0.2 :below pi :collect i) :duckdb-double)))) ; => NIL

(ddb:spark-query "SELECT x, sin(x) AS y, cos(x) AS z FROM numbers" nil '(x y z)) ;; X ▁▁▁▂▂▃▃▄▄▅▅▆▆▇▇█ ;; Y ▁▂▃▄▆▆▇▇█▇▇▆▅▄▃▁ ;; Z █▇▇▇▆▆▅▅▄▃▃▂▁▁▁▁ ;; => NIL

(ddb:vspark-query "SELECT pow(2, x) AS y FROM numbers" nil nil 'y) ;; 1.0 4.5 8.0 ;; ˫-----------------------+------------------------˧ ;; ▏ ;; █▏ ;; ██▎ ;; ███▋ ;; █████▍ ;; ███████▏ ;; █████████▎ ;; ███████████▋ ;; ██████████████▌ ;; █████████████████▊ ;; █████████████████████▍ ;; █████████████████████████▋ ;; ██████████████████████████████▌ ;; ████████████████████████████████████▎ ;; ██████████████████████████████████████████▋ ;; ██████████████████████████████████████████████████ ;; => NIL

(ddb:vspark-query "SELECT round(x, 2)::text AS x, sqrt(x) AS y FROM numbers" nil 'x 'y) ;; 0.0 0.8660254 1.7320508 ;; ˫---------------------+----------------------˧ ;; 0.0 ▏ ;; 0.2 ███████████▉ ;; 0.4 ████████████████▊ ;; 0.6 ████████████████████▋ ;; 0.8 ███████████████████████▊ ;; 1.0 ██████████████████████████▌ ;; 1.2 █████████████████████████████▏ ;; 1.4 ███████████████████████████████▍ ;; 1.6 █████████████████████████████████▋ ;; 1.8 ███████████████████████████████████▋ ;; 2.0 █████████████████████████████████████▌ ;; 2.2 ███████████████████████████████████████▍ ;; 2.4 █████████████████████████████████████████▎ ;; 2.6 ██████████████████████████████████████████▊ ;; 2.8 ████████████████████████████████████████████▌ ;; 3.0 ██████████████████████████████████████████████ ;; => NIL #+end_src

*** Writing queries via SxQL

If you want to use a syntax based on s-expressions for your queries, then the SxQL library is an option:

#+begin_src lisp (ddb:initialize-default-connection)

;; Load SxQL (ql:quickload :sxql) (use-package :sxql)

;; Create a table (ddb:run (yield (create-table :numbers ((i :type 'integer :primary-key t)))))

;; Define utility function (defun query-sxql (q) (multiple-value-call #'ddb:query (yield q)))

;; Populate table with values (loop :for x :below 100 :do (query-sxql (insert-into :numbers (set= :i x))))

(query-sxql (select ((:as (:sum :i) :sum)) (from :numbers) (where (:even :i)))) ;; => (("sum" . #(4950))) #+end_src

Please refer to the [[https://github.com/fukamachi/sxql][SxQL documentation]] for more examples.

*** Appenders

[[https://duckdb.org/docs/data/appender][Appenders]] are one of the ways of loading bulk data into DuckDB. They append rows to a single table of a database:

#+begin_src lisp (ddb:initialize-default-connection) ; => #<DUCKDB::CONNECTION {100B1088F3}>

(ddb:run "CREATE TABLE roman_numerals (i INTEGER, value TEXT)") ; => NIL (ddb:with-appender (appender "roman_numerals") (loop :for i :from 1 :below 4999 :do (ddb:append-row appender (list i (format nil "~:@R" i))))) ; => NIL (ddb:get-result (ddb:query "SELECT * FROM roman_numerals WHERE i = 1848" nil) 'value 0) ; => "MDCCCXXXXVIII" #+end_src

*** Querying Lisp vectors and lists as table columns

Currently only the following types are supported (the values are currently copied into DuckDB data chunks internally). Using a combination of vectors and list for different columns is possible, but each column should have the same length. Tables using Lisp data structures are not bound to a single connection and work across different ones.

**** Specialized vectors

| Common Lisp type | DuckDB Type | |---------------------+-------------| | bit | BOOLEAN | | (unsigned-byte 8) | UTINYINT | | (unsigned-byte 16) | USMALLINT | | (unsigned-byte 32) | UINTEGER | | (unsigned-byte 64) | UBIGINT | | (unsigned-byte 128) | UHUGEINT | | (signed-byte 8) | TINYINT | | (signed-byte 16) | SMALLINT | | (signed-byte 32) | INTEGER | | (signed-byte 64) | BIGINT | | (signed-byte 128) | HUGEINT | | single-float | REAL | | double-float | DOUBLE |

**** Lists & unspecialized vectors

List columns or unspecialized vectors need to specify the DuckDB column type and can contain the following values:

  • Booleans (nil, t, :false, :true, :null)
  • Integers in range of the corresponding column type
  • Floating point numbers (single-float & double-float)
  • Strings
  • Date, time or datetime values
  • UUIDs
  • NIL values

**** Examples

#+begin_src lisp (ddb:initialize-default-connection) ; => #<DUCKDB::CONNECTION {10074E8BE3}>

;; Use vectors as columns in a query: (let ((indexes (make-array '(10) :element-type '(unsigned-byte 8) :initial-contents '(1 2 3 4 5 6 7 8 9 10))) (primes (make-array '(10) :element-type '(unsigned-byte 8) :initial-contents '(2 3 5 7 11 13 17 19 23 29)))) (ddb:with-static-table ('primes `((i . ,indexes) (p . ,primes))) (ddb:format-query "SELECT * FROM primes" nil))) ;; +----+----+ ;; | i | p | ;; +----+----+ ;; | 1 | 2 | ;; | 2 | 3 | ;; | 3 | 5 | ;; | 4 | 7 | ;; | 5 | 11 | ;; | 6 | 13 | ;; | 7 | 17 | ;; | 8 | 19 | ;; | 9 | 23 | ;; | 10 | 29 | ;; +----+----+ ;; => NIL

;; DuckDB column types always have to be specified for lists (NIL ;; values are converted to NULL): (ddb:with-static-table ('integers `((i . (,(loop :for i :below 1000 :if (evenp i) :collect i :else :collect nil) :duckdb-integer)))) (ddb:query (ddb:concat "SELECT sum(i) AS sum " ", COUNT(i) AS not_null_count " "FROM integers") nil)) ; => (("sum" . #(249500)) ("not_null_count" . #(500)))

(ddb:with-static-table ('lyrics `(("in the year" . (,(list (format nil "~R" 2525)) :duckdb-varchar)))) (ddb:query "SELECT * FROM lyrics" nil)) ;; => (("in the year" . #("two thousand five hundred twenty-five")))

;; If another table with the same name exists, you can use the ;; static_table table function directly: (ddb:run (ddb:concat "CREATE TABLE polysemy ("That you have but slumbered here, " "While these visions did appear" VARCHAR)")) (ddb:with-static-table ('polysemy `(("If we shadows have offended, Think but this, and all is mended:" . (() :duckdb-varchar)))) (ddb:query (ddb:concat "SELECT A., B. FROM static_table('polysemy') AS A " "JOIN polysemy AS B ON true") nil)) ;; => (("If we shadows have offended, Think but this, and all is mended:" . #()) ;; ("That you have but slumbered here, While these visions did appear" . #()))

(ddb:with-static-table ('bools `((v . ((nil t :false :true :null) :duckdb-boolean)))) (ddb:format-query "SELECT v, v IS NULL AS is_null FROM bools" nil)) ;; +-----+---------+ ;; | v | is_null | ;; +-----+---------+ ;; | NIL | NIL | ;; | T | NIL | ;; | NIL | NIL | ;; | T | NIL | ;; | NIL | T | ;; +-----+---------+ ;; => NIL

;; Static tables can be managed in the global scope using the ;; BIND-STATIC-TABLE, UNBIND-STATIC-TABLE and CLEAR-STATIC-TABLES ;; functions. Temporarily overriding a table definition via ;; WITH-STATIC-TABLE works as expected: (ddb:bind-static-table 'alphabet `((c . (("α" "β" "γ" "δ") :duckdb-varchar)))) ; => NIL

(labels ((get-characters () (loop :with results := (ddb:query "SELECT c FROM alphabet" nil) :for c :across (ddb:get-result results 'c) :collect c))) (ddb:with-static-table ('alphabet ((c . (("Ⴀ" "Ⴁ" "Ⴂ" "Ⴃ") :duckdb-varchar)))) (ddb:with-static-table ('alphabet ((c . (("𐌀" "𐌁" "𐌂" "𐌃" "𐌄") :duckdb-varchar)))) (format t "Etruscan: ~{~a~^, ~}~%" (get-characters))) (format t "Asomtavruli: ~{~a~^, ~}~%" (get-characters))) (format t "Greek: ~{~a~^, ~}~%" (get-characters))) ;; Etruscan: 𐌀, 𐌁, 𐌂, 𐌃, 𐌄 ;; Asomtavruli: Ⴀ, Ⴁ, Ⴂ, Ⴃ ;; Greek: α, β, γ, δ ;; => NIL

(ddb:unbind-static-table 'alphabet) ; => NIL (ddb:clear-static-tables) ; => NIL #+end_src

** Type & Value conversions

| DuckDB Type | Common Lisp Type | Note | |-----------------+------------------------------+-----------------------------------------------| | NULL | null | nil (or :null for param. binding) | | BOOLEAN | boolean | t, nil (or :true & :false for param. binding) | | VARCHAR | string | | | BLOB | (vector (unsigned-byte 8)) | | | REAL | single-float | | | DOUBLE | double-float | | | DECIMAL | ratio | Max width of 38 | | TINYINT | integer | | | UTINYINT | integer | | | SMALLINT | integer | | | USMALLINT | integer | | | INTEGER | integer | | | UINTEGER | integer | | | BIGINT | integer | | | UBIGINT | integer | | | HUGEINT | integer | | | UHUGEINT | integer | | | DATE | local-time:date | | | TIMESTAMP | local-time:timestamp | Microsecond precision | | TIME | local-time-duration:duration | Microsecond precision | | INTERVAL | periods:duration | Microsecond precision | | UUID | frugal-uuid:uuid | | | ENUM types | string | | | LIST types | list | | | STRUCT types | alist | | | UNION types | * | Maps to one of the member types (or nil) | | BIT (BITSTRING) | bit-vector | |

  • https://github.com/dlowe-net/local-time
  • https://github.com/enaeher/local-time-duration
  • https://github.com/jwiegley/periods
  • https://github.com/ak-coram/cl-frugal-uuid

*** NIL as boolean FALSE vs NIL as NULL / custom return value for SQL NULL

#+begin_src lisp (ddb:initialize-default-connection) ;; => #<DUCKDB::CONNECTION {101CAC0A73}>

;; The boolean TRUE and FALSE values are mapped to T and NIL ;; respectively in Lisp, but SQL NULL is also mapped to NIL causing ;; some ambiguity:

(ddb:query "SELECT TRUE AS x, FALSE AS y, NULL AS z" '()) ;; => (("x" . #(T)) ("y" . #(NIL)) ("z" . #(NIL)))

;; When necessary it's possible to differentiate between FALSE and ;; NULL by simply using the IS NULL logical operator:

(ddb:query "SELECT FALSE IS NOT NULL AS x, NULL IS NULL AS y" '()) ;; => (("x" . #(T)) ("y" . #(T)))

;; When binding parameter values, NIL is bound as FALSE when DuckDB ;; can determine that the parameter type is boolean and as NULL ;; otherwise. This means that simple cases like the following work as ;; expected:

(ddb:run "CREATE TABLE values (v BOOLEAN)" '("INSERT INTO values (v) VALUES (?)" (nil))) ; => NIL (ddb:query "SELECT v, v IS NOT NULL AS is_not_null FROM values" '()) ;; => (("v" . #(NIL)) ("is_not_null" . #(T)))

;; In some cases DuckDB doesn't determine parameter types based on the ;; query and NIL is bound as NULL even for boolean parameters:

(ddb:query "SELECT ?::boolean || '' IS NULL AS x" '(nil)) ;; => (("x" . #(T)))

;; To differentiate between FALSE and NULL unambiguously when binding ;; boolean parameters, the keywords :FALSE and :NULL can be used. In ;; the query below the first parameter type is not determined by ;; DuckDB, so NIL would be bound AS NULL as seen in the similar ;; example directly above. The second parameter type is correctly ;; identified as boolean, so NIL would be bound as FALSE in this case.

(ddb:query "SELECT ?::boolean || '' IS NOT NULL AS x, ?::boolean IS NULL AS y" '(:false :null)) ; => (("x" . #(T)) ("y" . #(T)))

;; For completeness the :TRUE keyword is also supported. When used as ;; a parameter value, it is equivalent to using T:

(ddb:query "SELECT ? = ? AS x" '(:true t)) ; => (("x" . #(T))) #+end_src

Alternatively the default return value for SQL NULL can be customized:

#+begin_src lisp (ddb:initialize-default-connection) ;; => #<DUCKDB::CONNECTION {10042C24C3}>

;; NIL is used by default (ddb:query "SELECT ? AS x" '(:null)) ; => (("x" . #(NIL)))

;; Represent SQL NULL values as :NULL in the result for only one call (ddb:query "SELECT ? AS x" '(:null) :sql-null-return-value :null) ; => (("x" . #(:NULL)))

;; Change the default value (setf ddb:sql-null-return-value :null) ; => :NULL (ddb:query "SELECT ? AS x" '(:null)) ; => (("x" . #(:NULL))) #+end_src

** Development setup

  • Install [[https://www.quicklisp.org/][Quicklisp]]
  • Clone this repository and add it as a local Quicklisp project, for example:

#+begin_src sh git clone [email protected]:ak-coram/cl-duckdb.git ~/Projects/cl-duckdb ln -s ~/Projects/cl-duckdb ~/quicklisp/local-projects/cl-duckdb #+end_src

  • Start your favored REPL (e.g. sbcl) and load the library using Quicklisp:

#+begin_src lisp (ql:quickload :duckdb) #+end_src

*** Running tests

  • Load the tests via Quicklisp:

#+begin_src lisp (ql:quickload :duckdb/test) #+end_src

  • Use [[https://asdf.common-lisp.dev/][ASDF]] or [[https://fiveam.common-lisp.dev/][FiveAM]] to run the tests:

#+begin_src lisp ;; Using ASDF: (asdf:test-system :duckdb) ;; Using FiveAM directly: (fiveam:run! :duckdb) #+end_src

*** Running benchmarks

  • Load the benchmarks via Quicklisp:

#+begin_src lisp (ql:quickload :duckdb/benchmark) #+end_src

  • Use [[https://asdf.common-lisp.dev/][ASDF]] or run the benchmarks directly:

#+begin_src lisp ;; Using ASDF: (asdf:test-system :duckdb/benchmark) ;; Running directly: (duckdb/benchmark:run-benchmarks) #+end_src

** Legal

  • Released under the MIT License, same as DuckDB.
  • [[https://developercertificate.org/][Developer Certificate of Origin]]
  • [[https://en.wikipedia.org/wiki/File:Pair_of_mandarin_ducks.jpg][Source]] for README photo