cantata
cantata copied to clipboard
SQL and database abstraction for Clojure

Cantata
SQL and database abstraction for Clojure:
- Relationship-aware querying and saving
- Queries made of data
- Legacy-friendly: custom name mappings, composite primary keys
- No implicit global state
- Extensibile
Installation
Use Leiningen and add this to your project's dependencies:
[cantata "0.1.17"]
THIS LIBRARY IS STILL BAKING. ITS API COULD CHANGE AT ANY TIME. IT MAY ALSO BE DISCONTINUED IF I DECIDE IT'S A BAD APPROACH.
Crash Course
See the Quick Reference for a more systematic breakdown.
Namespace
Most of the API is exposed through the cantata.core
namespace:
(ns example.core
(:require [cantata.core :as c]))
Setup
To use Cantata, you need a data source and a data model. A data source is where your data comes from and gets stored to: a database. A data model describes the entities in a system and the relationships among them.
Let's pretend we have a schema like this already set up in a database:
To get up and running quickly, you can let Cantata work out most of the data model itself using reflection:
;; Any clojure.java.jdbc compatible DB spec
(def mysql-spec "mysql://localhost/film_store")
;; Shortcuts to supplement the reflected model
(def model
{:film {:shortcuts {:actor :film-actor.actor
:category :film-category.category
:renter :inventory.rental.customer}}
:customer {:shortcuts {:country-name :address.city.country.country}}})
;; When wrapped in a delay, reflection will happen at runtime
(def ds (delay (c/data-source
mysql-spec model
:reflect true)))
Note that Cantata does not create database tables or do migrations. We're merely hooking into a schema that has been created elsewhere.
Querying
Cantata can leverage the data model to perform queries that smartly combine results from any number of related tables, even when many-to-many relationships are involved.
The following query fetches the film with id 1, plus related language and actor data -- all in one database round trip, and nicely nested:
(c/query ds {:from :film
:select [:id :title :release-year :language :actor]
:where [:= 1 :id]})
=> [{:id 1
:title "ACADEMY DINOSAUR"
:release-year 2006
:language {:name "English", :id 1}
:actor [{:name "PENELOPE GUINESS", :id 1}
{:name "CHRISTIAN GABLE", :id 10}
{:name "LUCILLE TRACY", :id 20}]}]
Queries are made of data, and can be amended on the fly with extra clauses or parameters:
(def kid-film {:from :film
:where [:and
[:in :rating ["G" "PG"]]
[:< :?low :length :?high]]})
(c/query ds [kid-film :select [:title :release-year] :limit 3]
:params {:low 90 :high 100}])
=> [{:title "ARMAGEDDON LOST" :release-year 2006}
{:title "BILL OTHERS" :release-year 2006}
{:title "BOUND CHEAPER" :release-year 2006}]
You can refer to related entities or fields anywhere in a query, and Cantata will work out the joins for you:
;; 8 joins against 9 tables - one round trip
(c/query ds
[:from :film
:select [:title :actor.name]
:where [:= "Canada" :renter.country-name]])
You can tell Cantata to fetch data from related tables in multiple database round trips if you prefer, by setting the :strategy
option to :multiple
. With this strategy, primary keys gathered during an initial, top-level query will be used to find data from related tables.
;; 3 database round trips - an extra for each to-many relationship
(c/query ds [:from :film
:select [:* :language :category :actor]
:limit 10]
:strategy :multiple)
Of course, when selecting data from one-to-many or many-to-many relationships, you should keep database impact in mind. Single round trips are best when you're selecting a small number of columns from the target table, because each top-level row gets repeated for each related row in the result set (setting the :flat
query option to true will show this). Multiple round trips prevent repetition of top-level rows but you pay for each trip. Cantata assumes you know what you're doing. Don't shoot your foot off!
Joins can always be made explicit, which overrides implicit joins of the same name:
(c/query
ds [:from :film
:select [:title :actor.name]
:where [:= 1 :id]
:join [[:film-actor :fa] [:= :id :fa.film-id]
:actor [:= :fa.actor-id :actor.id]]])
=> [{:title "ACADEMY DINOSAUR"
:actor [{:name "PENELOPE GUINESS"}
{:name "CHRISTIAN GABLE"}
{:name "LUCILLE TRACY"}
…]}]
If needed, you can drop down to plain SQL:
(c/query ds "select id, title from film limit 3")
Query DSL
Because queries are data structures, you can build them up yourself using the usual Clojure tools (see Query Format). However, Cantata also provides helper functions in the canata.dsl
namespace for building queries piecemeal:
(ns example.core
(:require [cantata.core :as c]
[cantata.dsl :refer [from select un-select where]]]))
;; ...
(c/query ds (-> (from :film)
(select :title :release-year)
(where {:rating "R" :length 120}))) ;sugar syntax
The default behavior for helper functions is to merge: e.g., (select q :length)
will add :length
to the :select
clause. There are variant helpers with replace-
and un-
prefixes to replace and remove elements from clauses, respectively.
Helper functions always return a vector query (which query functions will accept). To turn the query into a fully-merged map, use build
:
(c/build (-> (from :film)
(select :title :release-year :length)
(un-select :length)
(where {:rating "R"})
(where :or [:< 10 :%count.actor.id] [:= :language.name "French"])))
=> {:from :film
:select (:title :release-year),
:where [:and
[:= :rating "R"]
[:or [:< 10 :%count.actor.id] [:= :language.name "French"]]]}
The helper functions are by no means required; use them or not according to your taste.
Saving
Cantata can also leverage the data model when saving. Here we add a new film with relevant categories attached:
;; Returns the generated id
(c/save! film {:title "Lawrence of Arabia"
:language-id 1
:category [{:id 7} ;Drama
{:id 4} ;Classics
{:name "Epic"}]}) ;Doesn't exist - will be created
Saving the film record and all related records happens within a transaction.
If the primary key of a record is present, an update will be performed:
;; Update a film - affects only the fields provided
(c/save! film {:id 1001 :release-year 1962})
Values and Types
Cantata can use field type information (gathered during reflection or defined explicitly) to transform values as they flow into and out of a data source.
For example, maybe you prefer Joda dates to Java ones:
(def ds (delay (c/data-source
mysql-spec model
:reflect true
:joda-dates true)))
(c/queryf ds [:select :payment.payment-date :limit 3])
=> (#<DateTime 2005-06-15T22:02:53.000Z>
#<DateTime 2005-06-16T01:08:46.000Z>
#<DateTime 2005-06-16T19:18:57.000Z>)
Custom types can be defined by implementing Cantata multimethods and protocols.
State
You can use multiple data sources or data models simultaneously, in different configurations. They will not step on each other because Cantata has no global state. Any state Cantata does have is confined to the data source (database connection, pool, etc.). Data models are immutable.
Reasons Not to Use Cantata
- An extra layer of abstraction between you and the database
- Query result processing overhead (not too bad, but there's room for improvement)
- Executed SQL may not always be optimal
- Some queries may be awkward or impossible to express in the Cantata query format
- Hefty implementation (~5 KLOC)
Playground Project
A playground project, which uses a fully fleshed out version of the movie store schema, with lots of fake data, is available in the cantata-sample repo. The README has lots of example queries.
Quick Reference
All functions come from the cantata.core
namespace unless otherwise noted.
Data Model
A data model is created using the data-source
or make-data-model
functions, which take entity specs. Cantata transforms the entity specs you provide into a DataModel
record internally.
Entity specs can be either a map, with entity names as keys and maps describing the entity as values; or a collection of maps describing the entities.
Example (map format):
{:film {:fields [:id :title]
:shortcuts {:actor :film-actor.actor}}
:actor {:fields [:id :name]}
:film-actor {:fields [:film-id :actor-id]
:pk [:film-id :actor-id]
:rels [:film :actor]}}
Entity descriptor maps can contain the following keys:
:name - entity name, a keyword (optional for map format)
:pk - name(s) of primary key field(s); default: name of first field
:fields - collection of field specs; see below for format
:rels - optional collection of rel specs; see below for format
:shortcuts - optional map of shortcuts; see below for format
:validate - optional function to validate a entity values map; called
prior to inserting or updating; expected to return problem
map(s) on validation failure; see `problem`
:hooks - optional map of hooks; see below for format
:db-name - string name of corresponding table in SQL database;
default: entity name with dashes converted to underscores
:db-schema - optional string name of table schema in SQL database
Field specs can be keywords or maps. A keyword is equivalent to a map with
only the :name
key set. The following map keys are accepted:
:name - field name, a keyword
:type - optional data type; built-in types:
:int :str :boolean :double :decimal :bytes
:datetime :date :time
:db-name - string name of corresponding column in SQL database;
default: field name with dashes converted to underscores
:db-type - optional string type of corresponding column in SQL database
Relationship (rel) specs can be keywords or maps. A keyword is equivalent to
a map with only the :name
key set. The following map keys are accepted:
:name - rel name, a keyword
:ename - name of related entity; default: rel name
:key - name of foreign key field; default: rel name + "-id" suffix
:other-key - name of referenced key on related entity; default: primary
key of related entity
:reverse - boolean indicating that the foreign key column is on the other
table; default: false
:one - boolean indicating whether the relationship is one-to-one;
default: false for reverse rels, true otherwise
Reverse relationships will be automatically added to related entities
referenced in rel specs, using the name of the former entity as the rel name.
If there is more than one reverse relationship created with the same name,
each will be prefixed with an underscore and the name of the relationship
to ensure uniqueness, like so: :_rel-name.entity-name
.
Shortcuts take the form of a map of shortcut path to target path. Target paths can point to rels or fields.
Hooks are experimental and may change in future versions. They take the form of a map from hook name to hook function. Available hooks and their corresponding arguments and expected return values:
:before-query [ent expanded-q env added-paths] -> [expanded-q env added-paths]
:after-query [ent results] -> results
:validate [ent map] -> problems
:before-save [ent map] -> map
:after-save [ent map ret] -> ret
:before-insert [ent maps] -> maps
:after-insert [ent maps ret] -> ret
:before-update [ent map pred] -> [map pred]
:after-update [ent map pred ret] -> ret
:before-delete [ent pred] -> pred
:after-delete [ent pred ret] -> ret
Data Source
A data source map is created using the data-source
function. The map it returns will be compatible with both Cantata and clojure.java.jdbc.
data-source
takes two arguments: db-spec
and entity-specs
, plus keyword options.
db-spec - a clojure.java.jdbc-compatible spec
entity-specs - optional DataModel record or entity specs; when provided,
entity-specs will transformed into a DataModel; will be
merged with and take precedence over reflected entity specs
Keyword options (all default to false unless otherwise noted):
:reflect - generate a data model from the data source automatically;
can be used in combination with entity-specs, the latter
taking precedence
:pooled - create and return a pooled data source
:init-fn - function to initialize the data source; will be called
before reflection and data model creation; the data
source map will be passed as the argument
:joda-dates - return Joda dates for all queries
:clob-str - convert all CLOB values returned by queries to strings
:blob-bytes - convert all BLOB values returned by queries to byte
arrays
:ordered-maps - return ordered hash maps for all queries
:table-prefix - optional table prefix to remove from reflected table names
:column-prefix - optional column prefix to remove from reflected column
names
:quoting - identifier quoting style to use; auto-detects if
left unspecified; set to nil to turn off quoting (this
will break many queries); :ansi, :mysql, or :sqlserver
:query-cache - an atom that wraps a map-like object, such as a cache
from clojure.core.cache; used to cache prepared queries;
default a 32-element LRU cache
:hooks - data source-wide hooks; see `make-data-model` for
available hooks and format
:max-idle - max pool idle time in seconds; default 30 mins
:max-idle-excess - max pool idle time for excess connections, in seconds;
default 3 hours
Wrap your data-source call in delay
to prevent reflection or pool creation
from happening at compile time. Cantata will call force
on the delay when
it's used.
Query Format
A query can be a map, or a vector of zero or more maps followed by zero or more keyword-value clauses. For example:
[{:from :film} :select [:title :actor.name] :limit 1]
Any maps or clauses after the first will be merged according to the semantics of the clause. The build
function can be used to turn a vector query into a map. For example:
(c/build [{:from :film :select :id :where [:= "R" :rating]}
:select :title :where [:< 90 :length]])
=> {:from :film
:select [:id :title]
:where [:and [:= "R" :rating] [:< 90 :length]]}
Any paths to related entities referenced outside of :with
will trigger outer
joins when the query is executed. Use the :with
clause to trigger an inner
join instead.
Supported clauses:
:from - name of the entity to query; if not provided, will be
inferred from :select
:select - wildcards or paths of fields, relationships, or aggregates to
select; unlike SQL, unqualified names will be assumed to refer
to the :from entity
:where - predicate to narrow the result set; see below for format
:order-by - field names to sort results by; e.g., :title or
[[:title :desc] :release-year]
:limit - integer that limits the number of results
:offset - integer offset into result set
:group-by - fields to group results by; forbidden for certain multi-queries
:having - like :where but performed after :group-by
:modifiers - one or more keyword modifiers:
:distinct - return distinct results
:include - one or more relationship names to perform a left outer join
with. May also be a map of the form:
{:rel-name [:foo :bar :baz]}, to select specific fields from
related entities.
:with - like :include but performs an inner join
:without - return results that have no related entity records for the
provided relationship names
:join - explicit inner join; e.g., [[:foo :f] [:= :id :f.id]]
:left-join - explicit left outer join
:options - a map with the following optional keys:
:join-type - whether to perform an :outer (the default) or
:inner join for fields selected from related
entities
Where applicable, there are variants of each clause with a :replace- or :un- prefix. For example, :replace-select will replace, rather than merge, the :select clause; and :un-select will remove fields from :select.
Predicates are vectors of the form [op arg1 arg2 ...], where args are paths, other predicates, etc. Built-in ops:
:and :or :xor :not
:= :not= :< :<= :> :>=
:in :not-in :like :not-like :between
:+ :- :* :/ :% :mod :| :& :^
Example predicate: [:and [:= "Drama" :category.name] [:< 90 :length 180]]
Aggregates are keywords that begin with % - e.g., :%count.actor.id
Bindable parameters are denoted with a leading ? - e.g., :?actor-name
query
Function
Arguments: [ds q & opts]
Executes a query against data source. The query can be one of the following:
- Query map/vector - see above for format
- PreparedQuery record - see
prepare-query
- SQL string
- clojure.java.jdbc-style [sql params] vector
By default, returns a sequence of maps, with nested maps and sequences for values selected from related entities. Example:
(query ds {:from :film :select [:title :actor.name] :where [:= 1 :id]})
=> [{:title "Lawrence of Arabia"
:actor [{:name "Peter O'Toole"} {:name "Omar Sharif"}]}]
NOTE: using the :limit
clause may truncate nested values from to-many
relationships. To limit your query to a single top-level entity record while
retrieving all related records, restrict the results using the :where
clause,
or set the :strategy
option to :multiple
.
Keyword options:
:strategy - fetching strategy to use:
:single - (default) fetches all data in a single round trip
:multiple - fetches data in multiple round trips, one for
each selected second-level path segment that is
part of a path that contains a to-many
relationship. Primary keys of the top-level
entity results are used to fetch the related
results. :limit and :where clauses apply only to
the top-level entity query.
:flat - do not nest results; results for the same primary key may be
returned multiple times if the query selects paths from any
to-many relationships
:vectors - return results as a vector of [cols rows], where cols is a
vector of column names, and rows is a sequence of vectors with
values for each column
:params - map of bindable param names to values
:force-pk - prevent Cantata from implicitly adding primary keys to the
the low-level database query when to-many relationships are
selected (which it does to make nesting more predictable and
consistent)
Other Query Functions
query1 [ds q & opts]
Like query
but returns the first result. If :strategy
is :multiple
, limits
the query to a single result; otherwise, does not limit the query, so it's
the responsbility of the caller to not query for more results than needed.
query-count [ds q & opts]
Returns the number of matching results for a query. By default, returns
the count of distinct top-level entity results. Set the :flat
option to true
to return the count of ALL rows, including to-many rows with redundant
top-level values.
(c/query-count ds [:from :film :without :rental])
=> 226
by-id [ds ename id & [q & opts]]
Fetches the entity record from the data source whose primary key value is
equal to id
. Query clauses from q
will be merged into the
generated query.
getf [results path]
Returns one or more nested field values from the given query result or
results, traversing into related results as necessary, according to a
dotted keyword path. path
and results
can be swapped as arguments.
The following calls are equivalent:
(getf :actor.name results)
(getf results :actor.name)
If invoked with one argument, returns a partial function.
getf1 [results path]
Returns the same as getf
except if the result would be a sequence, in
which case it returns the first element.
queryf [ds q & opts]
Same as query
, but additionally calls getf
using the first selected path.
Example:
(queryf ds {:from :film :select :actor.name :where [:= 1 :id]})
=> ["Peter O'Toole" "Omar Sharif"]
queryf1 [ds q & opts]
Same as query
, but additionally calls getf1 using the first selected path.
Example:
(queryf1 ds {:from :film :select :actor.name :where [:= 1 :id]})
=> "Peter O'Toole"
prepare-query [ds q & opts]
Return a PreparedQuery
record, which contains ready-to-execute SQL and
other necessary meta data. When executed, the query will accept bindable
parameters. (Bindable paramters can be included in queries using keywords
like :?actor-name
.)
Unlike a JDBC PreparedStatement, a PreparedQuery record contains no connection-specific information and can be reused at any time.
(let [pq (c/prepare-query
ds [:from :film
:where [:= :?country :renter.country-name]])]
(c/query ds pq :params {:country "Canada"}))
to-sql [ds-or-dm q & opts]
Returns a clojure.java.jdbc-compatible [sql params] vector for the given query.
(c/to-sql ds [:select :film.actor])
=> ["SELECT \"actor\".\"id\" AS \"actor.id\", \"actor\".\"name\" AS \"actor.name\" FROM \"PUBLIC\".\"film\" AS \"film\" LEFT JOIN \"PUBLIC\".\"film_actor\" AS \"film_actor\" ON \"film\".\"id\" = \"film_actor\".\"film_id\" LEFT JOIN \"PUBLIC\".\"actor\" AS \"actor\" ON \"film_actor\".\"actor_id\" = \"actor\".\"id\""]
call [fn-name & args]
Returns a SQL call object for use in queries. Prefer the :%count.id
syntax for aggregates.
raw [s]
Returns a raw SQL string fragment. Using this in queries could cause strange behavior.
Manipulation Functions
-
save! [ds ename values & opts]
-
insert! [ds ename map-or-maps & opts]
-
update! [ds ename values pred & opts]
-
delete! [ds ename pred]
-
delete-ids! [ds ename id-or-ids]
-
cascading-delete! [ds ename]
-
cascading-delete-ids! [ds ename id-or-ids]
-
merge-and-delete! [ds ename id-to-keep id-to-merge]
-
execute! [ds q]
Transactions
-
with-transaction [binding & body]
-
rollback! [ds]
-
unset-rollback! [ds]
-
rollback? [ds]
-
with-rollback [binding & body]
Utility Functions
-
parse [ds ename values]
-
validate! [ds ename values]
-
problem [keys-or-msg] [keys msg]
-
resolve-path [ds ename path]
Introspection
-
data-model [ds]
-
entities [ds]
-
entity [ds ename]
-
rels [ds ename]
-
rel [ds ename rname]
-
fields [ds ename]
-
field [ds ename fname]
-
field-names [ds ename]
Debugging
verbose [& body]
Prints all SQL queries
(c/verbose
(c/query ds [:from :film :select [:id :actor]
:strategy :multiple]))
with-debug [binding & body]
Prints all SQL, rolls back changes
(c/with-debug ds
(c/cascading-delete-ids! ds :film 1))
Extending
-
cantata.parse/parse-value [v type]
- multimethod for defining how a value of a certain type gets parsed when it comes out of the database or other source -
cantata.parse/marshal-value [v type]
- multimethod for defining how a value of a certain type gets marshalled before being sent to the database -
cantata.protocols
- namespace that contains protocols for parsing and marshalling built-in types - See the Extensibility section of the HoneySQL docs for information about adding custom query clauses
License
Copyright © 2013 Justin Kramer
Distributed under the Eclipse Public License