datascript
Editable schema
I have an implementation of this in Datsync (here), and I'd really rather have it just be a function in datascript.core
if you're amenable. More or less, it creates a new database from the old datoms and the updated schema.
There's a question about what should be sent to listeners when this happens. It could perhaps be a more or less empty transaction report with keys `:schema-changes`, `:new-schema`, and `:old-schema`.
Happy to clean this up and PR if you like.
Replacing schema is trivial to do in user code. What I don't want to do is provide a schema-altering fn that does that while ignoring all the issues arising from the fact that data stored in DS might not suit the new schema (unique values, ref types, arities). If I ever provide such a fn, it should deal with all those issues (roughly the same way Datomic deals with them: checking constraints, raising errors, not allowing certain migrations, etc.). I'm not against changing the schema, but for now the user of the DS library is responsible for all the inconsistencies. That's why I think such a fn should not be included in DS yet (but all the tools to build it yourself are already there).
Fair enough. In my case (datsync), the only schema changes that come through are ones that had already passed through Datomic, so they'd more or less been "vetted". But you're right that there shouldn't be surprises in something that's a core part of datascript, and it would be easy for someone not thinking about the consequences to goof. Here are my thoughts on these issues:
- ref vs. other types: Datomic actually doesn't let you change `:db/valueType`. So we could scan the datoms and make sure no attributes are being set as reference attributes when they've already been used for other data (and vice versa).
- uniqueness: Changing from unique to non-unique is pretty straightforward with Datomic, but the other direction is only allowed in certain cases (values are already unique and have an index, or there are no values). We could mimic that or simply disallow it, depending on how much work we're willing to put in.
- cardinality: Cardinality one-to-many is straightforward, but many-to-one requires deciding which value to keep. I think this may actually be straightforward: under the new schema, I think passing in the old datoms would deduplicate all but one of the values. If I'm wrong about that, we could either disallow it once there are multiple values, or filter all but the most recently asserted value.
Here are the relevant Datomic docs for reference: http://docs.datomic.com/schema.html#Schema-Alteration.
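To make the first check concrete, here is a rough sketch of how one might scan for `:db/valueType` flips before swapping schemas. The function name `ref-change-conflicts` is hypothetical, not part of DataScript's API, and it conservatively flags any attribute with existing datoms rather than inspecting the values themselves:

```clojure
(require '[datascript.core :as d])

;; Hypothetical helper -- NOT part of DataScript's API.
(defn ref-change-conflicts
  "Returns attrs whose :db/valueType would flip to/from :db.type/ref in
  new-schema while the attribute already has datoms stored in db."
  [db new-schema]
  (for [[attr props] new-schema
        :let [old-ref? (= :db.type/ref (get-in (:schema db) [attr :db/valueType]))
              new-ref? (= :db.type/ref (:db/valueType props))]
        :when (and (not= old-ref? new-ref?)
                   (seq (d/datoms db :aevt attr)))]
    attr))
```

A migration fn could refuse to proceed (or ask the user to clean up) whenever this returns a non-empty seq.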
I get that since it's possible for users to deal with these issues themselves when they need to, this isn't a high priority in your book. However, I think for new users especially, it may not be obvious how one might use `init-db` and `swap!` to do this themselves, let alone handle all the issues mentioned above. And I think this puts a damper on iterative/interactive development. With that in mind, if someone came up with an implementation that dealt with all these issues sufficiently, would you be open to including it in DataScript?
I think schema migration should work one of three possible ways:
- Always accept changes if they are safe (one-to-many, unique to non-unique)
  - adjust internals if needed (index to no index and vice versa)
- Validate changes if they're allowed but can be unsafe (many-to-one, non-unique to unique)
  - throw if a constraint is not satisfied by the new schema
- Deny impossible changes (value type)
The important point here is that DataScript will not change any user data as a result of schema alteration. That way the user has a chance to clean up data in whatever way suits them, and DataScript keeps them safe by providing guarantees.
This would be an important piece of DataScript and I'll be happy to include it.
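For the "validate" bucket, the non-unique → unique direction could look something like the sketch below. `check-unique!` is a hypothetical name used for illustration, not DataScript API; it just verifies that the attribute's existing values are distinct before the schema grants `:db/unique`, and throws otherwise (leaving the data untouched, per the guarantee above):

```clojure
(require '[datascript.core :as d])

;; Hypothetical helper -- NOT part of DataScript's API.
(defn check-unique!
  "Throws if attr's existing values in db are not distinct, i.e. the
  attribute cannot safely be given :db/unique under a new schema."
  [db attr]
  (let [vs (map :v (d/datoms db :aevt attr))]
    (when-not (or (empty? vs) (apply distinct? vs))
      (throw (ex-info "Cannot make attribute unique: duplicate values exist"
                      {:attr attr})))))
```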
Agreed; Sounds good.
@tonsky I have an implementation of what you and @metasoarous describe in `posh.sync.schema` (mainly in `ensure-schema-changes-valid`, in a Posh PR I'm working on). I'll PR it to DataScript once it's stable. Just thought I'd give you an opportunity to take a look.
> Replacing schema is trivial to do in user code.
@tonsky did you mean the `init-db` function?
If I want to add new attributes and don't touch already existing ones, then can I use a function like this?
;; just an example
(defn patch-schema [db patch]
  (let [schema     (:schema db)
        new-schema (merge patch schema)
        datoms     (d/datoms db :eavt)]
    (d/init-db datoms new-schema)))
The order of the `merge` arguments is important. It would probably be better to check for conflicts there.
Seems fine, but `merge` should be something like `merge-with merge`, with the patch applied over the schema. The validation would be very tricky, which is the main reason I left it unimplemented. If you know what you are doing is safe, a function like this should be fine.
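Applying that suggestion to the earlier example, the revised version might look like this (still just a sketch with no validation, so it only makes sense when you know the change is safe):

```clojure
(require '[datascript.core :as d])

;; Revised sketch: merge-with merge, patch applied over the schema,
;; so a patch can extend an attribute's map without clobbering it.
(defn patch-schema [db patch]
  (let [new-schema (merge-with merge (:schema db) patch)]
    (d/init-db (d/datoms db :eavt) new-schema)))
```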
Updating an existing database is very tricky and requires support for new features like tuples. But what if we just recreate the db from scratch?
(defn with-schema [db new-schema]
  (-> (d/empty-db new-schema)
      (with-meta (meta db))
      (d/db-with (d/datoms db :eavt))))

(defn vary-schema [db f & args]
  (with-schema db (apply f (d/schema db) args)))
Validation is done by `d/db-with`.
This way we can use functions that are reliable, if not so fast, since they recreate the indices.
I use a DB as a value. So I can use prototypes:
(def blank-entity
  (-> (d/empty-db {:error/entity {:db/valueType :db.type/ref}})
      (d/db-with [{:db/id    1
                   :db/ident :root}])))

(def blank-user
  (-> blank-entity
      (vary-schema assoc :user/aka {:db/cardinality :db.cardinality/many})))

(def john
  (-> blank-user
      (d/db-with [{:db/ident  :root
                   :user/name "Maksim"
                   :user/aka  ["Max Otto von Stierlitz" "Jack Ryan"]}])))
Can I send PR with these functions?
This is what I am thinking:
- This implementation is very inefficient
- Can be reproduced in user code trivially if needed
If at any point in the future we introduce actual schema migrations, they will probably have a different API that allows for a more efficient implementation (e.g. providing schema deltas instead of an entirely new schema). So I don't want to be locked into this API right now.