Must throw an exception if eids are too big
I was experimenting with external ids using the second approach suggested by @tonsky at https://github.com/tonsky/datascript/issues/31#issuecomment-57993865 and found that an undocumented maximum entity id exists at 0x1FFFFFFF, but exceeding it goes unnoticed. Check the following code and its output:
(doseq [base-id [800 0x1FFFFFFF (inc 0x1FFFFFFF) 0xFFFFFFFFFFFFFFFF]]
  (let [db (data/create-conn {})
        {:keys [tempids]} (data/transact! db [{:db/id base-id
                                               :name "Foo"}
                                              {:db/id -1
                                               :name "Bar"}
                                              {:db/id -2
                                               :name "Baz"}])]
    (println "Using base-id = " base-id)
    (println " => tempids = " tempids)
    (println " => entity = " (-> @db (data/entity base-id) data/touch))
    (println " => :max-eid = " (:max-eid @db))
    (println)))
Output:
Using base-id = 800
=> tempids = {-1 801, -2 802, :db/current-tx 536870913}
=> entity = {:name Foo, :db/id 800}
=> :max-eid = 802
Using base-id = 536870911
=> tempids = {-1 536870912, -2 536870912, :db/current-tx 536870913}
=> entity = {:name Foo, :db/id 536870911}
=> :max-eid = 536870911
Using base-id = 536870912
=> tempids = {-1 1, -2 2, :db/current-tx 536870913}
=> entity = {:name Foo, :db/id 536870912}
=> :max-eid = 2
Using base-id = 18446744073709552000
=> tempids = {-1 1, -2 2, :db/current-tx 536870913}
=> entity = {:name Foo, :db/id 18446744073709552000}
=> :max-eid = 2
Notes and questions:
1- When using base-id = 536870911 (0x1FFFFFFF), DataScript can't assign new ids to the temporary ids of the following entities; they all get 536870912 (0x1FFFFFFF + 1). I think an exception must be raised.
2- When using base-id >= (0x1FFFFFFF + 1), the database's :max-eid is not updated, but you can successfully retrieve the created entity with the big id. Is it OK and safe to use ids above 0x1FFFFFFF?
Yeah, this one is subtle and not properly documented.
Like Datomic, DataScript has a notion of partitions. There is no public interface to it, but internally there are two partitions: entity ids in [0, 0x20000000) and transaction ids in [0x20000000, ∞). When you create a new entity, its id is allocated sequentially from the first range, and each transaction gets its id sequentially from the second range (see datascript.core/tx0, which is the base of the transaction-id partition).
Now, when you assign an entity id manually, DataScript has to account for that. When you add a new entity with a manually assigned id X, the database's max-eid is advanced to max(old max-eid, X). But it is also perfectly correct to add facts about transactions. E.g. if you add [e a v] where e is from the transaction-id partition, it should refer to an existing transaction id and should not advance the max-eid of the entity-id partition. (I’m not sure it works as I described at the moment, but it should, and eventually it will.)
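To make the rule above concrete, here is a minimal sketch in plain Clojure. It is not DataScript's code: `entity-id?` and `advance-max-eid` are hypothetical names made up for illustration, and `tx0` is simply set to the 0x20000000 boundary described above.

```clojure
;; Illustrative sketch only; these are NOT DataScript functions.
;; tx0 is the assumed first id of the transaction-id partition.
(def tx0 0x20000000)

(defn entity-id?
  "True when e falls inside the entity-id partition [0, tx0)."
  [e]
  (and (<= 0 e) (< e tx0)))

(defn advance-max-eid
  "Returns the new max-eid after a fact about e is added.
   Ids from the transaction-id partition leave max-eid untouched."
  [max-eid e]
  (if (entity-id? e)
    (max max-eid e)
    max-eid))
```

Under these assumptions, `(advance-max-eid 0 800)` yields 800, while `(advance-max-eid 2 0x20000000)` leaves max-eid at 2, which is in line with the :max-eid values in the output reported above.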
Answering your questions:
- Yes, an exception should be thrown when the entity-id range is exceeded.
- It is safe, but note that such ids will intersect with transaction ids. Usually there’s nothing wrong with that.
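As a rough idea of what the check from the first answer might look like, here is a sketch in plain Clojure. `next-eid` is a hypothetical helper, not an existing DataScript function, and the error message and ex-data keys are made up for illustration.

```clojure
;; Illustrative sketch only; next-eid is NOT part of DataScript.
;; tx0 is the assumed boundary between entity ids and transaction ids.
(def tx0 0x20000000)

(defn next-eid
  "Allocates the id following max-eid, or throws once the
   entity-id partition [0, tx0) has no ids left."
  [max-eid]
  (let [e (inc max-eid)]
    (if (< e tx0)
      e
      (throw (ex-info "Entity id partition exhausted"
                      {:max-eid max-eid :limit tx0})))))
```

With this guard, `(next-eid 800)` returns 801, whereas `(next-eid 0x1FFFFFFF)` throws instead of silently handing out 536870912 to every temporary id.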
The value 0x20000000 was not chosen at random. JS engines like V8 use a much more efficient unboxed representation for signed 31-bit integers, which leaves the interval [0, 0x3FFFFFFF] for non-negative ids. I just split it in half, allocating the first half to entity ids and the second half to transaction ids. Datomic uses 64-bit signed longs for entity ids and a much wider range per partition; unfortunately, we have no chance of reproducing that efficiently in JS.
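The arithmetic behind that split can be checked in a few lines of plain Clojure. The names `smi-max` and `tx0` are chosen here for illustration, and the 31-bit range is taken from the description above rather than from any engine's documentation.

```clojure
;; Largest value of a signed 31-bit integer: 2^30 - 1.
(def smi-max (dec (bit-shift-left 1 30)))

;; Splitting [0, smi-max] in half gives the first transaction id.
(def tx0 (quot (inc smi-max) 2))

(println (format "smi-max = 0x%X, tx0 = 0x%X" smi-max tx0))
```

This gives 0x3FFFFFFF for the upper bound and 0x20000000 (536870912) for the split point, so the last id in the entity partition is 0x1FFFFFFF, the limit reported at the top of this issue.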
Thanks for the clear and detailed explanation. Just for the record, it would be very nice to also have a string-based eid partition. I think string eids could be very helpful in the mapping scenario described in #31, for example to use UUIDs from an external database as entity ids (maybe sacrificing a little performance to make queries more elegant).
Well, a separate attribute for external ids is still a viable option. With lookup refs it’s quite easy to use now. I prefer it that way: eids are fast, but when you need richer referencing capabilities, you use external ids and pay some price for resolving them.