chai icon indicating copy to clipboard operation
chai copied to clipboard

Handle integer overflows for sequence numbers

Open tie opened this issue 3 years ago • 2 comments

We should do something about integer overflows for values retuned by engine NextSequence method. E.g. SQLite claims to randomly choose available sequence number on overflow (I’d prefer deterministic solution, although I’m not sure to what extent SQLite implementation is random) unless primary key is AUTOINCREMENT.

See

  • #250
  • #43
  • https://sqlite.org/autoinc.html
  • https://sqlite.org/fileformat2.html#seqtab

tie avatar Oct 25 '20 22:10 tie

I wonder if using []byte or big.Int/[]uint for sequence numbers would cause issues with performance (true for really big integers, but it’s not like we’d be able to handle them with current uint64s anyway). If not, I think we’d be able to keep auto incrementing until either storage (more likely) or computational (almost unlikely) resources are exhausted.

tie avatar Oct 25 '20 22:10 tie

I wonder if using []byte or big.Int/[]uint for sequence numbers would cause issues with performance

Currently, the three calls to NextSequence do exactly the same thing, they encode the value to an unsigned Varint.

However, whether we use that number as a docid or for AUTOINCREMENT, we are limited by the maximum size of Genji's INTEGER type. This means that even if NextSequence returns an uint64, half of that space won't be used.

The reason is that the pk() function is an expression which returns an INTEGER, and it can be used in other expressions to compute new values, whose type will be limited to whatever Genji supports:

-- this evaluates to an INTEGER unless the result
-- overflows, in which case it will evaluate to a DOUBLE
SELECT pk() + 1; 

We need to distinguish three things: The limit of NextSequence, the limit of docid, and the limits of Genji's INTEGER type

NextSequence

NextSequence is a utility function that is used internally, which means that it is not bound to Genji's type system. The fact that it is a 64-bit unsigned int is considered large enough for now, even if there is currently no overflow handling. Since we cannot control where it does start (which makes it a bad fit for AUTOINCREMENT) and that each store has its own counter, it is very unlikely that we'd ever reach the limit of uint64 before other problems would manifest themselves. (If that's the only issue we have at that scale, I'd be very happy). Currently, overflow handling is totally engine-specific. However, I'm open to suggestions.

docid overflow

A docid is the equivalent of SQLite's rowid, it's a 64 bit signed integer, returns the next available docid. It doesn't guarantee that it will be always increasing. The main difference is that the overflow is not handled at all, for the same reasons as above.

https://github.com/genjidb/genji/blob/07b6552288f8b47d048863eca5953e5b0e28486b/database/table.go#L400-L407

INTEGER overflow

Genji's INTEGER is a 64 bit signed integer. Expressions using integers will evaluate to a double if the result overflows

https://github.com/genjidb/genji/blob/dbc05dcd28f5cb057b3c9b2c8308d6030e9b316c/document/value.go#L421-L443

Proposal

NextSequence job is not to look back, it is supposed to be monotonically increasing. We should make sure all engines return the same error if we reach the max value.

Regarding docid, the fact that it is using NextSequence is implementation detail. What matters is that the next returned docid is not guaranteed to the user. It can be 1000 then 300 for the next insert. Also, a docid for a particular document can change. If it's not what the user wants, they should use their own primary key (which is a best practice). We shouldn't provide any other guarantees to the user. This means that once we implement the VACUUM statement, we will be able to reclaim unused docid space (and potentially stop using NextSequence()) Regarding the issue at hand, I think SQLite's solution is a sound solution, as we don't know which area of the docid space is empty and starting from the beginning would potentially take a very long time.

asdine avatar Oct 26 '20 08:10 asdine