h2database icon indicating copy to clipboard operation
h2database copied to clipboard

Store data and indices in two separate files

Open strelec opened this issue 2 years ago • 2 comments

I think we should split the file containing data and indices into separate files.

Benefits:

  1. You can always regenerate the index, but data, once lost, it's forever. Having just the data encapsulated would make recovery process easier.
  2. Less chance of data corruption. If you have a bug in your indexing code, that will not affect the data.
  3. Backups become much simpler. You can choose to copy just the much smaller data file, not the while thing.
  4. You can choose a more performant (and less safe) file system operation for just the index file. For example, no copy on write in btrfs.

strelec avatar Jul 03 '22 01:07 strelec

  1. Recovery tool doesn't use additional indexes.
  2. If implementation of some index writes incorrect data, it cannot affect content of other indexes in H2.
  3. Usually indexes aren't that large, but it depends on database schema.
  4. In that case there is still a risk of index corruption and following data corruption: 4.1. Indexes may become out of sync with the main data due to external reasons, actually it will be possible even if they are located on the same file system. 4.2. There is no fast way to check correctness of indexes, so such procedure cannot be performed on database startup. 4.3. It will be possible to violate constraints in the main data file due to differences in the index file. With corrupted indexes unexpected main data deletion is also possible, because search conditions may return different set of rows.

Commits, rollbacks, snapshots and other related operations will need to operate on two different storages, so a lot of additional complexity will be needed.

For example, no copy on write in btrfs.

H2 performs copy-on-write by itself. It isn't reasonable to use one CoW technology on top of another one. I don't know how exactly CoW works in btrfs, but I think you normally should disable it for database files.


Hypothetically for some highly fragmented indexes, such as indexes on UUID or other columns with random distribution of data, separate storage may reduce fragmentation of main storage, so I'm not completely opposed that idea, but it has significant drawbacks too.

katzyn avatar Jul 03 '22 02:07 katzyn

And the main drawback of multiple storages would be the fact that "dirty set" (data to be written) becomes partitioned too. So it is either longer periods between writes (and greater risk of corruption on abrupt termination), or writes with smaller chunks (increased write amplification). If anything, I would partition data on a basis of it's update frequency. @strelec, BTW what you call "data" is in fact just another index in H2.

andreitokar avatar Jul 04 '22 21:07 andreitokar