aquadoggo
aquadoggo copied to clipboard
Simplify/improve how we store and retrieve document views
Currently we insert a new document view every time a document is materialised to a new state. The row for a document view contains references to the actual values for each field, so this isn't terrible in terms of storage space and if we want to be able to seek back in a documents history it's very cool. But there are some overheads that come with this approach and maybe there is a different approach which would suit our current needs just as well.
The problems I see with the current setup:
- which document views are persisted is a little haphazard, it isn't necessarily every view for a document, but only the ones which were materialised at some point
- of course we end up with many rows in the document views table which are likely not used and we will want to garbage collect these at some point
- the db tables and relations for documents and document views is quite complex, maybe too complex for what we want, I worry about new people trying to grok this
Suggestions:
1) Don't store document views at all
- remove document view table
- documents relate directly to their current field/values
- materialise document views on demand by rebuilding the document to the requested view
pros
- simplifies db tables
- no old views hanging around (no need for garbage collection)
- any view from a document can be queried, not only the ones which were materialised
cons
- querying a view would require that the document is rebuilt and so might get slow with large documents
2) Pin views we are interested in
- mostly the same as
1)
except we introduce the ability to "pin" a document view we are interested in - this would reintroduce the document views table, but only for pinned document views
pros
- only views that are actively pinned hang around, might still need garbage collection at some point but not for a while I suspect
- any view from a document can be queried and pinned, not only the ones which were materialised
cons
- reintroduces most of the db table complexity
I believe this is a missing feature we haven't implemented yet: only document views should be stored whenever there's a pinned relation to them or if they're the latest - aka the second scenario.
Pinning is an important feature and caching materialized views crucial so I think there's no way we can remove it. Surely there's ways to make it better but I don't see yet how.
Slightly related idea on complexity: from an angle of the p2panda specification and with a less column-based database thinking, maybe more graph-based database thinking it's not tooooo bad. It's actually quite intuitive as soon as one understands how all p2panda data relates to each other. Documentation, diagrams and architecture overviews could help as well, either in the code or an ARCHITECTURE.md.
Documentation, diagrams and architecture overviews could help as well
Something like this? :heart_eyes:
erDiagram
ENTRY |o--|| OPERATION : ""
ENTRY ||--|{ LOG : ""
ENTRY {
string public_key
string entry_bytes
string entry_hash
string log_id
string payload_bytes
string payload_hash
string seq_num
}
LOG {
string public_key
string document
string log_id
string schema
}
OPERATION ||--|{ DOCUMENT : ""
OPERATION {
string public_key
string document_id
string operation_id
string log_id
string schema_id
string previous
}
OPERATION_FIELD ||--|{ OPERATION : ""
OPERATION_FIELD {
string operation_id
string name
string field_type
string value
numeric list_index
}
DOCUMENT ||--|o DOCUMENT_VIEW : ""
DOCUMENT {
string document_id
string document_view_id
string schema_id
string is_deleted
}
DOCUMENT_VIEW {
string document_view_id
string schema_id
}
DOCUMENT_VIEW_FIELD ||--|{ DOCUMENT_VIEW : ""
DOCUMENT_VIEW_FIELD ||--|{ OPERATION_FIELD : ""
DOCUMENT_VIEW_FIELD {
string document_view
string operation_id
string name
}