aquadoggo icon indicating copy to clipboard operation
aquadoggo copied to clipboard

Simplify/improve how we store and retrieve document views

Open sandreae opened this issue 2 years ago • 2 comments

Currently we insert a new document view every time a document is materialised to a new state. The row for a document view contains references to the actual values for each field, so this isn't terrible in terms of storage space and if we want to be able to seek back in a documents history it's very cool. But there are some overheads that come with this approach and maybe there is a different approach which would suit our current needs just as well.

The problems I see with the current setup:

  • which document views are persisted is a little haphazard, it isn't necessarily every view for a document, but only the ones which were materialised at some point
  • of course we end up with many rows in the document views table which are likely not used and we will want to garbage collect these at some point
  • the db tables and relations for documents and document views is quite complex, maybe too complex for what we want, I worry about new people trying to grok this

Suggestions:

1) Don't store document views at all

  • remove document view table
  • documents relate directly to their current field/values
  • materialise document views on demand by rebuilding the document to the requested view

pros

  • simplifies db tables
  • no old views hanging around (no need for garbage collection)
  • any view from a document can be queried, not only the ones which were materialised

cons

  • querying a view would require that the document is rebuilt and so might get slow with large documents

2) Pin views we are interested in

  • mostly the same as 1) except we introduce the ability to "pin" a document view we are interested in
  • this would reintroduce the document views table, but only for pinned document views

pros

  • only views that are actively pinned hang around, might still need garbage collection at some point but not for a while I suspect
  • any view from a document can be queried and pinned, not only the ones which were materialised

cons

  • reintroduces most of the db table complexity

sandreae avatar Jan 04 '23 19:01 sandreae

I believe this is a missing feature we haven't implemented yet: only document views should be stored whenever there's a pinned relation to them or if they're the latest - aka the second scenario.

Pinning is an important feature and caching materialized views crucial so I think there's no way we can remove it. Surely there's ways to make it better but I don't see yet how.

Slightly related idea on complexity: from an angle of the p2panda specification and with a less column-based database thinking, maybe more graph-based database thinking it's not tooooo bad. It's actually quite intuitive as soon as one understands how all p2panda data relates to each other. Documentation, diagrams and architecture overviews could help as well, either in the code or an ARCHITECTURE.md.

adzialocha avatar Jan 04 '23 23:01 adzialocha

Documentation, diagrams and architecture overviews could help as well

Something like this? :heart_eyes:

erDiagram
    ENTRY |o--|| OPERATION : ""
    ENTRY ||--|{ LOG : ""
    ENTRY {
        string public_key
        string entry_bytes
        string entry_hash
        string log_id
        string payload_bytes
        string payload_hash
        string seq_num
    }
    LOG {
        string public_key
        string document
        string log_id
        string schema
    }
    OPERATION ||--|{ DOCUMENT : ""
    OPERATION {
        string public_key
        string document_id
        string operation_id
        string log_id
        string schema_id
        string previous
    }
    OPERATION_FIELD ||--|{ OPERATION : ""
    OPERATION_FIELD {
        string operation_id
        string name
        string field_type
        string value
        numeric list_index
    }
    DOCUMENT ||--|o DOCUMENT_VIEW : ""
    DOCUMENT {
        string document_id
        string document_view_id
        string schema_id
        string is_deleted
    }
    DOCUMENT_VIEW  {
        string document_view_id
        string schema_id
    }
    DOCUMENT_VIEW_FIELD ||--|{ DOCUMENT_VIEW : ""
    DOCUMENT_VIEW_FIELD ||--|{ OPERATION_FIELD : ""
    DOCUMENT_VIEW_FIELD {
        string document_view
        string operation_id
        string name
    }

sandreae avatar Jan 05 '23 17:01 sandreae