
Discussion: Data Schema

Open · hallahan opened this issue 13 years ago · 1 comment

As I think about adding functionality to the history widget, I suspect the story/journal schema of the data files may not be exactly what we want. Since I wasn't around when the current schema was devised, I would like to know some of the reasoning behind the story/journal design. The problem with the current design is that reverting the view to a past state recorded in the journal requires rebuilding the story by replaying the journal's events up to that selected state. This isn't a big deal, but it is inefficient.
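To make the replay cost concrete, here is a minimal sketch of rebuilding a story by playing journal actions forward. The action vocabulary (`add`, `edit`, `remove`) and field names are simplified for illustration; this is not the wiki's actual implementation.

```javascript
// Rebuild the story as it stood after journal entry `upTo` by replaying
// every action from the beginning. Cost grows with journal length.
function replayJournal(journal, upTo) {
  const story = [];
  for (const action of journal.slice(0, upTo + 1)) {
    switch (action.type) {
      case 'add':
        if (action.after == null) {
          story.push(action.item); // no anchor: append at the end
        } else {
          const i = story.findIndex(item => item.id === action.after);
          story.splice(i + 1, 0, action.item); // insert after the anchor item
        }
        break;
      case 'edit': {
        const i = story.findIndex(item => item.id === action.id);
        if (i >= 0) story[i] = action.item; // replace the item in place
        break;
      }
      case 'remove': {
        const i = story.findIndex(item => item.id === action.id);
        if (i >= 0) story.splice(i, 1);
        break;
      }
    }
  }
  return story;
}
```

Every revert walks the journal from the start, which is the inefficiency described above.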

Maybe each item could have a revision id as well as its own unique id. Then we could keep a hash table of revisions. Each entry in the journal array could hold an array of revision ids, each of which hashes to a specific item in a specific state. This would let us build the story on the fly for any state in the journal.

With this approach, we would no longer need to store an entire item object in each journal entry, just an array of revision ids.
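The proposal above might look something like this. All field names (`revisions`, the `r1`/`r2` ids, `story` inside a journal entry) are invented here for illustration, not an agreed schema.

```javascript
// Hypothetical page layout: every saved item state gets a revision id,
// and each journal entry records the story as an array of those ids.
const page = {
  revisions: {
    r1: { id: 'a', type: 'paragraph', text: 'one' },
    r2: { id: 'b', type: 'paragraph', text: 'two' },
    r3: { id: 'a', type: 'paragraph', text: 'one, revised' }
  },
  journal: [
    { story: ['r1', 'r2'] },  // first recorded state
    { story: ['r3', 'r2'] }   // item 'a' revised
  ]
};

// Rebuilding the story at any journal entry is a direct lookup, no replay.
function storyAt(page, entryIndex) {
  return page.journal[entryIndex].story.map(rid => page.revisions[rid]);
}
```

The trade-off is that each journal entry snapshots the whole story order, rather than recording just the action that changed it.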

Lastly, I am wondering why images are put inside the JSON file. Down the line, people will want to post large image files as well as other file types. We may not want the client to download large amounts of data unless it specifically requests them.

It may not be time for this sort of change, but I imagine many of you are thinking about this.

hallahan avatar Apr 09 '12 23:04 hallahan

Do we have any reason to believe that playing the journal will be slow?

A bigger issue is that the journal is unreliable, as is anything retrieved from a foreign site. Other issues here are discussing what sort of journal revision would be possible and/or appropriate.

I'm guessing even long journals could be played forward quickly with a modest amount of code. This code already exists on the server; otherwise one couldn't create files. It would be interesting to reconstruct a page every time it is fetched and then mark the paragraphs that can't be reconstructed from the journal.

If you run a clean site, that is, one that doesn't tamper with its own journal entries, then you should be able to play actions forward to merge revised versions from a forked-from source. That is the use case that motivated the current structure.

Images and datasets are stored in the page to make the page whole. It would be possible to write a server that splits images out of the page and manages them as one often does with static assets.

WardCunningham avatar Apr 10 '12 00:04 WardCunningham