Single file format (again)

Open vkbo opened this issue 3 years ago • 5 comments

The single file format has been discussed before. The current structure, which saves each text document and metadata unit as a separate file, has three key benefits, especially when it comes to the project's text itself:

  1. It is safer against data loss.
  2. It is very file sync friendly.
  3. It is version control friendly.

Issue: The fact that all the data is stored in plain text files seems to imply to many users that the data is open to be edited externally, and therefore the project folder should be easy to navigate for that purpose. But the project folder is intended as a file database, not as a folder with files to be manually accessed. Because of the possibility of manual edits, a lot of extra care has to be taken when reading the data in order to account for user-introduced errors.

A few conditions must apply to a single file format:

  • It must be optional (although perhaps default) for the foreseeable future.
  • In order to satisfy point 1 above, the file format must retain the individual document read/write properties of the current setup, and not rely on an in-memory data buffer of all project content between open and save/close.
  • The file sync stability of point 2 must be retained.
  • The version control support can only be retained by using a single flat plain text file format, and even that is not ideal. Point 3 is therefore not a requirement. For users who require it, opting out of the single file format is a reasonable solution.

Implementation

I've considered a few options before, detailed in #259. While using a read/write database as storage would solve the in-place write issue of point 1, database files really aren't file sync friendly, or even file sync safe, which conflicts with point 2.

The solution that stands out as the simplest is to keep the project as an archive file. While an archive is not really writable in the sense demanded by point 1, a temporary workspace can bypass this issue: when the project is opened, it can be extracted into a temp folder, and written back as an archive when the project is saved.

The additional benefit of this implementation is that it avoids the need to make any fundamental changes to how novelWriter reads and writes data during the writing session. All that is needed is to inject an extra step in the open and save process.

Implementation steps:

  • The archive I/O itself will be implemented as a Python ZipFile object.
  • Each project will be assigned a UUID when it is first saved as an archive. The UUID will be saved in the "comment" section of the archive, which can hold up to 65,535 bytes (just under 64 KiB) of data.
  • On open, the archive will be extracted to a folder under the novelWriter app data location (OS dependent). The folder will have the UUID as its name. This means it will be possible to recognise and recover a previously interrupted session left by, for instance, an app crash or a computer crash.
  • On save, the content of the UUID folder will be saved to a new zip file, and then it will replace the previous file.
  • Optional: Sanitise the file list on read/write to the zip file. This will make it easier to clean out deprecated files, and also ensure that files added in other ways to the archive are cleaned out.
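
The open and save steps above can be sketched roughly as follows. This is only an illustration using the standard library `zipfile` module, not novelWriter's actual code: the function names `open_archive` and `save_archive`, the `workspace_root` parameter, and the `.tmp` suffix for the intermediate file are all assumptions made for the example.

```python
import os
import uuid
import zipfile
from pathlib import Path


def open_archive(archive: Path, workspace_root: Path) -> Path:
    """Extract the archive into <workspace_root>/<uuid> and return that folder."""
    with zipfile.ZipFile(archive, "r") as zf:
        # The UUID is assumed to be the only content of the archive comment.
        project_id = zf.comment.decode("ascii")
        uuid.UUID(project_id)  # raises ValueError if the comment is not a UUID
        work_dir = workspace_root / project_id
        if work_dir.is_dir():
            # A folder with this UUID already exists, so an earlier session
            # was probably interrupted. A real app would ask the user what to
            # do; this sketch simply keeps the existing working copy.
            return work_dir
        work_dir.mkdir(parents=True)
        zf.extractall(work_dir)
    return work_dir


def save_archive(work_dir: Path, archive: Path) -> None:
    """Write the working folder to a new zip file, then replace the old one."""
    project_id = work_dir.name
    uuid.UUID(project_id)  # raises ValueError if the folder name is not a UUID
    tmp_archive = archive.with_name(archive.name + ".tmp")  # hypothetical name
    with zipfile.ZipFile(tmp_archive, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.comment = project_id.encode("ascii")
        for path in sorted(work_dir.rglob("*")):
            if path.is_file():
                zf.write(path, path.relative_to(work_dir).as_posix())
    # Only swap in the new archive once it has been completely written.
    os.replace(tmp_archive, archive)
```

Writing to a separate file and swapping it in with `os.replace` means the previous archive stays untouched until the new one is complete, which is the point of the "new zip file, then replace" step. The optional sanitising step would fit naturally as a filter inside the `for` loop.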

This implementation should be fairly file sync friendly as well, at least as much as a Word or OpenDocument file, which are built in essentially the same way.

Additional:

  • Allow the user to discard the working (extracted) copy of the project when closing, see #1470
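
Discarding the working copy on close could be as small as the sketch below. The function name `discard_working_copy` and the `confirmed` flag are hypothetical, and the sketch assumes the caller has already saved the archive or deliberately chosen to drop unsaved changes.

```python
import shutil
from pathlib import Path


def discard_working_copy(work_dir: Path, confirmed: bool) -> bool:
    """Delete the extracted project folder if the user confirmed the discard.

    Returns True if the folder was removed, False otherwise.
    """
    if confirmed and work_dir.is_dir():
        shutil.rmtree(work_dir)
        return True
    return False
```

Note that once the folder is removed, there is nothing left to recover from if the archive itself turns out to be damaged, so this is best kept as an explicit user choice.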

vkbo, Jan 25 '22 18:01

The newly added NWStorage class in the source code for 2.0 is ready for the above proposed implementation:

  • The UUID has been added to the project file format.
  • File paths and document objects are handled by the storage class, so the rest of the app doesn't deal with figuring out paths any more.
  • The backup zipIt method has been generalised and moved to the storage class, and can be used for writing the zipped version of the project.

vkbo, Nov 29 '22 13:11

I was thinking about this yesterday, nice to see it's coming. Are these archives going to differ from those novelWriter writes as backups? It would be interesting if the backup zips could simply be opened with novelWriter without having to manually unpack them first.

HeyMyian, May 09 '23 22:05

Pretty much, yeah. novelWriter is currently running on the new storage class, so extending this feature is pretty simple.

The relationship between backups and the single file format is actually reversed. I wrote a new zip method for the single file format, and so far it is only used for backups, so the two will in principle be identical. That said, old backup files may not work as project files because the older formats lack some metadata. I may add a converter function to handle this though. It's probably a nice feature.

vkbo, May 10 '23 07:05