
Zip the documents

Open jflesch opened this issue 13 years ago • 11 comments

It could be useful to actually zip each document:

  • It would bring Paperwork closer to the way OpenDocuments work
  • It would make documents transfer easier
  • It would reduce the stress on the filesystem

jflesch avatar Oct 25 '11 12:10 jflesch

Beware of #124

jflesch avatar May 03 '13 16:05 jflesch

Will be redundant with #124

jflesch avatar Feb 19 '16 11:02 jflesch

Yeah, #124 is not going to happen for a very long time, so let's go with this for now.

jflesch avatar Nov 10 '17 07:11 jflesch

This should only be used on small documents (< 20 pages, I guess). On big documents, it would make image modifications really CPU-intensive and disk-intensive.

jflesch avatar Nov 10 '17 07:11 jflesch

or maybe .tar.gz :)

jflesch avatar Nov 10 '17 13:11 jflesch

I’m not sure this is a good idea:

  • How will this put less stress on the filesystem? The overall size should be almost the same since PDF and JPEG images don’t compress very well.
  • When part of a document is read, the whole zip/tgz will have to be read.
  • Same for writing only a part (rotating a page, changing the labels…)
  • It will make Paperwork even slower than it is today.

tYYGH avatar Nov 11 '17 20:11 tYYGH

> How will this put less stress on the filesystem? The overall size should be almost the same since PDF and JPEG images don’t compress very well.

Fewer files means fewer inodes, plus fewer modification times to check when Paperwork starts.

> When part of a document is read, the whole zip/tgz will have to be read.

I need to check, but I believe zip or tgz (or both) have indexes.

> Same for writing only a part (rotating a page, changing the labels…)

Yes, I know, this is the main problem :/

> It will make Paperwork even slower than it is today.

Yes and no. It would actually make the start time much faster.

jflesch avatar Nov 11 '17 23:11 jflesch
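
To make the write cost discussed above concrete, here is a minimal sketch in Python. It assumes the standard `zipfile` module, which can append members but, as far as I know, has no call to overwrite one in place. The helper name `replace_member` and the member names are hypothetical and not taken from Paperwork.

```python
import os
import tempfile
import zipfile


def replace_member(archive_path, member, new_data):
    """Rewrite the whole archive, swapping the content of one member."""
    dir_name = os.path.dirname(os.path.abspath(archive_path))
    fd, tmp_path = tempfile.mkstemp(suffix=".zip", dir=dir_name)
    os.close(fd)
    try:
        with zipfile.ZipFile(archive_path) as src, \
                zipfile.ZipFile(tmp_path, "w", zipfile.ZIP_STORED) as dst:
            for info in src.infolist():
                # Every untouched member is still read and written again.
                if info.filename == member:
                    data = new_data
                else:
                    data = src.read(info.filename)
                dst.writestr(info.filename, data)
        # Swap the new archive into place only once it is complete.
        os.replace(tmp_path, archive_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Rotating a single page with this approach costs a read and a write of every page in the document, which is why the cost grows with document size.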

I had a look:

  • .zip files have an index (the central directory) at the end of the file: https://en.wikipedia.org/wiki/Zip_(file_format)#Structure
  • .tar.gz files don't have any index

So .zip might be more suitable here. I guess that also explains why LibreOffice and Office both use it.

jflesch avatar Nov 12 '17 10:11 jflesch
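
As an illustration of what the zip index allows, here is a minimal sketch using Python's standard `zipfile` module. The archive is built in memory and the member names (`paper.1.jpg`, etc.) are made up for the example.

```python
import io
import zipfile

# Build a small archive in memory; member names are hypothetical.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
    for n in range(1, 4):
        zf.writestr(f"paper.{n}.jpg", f"fake image data {n}".encode())

with zipfile.ZipFile(buf) as zf:
    # Opening the archive parses the central directory at the end of the file.
    names = zf.namelist()
    # The directory entry records where the member starts in the file.
    info = zf.getinfo("paper.2.jpg")
    # Reading one member seeks to that offset; other members are not read.
    page2 = zf.read("paper.2.jpg")
```

`ZIP_STORED` (no compression) is used here because, as noted earlier in the thread, JPEG and PDF data hardly compress anyway.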

Note that it would also help reduce fragmentation, which could improve documents load time:

  • Ext* file systems try to keep each file in one contiguous run of blocks. But separate files may be placed anywhere on the hard drive
  • When opening a multi-page document, Paperwork usually loads the pages sequentially at first

jflesch avatar Nov 12 '17 10:11 jflesch

However, I guess keeping the labels out of the .zip file could be a good idea. No need to rewrite X MB when all you want is to fix the labels that have been guessed.

jflesch avatar Nov 12 '17 10:11 jflesch
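
The layout suggested above could look roughly like the following sketch. The file names (`pages.zip`, `labels`, `paper.N.jpg`), the directory name and the helper functions are all assumptions made for the example, not Paperwork's actual storage code.

```python
import os
import tempfile
import zipfile


def save_labels(doc_dir, labels):
    """Rewrite only the small labels file; the archive is left untouched."""
    with open(os.path.join(doc_dir, "labels"), "w", encoding="utf-8") as fd:
        fd.write("\n".join(labels) + "\n")


def save_document(doc_dir, pages, labels):
    """Store page images in one archive and the labels in a sibling file."""
    os.makedirs(doc_dir, exist_ok=True)
    archive_path = os.path.join(doc_dir, "pages.zip")
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_STORED) as zf:
        for number, image_bytes in enumerate(pages, start=1):
            zf.writestr(f"paper.{number}.jpg", image_bytes)
    save_labels(doc_dir, labels)


# Usage: fixing the labels does not modify the archive.
with tempfile.TemporaryDirectory() as work_dir:
    doc_dir = os.path.join(work_dir, "20171112_1011_00")
    save_document(doc_dir, [b"page one", b"page two"], ["invoice"])
    archive = os.path.join(doc_dir, "pages.zip")
    with open(archive, "rb") as fd:
        before = fd.read()
    save_labels(doc_dir, ["invoice", "2017"])
    with open(archive, "rb") as fd:
        archive_unchanged = fd.read() == before
    with open(os.path.join(doc_dir, "labels"), encoding="utf-8") as fd:
        stored_labels = fd.read().splitlines()
```

With this split, a label fix touches a file of a few bytes, while page edits still pay the cost of rewriting the archive.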

That would be a reasonable compromise indeed.

tYYGH avatar Nov 12 '17 15:11 tYYGH