rust-pdf

Refactor the code

Open J-F-Liu opened this issue 9 years ago • 4 comments

I feel the current code is difficult to extend when adding new features. I suggest using lopdf for PDF object serialization. Here is some example code:

extern crate lopdf;
use lopdf::{Document, Object, Dictionary, Stream, StringFormat};
use Object::{Null, Integer, Name, String, Reference};

let mut doc = Document::new();
doc.version = "1.5".to_string();
doc.add_object(Null);
doc.add_object(true);
doc.add_object(3);
doc.add_object(0.5);
doc.add_object(String("text".as_bytes().to_vec(), StringFormat::Literal));
doc.add_object(Name("name".to_string()));
doc.add_object(Reference((1,0)));
doc.add_object(vec![Integer(1), Integer(2), Integer(3)]);
doc.add_object(Stream::new(Dictionary::new(), vec![0x41, 0x42, 0x43]));
let mut dict = Dictionary::new();
dict.set("A", Null);
dict.set("B", false);
dict.set("C", Name("name".to_string()));
doc.add_object(dict);
doc.save("test.pdf").unwrap();

J-F-Liu avatar Dec 29 '16 05:12 J-F-Liu

One of the goals for rust-pdf is to be able to create large documents with a small memory footprint, by writing each object to the file as soon as possible (and by serializing dictionaries immediately). lopdf seems to take the opposite approach, keeping everything in memory as high-level objects until finally serializing the entire document.
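
To make the write-as-you-go approach concrete, here is a minimal sketch. It is not rust-pdf's actual code: `StreamingPdf`, `write_object`, and `finish` are hypothetical names, and object bodies are passed as pre-serialized strings for brevity. The point is that only one byte offset per object stays in memory, which is all the cross-reference table at the end of the file needs.

```rust
use std::io::{self, Write};

/// Hypothetical streaming writer (illustration only, not the rust-pdf API).
/// Each object is written immediately; only its byte offset is retained.
struct StreamingPdf<W: Write> {
    out: W,
    pos: u64,          // number of bytes written so far
    offsets: Vec<u64>, // offsets[i] is the file offset of object i + 1
}

impl<W: Write> StreamingPdf<W> {
    /// Writes the file header and starts counting bytes.
    fn new(mut out: W) -> io::Result<Self> {
        let header = b"%PDF-1.5\n";
        out.write_all(header)?;
        Ok(StreamingPdf { out, pos: header.len() as u64, offsets: Vec::new() })
    }

    /// Writes one already-serialized object body and returns its object id.
    fn write_object(&mut self, body: &str) -> io::Result<usize> {
        let id = self.offsets.len() + 1;
        self.offsets.push(self.pos);
        let text = format!("{} 0 obj\n{}\nendobj\n", id, body);
        self.out.write_all(text.as_bytes())?;
        self.pos += text.len() as u64;
        Ok(id)
    }

    /// Writes the cross-reference table and trailer from the saved offsets.
    fn finish(mut self, root_id: usize) -> io::Result<W> {
        let xref_pos = self.pos;
        let count = self.offsets.len() + 1; // plus the free entry for object 0
        let mut tail = format!("xref\n0 {}\n0000000000 65535 f \n", count);
        for off in &self.offsets {
            tail.push_str(&format!("{:010} 00000 n \n", off));
        }
        tail.push_str(&format!(
            "trailer\n<< /Size {} /Root {} 0 R >>\nstartxref\n{}\n%%EOF\n",
            count, root_id, xref_pos
        ));
        self.out.write_all(tail.as_bytes())?;
        Ok(self.out)
    }
}
```

With this shape, memory use grows with the number of objects (one `u64` each) rather than with the size of their contents.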

kaj avatar Dec 29 '16 15:12 kaj

Normally a PDF document won't be very large, ranging from tens of KB to hundreds of MB. Memory size is not a bottleneck for today's computers. By keeping the whole document in memory, stream lengths can be pre-calculated, so there is no need to use a reference object for the Length entry. The resulting PDF file is smaller to distribute and faster for PDF consumers to process.

Producing a PDF is a one-time effort, while it is consumed many times.
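
The difference in the Length entry can be sketched as follows. This is an illustration only, using neither lopdf nor rust-pdf; both function names are hypothetical. The first function knows the data size up front and writes Length as a direct integer. The second emits the dictionary before the size is known, so Length has to point at a separate object written after the stream.

```rust
/// Direct length: the data is already in memory, so /Length is a plain integer.
fn stream_with_direct_length(id: u32, data: &[u8]) -> Vec<u8> {
    let mut out =
        format!("{} 0 obj\n<< /Length {} >>\nstream\n", id, data.len()).into_bytes();
    out.extend_from_slice(data);
    out.extend_from_slice(b"\nendstream\nendobj\n");
    out
}

/// Indirect length: the dictionary is emitted before the data arrives,
/// so /Length refers to object `len_id`, which is written afterwards.
fn stream_with_indirect_length(id: u32, len_id: u32, chunks: &[&[u8]]) -> Vec<u8> {
    let mut out =
        format!("{} 0 obj\n<< /Length {} 0 R >>\nstream\n", id, len_id).into_bytes();
    let mut written = 0usize;
    for chunk in chunks {
        out.extend_from_slice(chunk);
        written += chunk.len();
    }
    out.extend_from_slice(b"\nendstream\nendobj\n");
    // The extra object that holds the length, known only at this point.
    out.extend_from_slice(format!("{} 0 obj\n{}\nendobj\n", len_id, written).as_bytes());
    out
}
```

The indirect form costs one additional object (and one additional cross-reference entry) per stream, and a reader must resolve that reference before it knows how many bytes the stream holds.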

J-F-Liu avatar Dec 30 '16 06:12 J-F-Liu

Just out of curiosity I cloned the repo and changed all the writes to go to an internal String buffer, and added one last write at the very end to dump the buffer to the open file. I've disabled fonts for the moment; it looks like those might not be so easy.

I'm not really sure what qualifies as a large PDF document, but the circle output is about 5 kB so call that a small PDF. The original implementation runs in about 2 ms, and writing to an internal buffer brings that to 0.86 ms.

If I take the mandala example and pass 100 on the command line it spits out an 832 kB document, which is about the same size as some papers on arXiv. The original implementation runs in about 250 ms and writing to an internal buffer finishes in 11 ms. I'm not even sure this qualifies as a large PDF, given that it's only twice as large as the binary these compile to.

I quite like the API you've started building here, but I'm not comfortable with the constant writing to disk. Are you still committed to minimizing memory footprint?

saethlin avatar Aug 10 '17 05:08 saethlin

Constant writing is not the same as constant writing to disk. In some cases, the writing is to an internal buffer (e.g. a Vec<u8>) anyway, and when writing to an actual file on disk that should probably be done through a std::io::BufWriter.
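
A minimal sketch of that idea, assuming the serializing code is generic over std::io::Write (the helper names `write_objects`, `to_memory`, and `to_file` are hypothetical and not part of rust-pdf): the same sequence of small writes can land in a Vec<u8> or go through a BufWriter wrapped around a File, and only the caller decides which.

```rust
use std::fs::File;
use std::io::{self, BufWriter, Write};

/// Hypothetical helper: emits `count` tiny objects using many small writes.
/// It does not know or care where the bytes end up.
fn write_objects<W: Write>(out: &mut W, count: usize) -> io::Result<()> {
    for id in 1..=count {
        writeln!(out, "{} 0 obj", id)?;
        writeln!(out, "null")?;
        writeln!(out, "endobj")?;
    }
    Ok(())
}

/// Target 1: an in-memory buffer. No system calls are involved.
fn to_memory(count: usize) -> io::Result<Vec<u8>> {
    let mut buf = Vec::new();
    write_objects(&mut buf, count)?;
    Ok(buf)
}

/// Target 2: a file behind a BufWriter, which batches the small writes
/// into larger ones before they reach the operating system.
fn to_file(path: &str, count: usize) -> io::Result<()> {
    let mut out = BufWriter::new(File::create(path)?);
    write_objects(&mut out, count)?;
    // Flush explicitly so that I/O errors are reported instead of being
    // silently dropped when the BufWriter goes out of scope.
    out.flush()
}
```

Passing a bare File to `write_objects` would also compile, but each `writeln!` would then result in at least one write call to the operating system, which is the likely cause of the slowdown measured earlier in the thread.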

kaj avatar May 08 '19 15:05 kaj