Postprocessors can only handle String based output

Open robertpanzer opened this issue 9 years ago • 5 comments

Currently the abstract Postprocessor base class only supports String-based target formats, because process() receives the Document and a String. It is therefore not possible, for example, to create Postprocessors for the PDF backend.

(I admit that it would be very difficult to create a Postprocessor for the PDF backend, as the result of the converter is an Asciidoctor::Pdf::Converter. Handling that from Java will be very difficult anyway.)

But other converters might create byte arrays or whatever.

I just want to ask if:

  1. we should stick with this?
  2. follow the path that we have taken with the Converters where we have a generic base interface Converter<T> and an abstract base class StringConverter implements Converter<String>. (Converting the Ruby object to the expected type by the Java implementation could be very hard as well though.)
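
A minimal sketch of what option 2 could look like, assuming the same shape as the Converter hierarchy. The names `Postprocessor<T>`, `StringPostprocessor` and `FooterPostprocessor` are hypothetical, and `Document` is a stand-in interface rather than the real AsciidoctorJ type:

```java
// Hypothetical sketch; none of these types are the actual AsciidoctorJ API.

// Stand-in for the AsciidoctorJ Document type (assumption).
interface Document {
    String getTitle();
}

// Generic base interface: T is whatever type the converter produces.
interface Postprocessor<T> {
    T process(Document document, T output);
}

// Convenience base class for String-based backends such as HTML or DocBook.
abstract class StringPostprocessor implements Postprocessor<String> {
}

// Example extension: appends a comment containing the document title.
class FooterPostprocessor extends StringPostprocessor {
    @Override
    public String process(Document document, String output) {
        return output + "\n<!-- " + document.getTitle() + " -->";
    }
}
```

Existing String-based extensions would only need to change their supertype, while a non-String backend could implement `Postprocessor<T>` with its own result type.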

robertpanzer avatar Oct 06 '15 12:10 robertpanzer

You are absolutely correct that there's a limitation here, though it may not be as serious as the Converter problem. The reason is that objects, such as a PDF, that are produced by non-String converters are probably mutable. So as long as the Postprocessor can get access to that object somehow, it can just mutate it in place.

That does bring us back to a question I think we asked before (though I can't recall the answer). How do you get access to the PDF object from the Document interface? In Ruby, you get to it using duck-typing / reflection.

The benefit of the design you are proposing in (2) is that we could return a replacement object instead of mutating the object. I think it's the right goal to aim for.

mojavelinux avatar Oct 08 '15 00:10 mojavelinux

It is currently possible to get the handle to the ruby part of the Document. This requires going back to internal methods though: IRubyObject rubyDoc = ((RubyObjectWrapper) document).getRubyObject(). Then you can do whatever you want with the Ruby object.

Actually I think that a Postprocessor will always be target format dependent, regardless of the types used. Even for DocBook and HTML, Postprocessors will probably be different, even though the type of the result is a String in both cases.

So why not let each converter also bring its own Postprocessor supertype?

  • asciidoctorj-core could bring an HtmlPostprocessor and a DocbookPostprocessor which both have a method process(Document, String). (I'm fine though with having only a StringPostprocessor)
  • asciidoctorj-pdf could bring a PdfPostprocessor with a method process(Document, PDFDocument). (It would be a tough task though to create a Java API for the PDF document)
  • asciidoctorj-epub could bring an EPubPostprocessor with the method process(Document, Package).
  • And custom Java converters should play nicely and also bring their own XYZPostprocessor.

HOW the converter registers its Postprocessor type and how it intercepts the call from Asciidoctor is something that I don't want to think about for now.

I'm also fine to stay with the current situation as it seems that no one really had these problems yet.

I could just imagine that someone, for example, wants to have a DRAFT watermark across all pages in a PDF for drafts, and I think a Postprocessor would be the right place for such an extension, as this doesn't seem to be covered by the PDF theming solution.

robertpanzer avatar Oct 12 '15 07:10 robertpanzer

I agree that a Postprocessor has a 1-to-1 mapping to output format. What the Postprocessor should receive is a generic object, whereas the subtypes get to receive the actual object being produced (String, maybe even a DOM, PDF object, etc).

mojavelinux avatar Dec 01 '15 22:12 mojavelinux

After giving this some thought here http://discuss.asciidoctor.org/Attachments-in-pdf-td6083.html#a6088, I don't like the idea of receiving a wrapper of the output format because you are still dependent on the converter used. In the case of PDF you still need to mess with Prawn or, as Robert said, rewrite a whole PDF wrapper. But what if the output is a generic interface like this?

T getOutput()
InputStream getRawOutput()

The first could be defined with generics, and we can provide some PdfOutput, HtmlOutput, etc. This offers the freedom to hack the underlying converter if one wants to do so, and fixes the casting issues in PostProcessorProxy (provided Java's type erasure does not backfire). The second would allow getting the raw data and passing it to another API. I am thinking of the case of PDF: take the stream and process it with PDFBox. Finally, the process method should return an InputStream, so that AsciidoctorJ can read it and write it to whatever output is configured.
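
A rough sketch of how these pieces might fit together, assuming a String-based backend. The names `Output<T>`, `StreamPostprocessor<T>` and `UppercasePostprocessor` are hypothetical, `HtmlOutput` is only one possible shape for the class mentioned above, and the Document parameter is left out for brevity:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical sketch; none of these types are the actual AsciidoctorJ API.
interface Output<T> {
    T getOutput();              // converter-specific object, typed via generics
    InputStream getRawOutput(); // raw bytes, usable with any external API
}

// One possible output wrapper for a String-based backend.
class HtmlOutput implements Output<String> {
    private final String html;

    HtmlOutput(String html) {
        this.html = html;
    }

    @Override
    public String getOutput() {
        return html;
    }

    @Override
    public InputStream getRawOutput() {
        return new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8));
    }
}

// process returns a stream that the caller can write to the configured target.
interface StreamPostprocessor<T> {
    InputStream process(Output<T> output) throws IOException;
}

// Toy example: reads the raw bytes and returns them upper-cased.
class UppercasePostprocessor implements StreamPostprocessor<String> {
    @Override
    public InputStream process(Output<String> output) throws IOException {
        byte[] raw = output.getRawOutput().readAllBytes(); // requires Java 9+
        String text = new String(raw, StandardCharsets.UTF_8).toUpperCase(Locale.ROOT);
        return new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8));
    }
}
```

A PDF variant would follow the same pattern, handing the bytes from `getRawOutput()` to a PDF library instead of decoding them as text.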

abelsromero avatar Dec 14 '17 10:12 abelsromero

We could even be really nice and offer a getStringOutput() (or something to that effect) when you know you are working with a string (or, even if you're not, it still is a string representation).

mojavelinux avatar Oct 15 '18 06:10 mojavelinux