Postprocessors can only handle String based output
Currently the abstract Postprocessor base class supports only String-based target formats, because its process() method receives the Document and a String.
It is therefore not possible, for example, to create Postprocessors for the PDF backend.
(I admit that it would be very difficult to create a Postprocessor for the PDF backend, as the result of the converter is an Asciidoctor::Pdf::Converter. Handling that from Java will be very difficult anyway.)
But other converters might produce byte arrays or other non-String results.
I just want to ask whether we should:
- stick with this, or
- follow the path that we have taken with the Converters, where we have a generic base interface Converter<T> and an abstract base class StringConverter implements Converter<String>. (Converting the Ruby object to the type expected by the Java implementation could be very hard as well, though.)
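To make the second option concrete, here is a minimal sketch of how a generic Postprocessor could mirror the Converter<T> / StringConverter split. All types below are hypothetical stand-ins written for illustration, not the actual AsciidoctorJ API: Document is reduced to a single method, and FooterPostprocessor and TrailerBytePostprocessor are invented examples.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch: none of these types are the real AsciidoctorJ classes.

/** Stand-in for the parsed document; reduced to what this sketch needs. */
interface Document {
    String getDoctitle();
}

/** Generic base type: T is whatever the converter produces. */
interface Postprocessor<T> {
    /** Returns the (possibly replaced) output instead of mutating it. */
    T process(Document document, T output);
}

/** Convenience base class for String-based backends such as HTML or DocBook. */
abstract class StringPostprocessor implements Postprocessor<String> {
}

/** Invented example for a String backend: appends a comment with the title. */
class FooterPostprocessor extends StringPostprocessor {
    @Override
    public String process(Document document, String output) {
        return output + "\n<!-- " + document.getDoctitle() + " -->";
    }
}

/** Invented example for a converter that produces bytes rather than a String. */
class TrailerBytePostprocessor implements Postprocessor<byte[]> {
    @Override
    public byte[] process(Document document, byte[] output) {
        byte[] trailer = ("\n% " + document.getDoctitle()).getBytes(StandardCharsets.UTF_8);
        // Copy the original bytes, then append the trailer after them.
        byte[] result = Arrays.copyOf(output, output.length + trailer.length);
        System.arraycopy(trailer, 0, result, output.length, trailer.length);
        return result;
    }
}
```

The sketch only covers the Java-side typing. How a Ruby result object would be converted to T before process() is called is left open, which is exactly the hard part mentioned above.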
You are absolutely correct that there's a limitation here, though it may not be as serious as the Converter problem. The reason is that objects produced by non-String converters, such as a PDF, are probably mutable. So as long as the Postprocessor can get access to that object somehow, it can just mutate it in place.
That does bring us back to a question I think we asked before (though I can't recall the answer). How do you get access to the PDF object from the Document interface? In Ruby, you get to it using duck-typing / reflection.
The benefit of the design you are proposing in (2) is that we could return a replacement object instead of mutating the object. I think it's the right goal to aim for.
It is currently possible to get a handle to the Ruby part of the Document.
This requires going back to internal methods, though: IRubyObject rubyDoc = ((RubyObjectWrapper) document).getRubyObject().
Then you can do whatever you want with the Ruby object.
Actually, I think that a Postprocessor will always depend on the target format, regardless of the types used. Even for DocBook and HTML, Postprocessors will probably be different, although the type of the result is a String in both cases.
So why not let each converter also bring its own Postprocessor supertype?
- asciidoctorj-core could bring an HtmlPostprocessor and a DocbookPostprocessor, which both have a method process(Document, String). (I'm fine with having only a StringPostprocessor, though.)
- asciidoctorj-pdf could bring a PdfPostprocessor with a method process(Document, PDFDocument). (It would be a tough task to create a Java API for the PDF document, though.)
- asciidoctorj-epub could bring an EPubPostprocessor with the method process(Document, Package).
- Custom Java converters should play nicely and also bring their own XYZPostprocessor.
HOW the converter registers its Postprocessor type and how it intercepts the call from Asciidoctor is something that I don't want to think about for now.
I'm also fine with staying with the current situation, as it seems that no one has really had these problems yet.
I could just imagine that someone wants, for example, a DRAFT watermark across all pages of a PDF for drafts. I think a Postprocessor would be the right place for such an extension, as this doesn't seem to be covered by the PDF theming solution.
I agree that a Postprocessor has a 1-to-1 mapping to output format. What the Postprocessor should receive is a generic object, whereas the subtypes get to receive the actual object being produced (String, maybe even a DOM, PDF object, etc).
After giving this some thought here http://discuss.asciidoctor.org/Attachments-in-pdf-td6083.html#a6088, I don't like the idea of receiving a wrapper of the output format, because you are still dependent on the converter used. In the case of PDF you still need to mess with Prawn or, as Robert said, write a whole PDF wrapper. But what if the output is a generic interface like this?
T getOutput()
InputStream getRawOutput()
The first could be defined with generics, and we could provide some PdfOutput, HtmlOutput, etc. This offers the freedom to hack the underlying converter if one wants to do so, and it would fix the casting issues in PostProcessorProxy (provided Java's type erasure does not backfire).
The second would make it possible to get the raw data and pass it to another API. I am thinking of the PDF case: take the stream and process it with PDFBox.
Finally, the process method should return an InputStream, so that AsciidoctorJ can read it and write it to whatever output is configured.
We could even be really nice and offer a getStringOutput() (or something to that effect) for when you know you are working with a string (or, even if you're not, it is still a string representation).
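A rough sketch of how these pieces might fit together is below. Everything in it is an assumption made for illustration: ConverterOutput, HtmlOutput, StreamingPostprocessor, DraftBannerPostprocessor and the readUtf8 helper are hypothetical names, not existing AsciidoctorJ types. The Document parameter of process() is left out for brevity, and getStringOutput() simply decodes the raw stream as UTF-8.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical view of a converter's result; not an AsciidoctorJ type. */
interface ConverterOutput<T> {
    /** Typed access to the converter-specific object. */
    T getOutput();

    /** Raw bytes, for handing the result to another API. */
    InputStream getRawOutput();

    /** String view of the raw output, assuming UTF-8 encoding. */
    default String getStringOutput() {
        return readUtf8(getRawOutput());
    }

    /** Helper that reads a whole stream as UTF-8 text and closes it. */
    static String readUtf8(InputStream in) {
        try (InputStream stream = in) {
            return new String(stream.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

/** Invented String-backed implementation, e.g. for an HTML backend. */
class HtmlOutput implements ConverterOutput<String> {
    private final String html;

    HtmlOutput(String html) {
        this.html = html;
    }

    @Override
    public String getOutput() {
        return html;
    }

    @Override
    public InputStream getRawOutput() {
        return new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8));
    }
}

/** Postprocessor variant whose result is a stream the caller can write out. */
interface StreamingPostprocessor<T> {
    InputStream process(ConverterOutput<T> output);
}

/** Invented example: prepends a DRAFT marker to HTML output. */
class DraftBannerPostprocessor implements StreamingPostprocessor<String> {
    @Override
    public InputStream process(ConverterOutput<String> output) {
        String result = "<!-- DRAFT -->\n" + output.getOutput();
        return new ByteArrayInputStream(result.getBytes(StandardCharsets.UTF_8));
    }
}
```

For a binary format such as PDF, an implementation would presumably pass getRawOutput() to a PDF library and return the modified bytes as the stream. That part is not shown here.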