gofpdf

Supporting a simple markup language

Open ysaakpr opened this issue 7 years ago • 28 comments

This library is very useful to many people. A feature to generate documents from a simple markup (XML or YAML format) would be a great addition. It's always easier for end users to work with markup than with programs, especially when the number of PDFs and formats is large.

There is an example https://metacpan.org/pod/PDF::Boxer#MARKUP

Do you have any plans to add something similar to this project?

ysaakpr avatar May 30 '17 17:05 ysaakpr

Thanks for your comments and suggestion, @ysaakpr. I have mused about a markup layer to facilitate the generation of PDFs.

Instead of adding a layer to gofpdf, one possibility would be to write a module for pandoc. (Currently, pandoc generates PDFs by means of LaTeX but the use of gofpdf could provide advantages for some kinds of documents.) The benefit would be a number of types of input markup.

jung-kurt avatar May 30 '17 18:05 jung-kurt

See Deck:

https://github.com/ajstarks/deck/
https://godoc.org/github.com/ajstarks/deck
https://github.com/ajstarks/deck/tree/master/cmd/pdfdeck
https://speakerdeck.com/ajstarks/deck-a-go-package-for-presentations

ajstarks avatar May 31 '17 00:05 ajstarks

See Deck

Nice!

jung-kurt avatar May 31 '17 00:05 jung-kurt

A pandoc layer would open up lots of possibilities, but wouldn't be much use from inside a Go program. You'd have to call an outside binary, and make assumptions about the environment in which it was running. If I have a document already open and a page half-written, it doesn't let me drop some styled text onto that page. And it squanders the "no dependencies" benefit that gofpdf has going for it.

Are there any Go libraries that do markup layout? It's a complicated subject, deep enough to feel out of scope for gofpdf.

marcus-downing avatar May 31 '17 08:05 marcus-downing

A pandoc layer would open up lots of possibilities, but wouldn't be much use from inside a Go program.

This notion was admittedly not thought out well. I figured a command-line utility that embeds gofpdf could render a PDF from a file in pandoc's intermediate format. This way pandoc does the work of standardizing the input markup and gofpdf does the work of rendering output. But I'm not sure if pandoc even exports its intermediate content, and if it does it could prove to be a moving target with all of the disadvantages you cite.

One simple step I would like to take in the direction of making documents easier to produce (the reason that some kind of markup was originally requested) would be to include some examples that demonstrate how useful one-off closures written in Go can be. Each document type, for example an invoice or financial statement, has certain patterns that can be encapsulated very succinctly in a Go closure. This helps separate content management from the low-level API.
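As a rough illustration of the closure idea for an invoice, here is a minimal sketch. To stay self-contained it draws onto a hypothetical `page` type that only records positioned text; in a real program the closure would presumably capture a gofpdf document instead. The names `page` and `newLineItemWriter` are assumptions made for this example, not part of gofpdf.

```go
package main

import "fmt"

// page is a hypothetical stand-in for a PDF document: it only records
// what would be drawn and where. It is not part of gofpdf.
type page struct {
	lines []string
}

// text records a string at the given position.
func (p *page) text(x, y float64, s string) {
	p.lines = append(p.lines, fmt.Sprintf("(%.0f,%.0f) %s", x, y, s))
}

// newLineItemWriter returns two closures. item draws one invoice row and
// advances the captured y position; total reports the running sum.
// Callers deal only with content, never with coordinates.
func newLineItemWriter(p *page, top, lineHt float64) (item func(desc string, qty int, price float64), total func() float64) {
	y := top
	sum := 0.0
	item = func(desc string, qty int, price float64) {
		amount := float64(qty) * price
		p.text(10, y, desc)
		p.text(120, y, fmt.Sprintf("%d x %.2f", qty, price))
		p.text(170, y, fmt.Sprintf("%.2f", amount))
		y += lineHt
		sum += amount
	}
	total = func() float64 { return sum }
	return item, total
}

func main() {
	p := &page{}
	item, total := newLineItemWriter(p, 40, 6)
	item("Widget", 3, 2.50)
	item("Gadget", 1, 10.00)
	p.text(170, 60, fmt.Sprintf("Total %.2f", total()))
	for _, l := range p.lines {
		fmt.Println(l)
	}
}
```

The layout decisions (column positions, line height) live in one place, and the calling code reads like the content of the invoice.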

It's a complicated subject, deep enough to feel out of scope for gofpdf.

I agree. I think if anything is done in this direction, it should be as a command line utility in the contrib directory.

jung-kurt avatar May 31 '17 11:05 jung-kurt

@marcusatbang you can use the deck/generate package to build layout programmatically:

see:

https://godoc.org/github.com/ajstarks/deck/generate

ajstarks avatar May 31 '17 11:05 ajstarks

you can use the deck/generate package to build layout programmatically

Thanks, @ajstarks. This is definitely worth investigating.

jung-kurt avatar May 31 '17 11:05 jung-kurt

I feel a little late to this party, but about 12 months ago I built a CommonMark parser that generates PDFs for one of my work projects. At present it still has a lot of implementation-specific code in it (e.g. hard-coded header production) tied to its current use case, but I'm in the process of extracting it all into a more general solution for another project.

I'll make an effort to get it to an acceptable level and post it as a repository for others to reference.

As it stands, though, there are some out-of-place (and slightly hacky) additions of custom markdown for features like centering text and hanging indents, so a more expressive markdown language could still be a massive boon. CommonMark markdown was simply what was requested for this project.

Maldris avatar Jul 11 '17 00:07 Maldris

Good to hear of your project, @Maldris.

My thought is that a markup parser and document generator would either be a standalone project that uses gofpdf, or a package/command pair that resides beneath the contrib directory. That way, it is convenient for users that want it, but it isn't imposed on users that don't need it.

Looking forward to seeing your work.

jung-kurt avatar Jul 11 '17 01:07 jung-kurt

In hindsight my comment was unclear. My project is a standalone package that has gofpdf as a dependency, along with a well-maintained markdown parser, and provides an interface between the two. It's simple, and in many ways a bit ugly, but it's got the job done for us. For a second project for the same employer I'm going to have to pull it out and make it an abstract package that both projects can use. It's not the highest priority at the moment, but once it's done I'll get it on GitHub, and then it'll be easier to get feedback and improve it. I just need to get all the application-specific logic pulled out first.

Maldris avatar Jul 11 '17 01:07 Maldris

Sounds like an excellent project, @Maldris. Thanks for the clarification.

jung-kurt avatar Jul 11 '17 10:07 jung-kurt

@Maldris I am interested in your project's status. I have the same kind of idea: bridge gofpdf and a markdown parser to make a fast builder for PDF generation. No proper markup language exists that targets just PDF, yet PDF is one of the common document formats that most people are interested in. Learning and adding a dependency like LaTeX is very heavy and causes lots of adoption issues. A simple markup that can replace wkhtmltopdf-based solutions is in high demand. Let me know if you can share your initial code so that we can start a wrapper project on top of the beautiful gofpdf project.

ysaakpr avatar Nov 10 '17 10:11 ysaakpr

Hi, sorry for the delay. The project was originally done as contract work for a law firm as part of a larger project, and they presently own the IP. I'm in discussions about returning ownership of that code to me so it can be made public. I'll hopefully have an update and some more news soon.

Maldris avatar Nov 10 '17 10:11 Maldris

I'm also interested in this. I have a list of useful things I thought I'd create in order to learn gofpdf and a markdown-to-PDF conversion was on the list. @Maldris - did the IP problem get resolved?

mandolyte avatar Dec 19 '17 14:12 mandolyte

I have a list of useful things I thought I'd create in order to learn gofpdf and a markdown-to-PDF conversion was on the list.

Version 2 of blackfriday lets you render a Markdown document's syntax tree. That may be a good first step in a project like this.

jung-kurt avatar Dec 19 '17 17:12 jung-kurt

@jung-kurt Found it: https://godoc.org/github.com/russross/blackfriday#Renderer A quick look tells me that the first step toward this would be to map the interface methods to gofpdf: essentially a gap analysis to see which parts are easy and which are harder. I'll keep thinking about it. It may have to wait until after the holidays for any serious time/effort. Thanks for the pointer!

mandolyte avatar Dec 20 '17 14:12 mandolyte

@jung-kurt above was the wrong URL, but you probably knew that :-) The blackfriday v2 AST really simplifies things. I'm thinking I'll work through the problem as a standalone example first (https://github.com/mandolyte/examples-gofpdf/tree/master/mdToPdf). If you are happy with the design of htmlbasic, I'll model the markdown solution in a similar manner. If you have any comments about the approach I've taken so far, let me know. Thanks!

mandolyte avatar Dec 21 '17 22:12 mandolyte

Nice start, @mandolyte! Even though I knew that blackfriday v2 exposed the document AST, it's interesting to see it used in this context. I look forward to seeing what your project becomes.

I haven't read your code too carefully yet, and I am not familiar with Markdown's corner cases, so my next remark may be uninformed. You may wish to maintain stacks for attributes such as italic, bold, font size, color, etc. in PdfRenderer. (This is easy to do with Go slices.) That way, when you leave a node you can simply pop the appropriate attribute and apply the new top. However, on second thought, Markdown probably protects against nesting the same attribute, so this likely doesn't buy you anything. (In HTML, a stack helps to render things like a<i>b<i>c</i>d</i>e so that d isn't incorrectly set to non-italic.)
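To make the stack idea concrete, here is a minimal sketch of a slice-backed attribute stack applied to the a<i>b<i>c</i>d</i>e case. The token list and the names `boolStack` and `classify` are assumptions for illustration; a real renderer would receive nodes from its parser rather than a flat list of strings.

```go
package main

import "fmt"

// boolStack is a stack of on/off attribute values backed by a slice.
type boolStack []bool

// push records the value that becomes active on entering a node.
func (s *boolStack) push(v bool) { *s = append(*s, v) }

// pop discards the value of the node being left.
func (s *boolStack) pop() {
	if n := len(*s); n > 0 {
		*s = (*s)[:n-1]
	}
}

// top reports the active value, or false when nothing is open.
func (s *boolStack) top() bool {
	if n := len(*s); n > 0 {
		return (*s)[n-1]
	}
	return false
}

// classify labels each text token with the style in effect when it is
// reached. Closing a tag pops one level instead of switching italic off.
func classify(tokens []string) []string {
	var italic boolStack
	var out []string
	for _, tok := range tokens {
		switch tok {
		case "<i>":
			italic.push(true)
		case "</i>":
			italic.pop()
		default:
			if italic.top() {
				out = append(out, tok+":italic")
			} else {
				out = append(out, tok+":plain")
			}
		}
	}
	return out
}

func main() {
	tokens := []string{"a", "<i>", "b", "<i>", "c", "</i>", "d", "</i>", "e"}
	fmt.Println(classify(tokens))
	// prints [a:plain b:italic c:italic d:italic e:plain]
}
```

A single boolean flag cleared on every closing tag would label d as plain here; the stack keeps it italic until the outer tag closes.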

jung-kurt avatar Dec 21 '17 23:12 jung-kurt

Yup, you are correct. The only renderer they have for v2 is HTML, which means they are really going from one markup to another. So they just need to close off the tags properly, and the actual issue of rendering nested markup falls to the browser. So I'll have to maintain the stack to do this properly for PDF. Thanks for drawing my attention to it.

mandolyte avatar Dec 22 '17 13:12 mandolyte

Instead of individual stacks for attributes, you may want to consider aggregating attributes into a structure, and then stacking only the structure. That way, down the road, you could have an API in which attributes for, say, # Heading or [some link](some url) are managed as a set. Then entering and leaving a header block or some other node involves only one push and one pop. This might facilitate programmatic styling.
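A minimal sketch of the aggregated variant might look like the following. The `style` fields and the `styleStack` type are assumptions chosen for illustration; a real renderer would pick whatever attributes it needs to apply to the PDF.

```go
package main

import "fmt"

// style aggregates the text attributes active at one nesting level.
type style struct {
	bold, italic bool
	size         float64
	link         string
}

// styleStack holds one style per open node; the last element is active.
// The first element is the document's base style and is never popped.
type styleStack []style

// newStyleStack starts the stack with the base style.
func newStyleStack(base style) styleStack { return styleStack{base} }

// top returns the style currently in effect.
func (s *styleStack) top() style { return (*s)[len(*s)-1] }

// push makes st the active style on entering a node.
func (s *styleStack) push(st style) { *s = append(*s, st) }

// pop restores the enclosing style on leaving a node and returns it.
func (s *styleStack) pop() style {
	if n := len(*s); n > 1 {
		*s = (*s)[:n-1]
	}
	return s.top()
}

func main() {
	stack := newStyleStack(style{size: 12})

	// Entering a heading: copy the active style and change what differs.
	h := stack.top()
	h.bold, h.size = true, 24
	stack.push(h)

	// Entering a link inside the heading inherits bold and size.
	l := stack.top()
	l.link = "https://example.com"
	stack.push(l)
	fmt.Printf("in link:    %+v\n", stack.top())

	// Leaving each node is a single pop.
	fmt.Printf("in heading: %+v\n", stack.pop())
	fmt.Printf("in body:    %+v\n", stack.pop())
}
```

Because each pushed value is derived from the current top, nested nodes inherit everything they do not override, and leaving a node restores all attributes at once.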

jung-kurt avatar Dec 22 '17 13:12 jung-kurt

Sorry for the delays, we found a few legal issues that had to be worked out before I could release the package, and I had to scrub most of our tests out of the package as they contained the documents belonging to the business which I wouldn't be able to release.

... that and with the holiday period things got delayed

So to the point, I've released the docgen library here https://github.com/Maldris/commonmarkDocgen

As a note on its current state:

  • pretty much no test coverage, as I had to nuke all our tests out of the repo and its history to be able to release it (things have been tested with our templates, but I'll have to make new tests and templates before they show up in the repo)
  • documentation is super basic as all our in-house documentation once again had stuff I couldn't release, so that's a work in progress
  • no documentation for some of the custom markup tags we added because we needed them (yet)
  • no support for images
  • no support for strikethrough markup
  • no support for inline code block

Hope it's still useful to some of you. It's a super basic tool, originally made in about a week in a rush because we needed the functionality at the time, so I hope to go back and clean it up a lot. In particular, we don't build any kind of tree from the tokens we get from the markdown library, which means some of the logic is a bit convoluted in order to deal with the tokens arriving as a mostly linear array with minimal nesting.

Good luck. Hope you're all having a good new year so far.

Maldris avatar Jan 01 '18 10:01 Maldris

Thank you for releasing! I'll let you know my thoughts when I give it a try.

jung-kurt avatar Jan 01 '18 12:01 jung-kurt

I now have a fairly complete package here. Needs more documentation before I send over to godocs.

Feedback and testing welcomed!

mandolyte avatar Jan 16 '18 16:01 mandolyte

I completed table support this week, but without any fitting capabilities, but see my blog for one approach. See the readme for current limitations.

Is there interest in making this a contribution to the gofpdf project? Let me know your thoughts.

Package: https://godoc.org/github.com/mandolyte/mdtopdf
Blog on table fitting: http://www.mandolyte.info/2018/01/create-pdf-table-from-csv-file.html

mandolyte avatar Jan 20 '18 12:01 mandolyte

Nice work, @mandolyte!

I completed table support this week, but without any fitting capabilities, but see my blog for one approach.

Another approach for fitting oversize tables would be to size the columns based on some criteria (not necessarily widest entry; perhaps average width) and then word-wrap the entries where needed.

Is there interest in making this a contribution to the gofpdf project?

The dependence on the blackfriday package would unfortunately relegate it to the contrib directory. I think your package will get more use if it remains independent. I will definitely add a link to it on gofpdf's landing page and maybe blackfriday can do that too.

One recommendation would be to rename the "cmd" directory to "mdtopdf". That way, go install generates a meaningful name for the command.

A valuable enhancement would be to support an options structure as an argument to NewPdfRenderer(). Being able to configure even rudimentary styling (face, size, color of various elements like headers, normal text and links) would be welcomed by most users. To start with, the orientation and paper size could be rolled into the options.
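One possible shape for such an options structure is sketched below. Everything here is hypothetical: the `Options` and `TextStyle` types, their fields and the defaults are assumptions for illustration and do not describe the actual mdtopdf API or the signature of NewPdfRenderer().

```go
package main

import "fmt"

// TextStyle describes how one kind of element is drawn (hypothetical).
type TextStyle struct {
	Face  string
	Size  float64
	Color [3]int // red, green, blue, each 0-255
}

// Options is a hypothetical configuration for a Markdown-to-PDF renderer.
// Orientation and PageSize use the same kind of strings that gofpdf.New
// accepts, such as "P" and "A4".
type Options struct {
	Orientation string
	PageSize    string
	Normal      TextStyle
	Header      TextStyle
	Link        TextStyle
}

// DefaultOptions returns illustrative defaults.
func DefaultOptions() Options {
	return Options{
		Orientation: "P",
		PageSize:    "A4",
		Normal:      TextStyle{Face: "Arial", Size: 12},
		Header:      TextStyle{Face: "Arial", Size: 18},
		Link:        TextStyle{Face: "Arial", Size: 12, Color: [3]int{0, 0, 255}},
	}
}

// withDefaults fills every field the caller left at its zero value, so
// users only need to set what they want to change.
func (o Options) withDefaults() Options {
	d := DefaultOptions()
	if o.Orientation == "" {
		o.Orientation = d.Orientation
	}
	if o.PageSize == "" {
		o.PageSize = d.PageSize
	}
	if o.Normal == (TextStyle{}) {
		o.Normal = d.Normal
	}
	if o.Header == (TextStyle{}) {
		o.Header = d.Header
	}
	if o.Link == (TextStyle{}) {
		o.Link = d.Link
	}
	return o
}

func main() {
	// Override only the page size and header style; the rest is defaulted.
	opts := Options{
		PageSize: "Letter",
		Header:   TextStyle{Face: "Times", Size: 20},
	}.withDefaults()
	fmt.Printf("%+v\n", opts)
}
```

Filling zero values with defaults keeps existing callers working when new fields are added later, at the cost of not being able to request a literal zero value.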

jung-kurt avatar Jan 20 '18 16:01 jung-kurt

Hi @mandolyte, as a starting reference for sizing tables when they're larger than the page, my markdown PDF generator (https://github.com/Maldris/commonmarkDocgen) does the following:

  • tally sizes of cells and track the largest and the sum, count, etc for each column
  • if the sum of the max for each column is larger than the space between the current page margins it checks the average
  • it then allocates space between the margins based off the ratio of its average use to the total of all averages, and uses this for the column width
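The steps above can be sketched roughly as follows. This is an independent illustration of the described approach, not the code from render.go; the function name `columnWidths` and the assumption that cell widths have already been measured (for example with gofpdf's GetStringWidth) are mine.

```go
package main

import "fmt"

// columnWidths sizes table columns for a given available width.
// cellWidths[row][col] is the measured width of each cell's content.
// If every column fits at its widest cell, those widths are used.
// Otherwise the available width is shared in proportion to each
// column's average cell width.
func columnWidths(cellWidths [][]float64, available float64) []float64 {
	if len(cellWidths) == 0 {
		return nil
	}
	cols := len(cellWidths[0])
	maxW := make([]float64, cols)
	sumW := make([]float64, cols)
	for _, row := range cellWidths {
		for c := 0; c < cols && c < len(row); c++ {
			if row[c] > maxW[c] {
				maxW[c] = row[c]
			}
			sumW[c] += row[c]
		}
	}

	totalMax := 0.0
	for _, w := range maxW {
		totalMax += w
	}
	if totalMax <= available {
		return maxW // the table fits without adjustment
	}

	rows := float64(len(cellWidths))
	avg := make([]float64, cols)
	totalAvg := 0.0
	for c := range avg {
		avg[c] = sumW[c] / rows
		totalAvg += avg[c]
	}
	if totalAvg == 0 {
		return maxW
	}

	widths := make([]float64, cols)
	for c := range widths {
		widths[c] = available * avg[c] / totalAvg
	}
	return widths
}

func main() {
	cells := [][]float64{
		{10, 30},
		{30, 90},
	}
	fmt.Println(columnWidths(cells, 200)) // fits: widest cell per column
	fmt.Println(columnWidths(cells, 80))  // too wide: share by average
}
```

Cells wider than their allotted column would then need to be word-wrapped or truncated when drawn.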

Table logic can be seen in render.go from line 318: https://github.com/Maldris/commonmarkDocgen/blob/master/render.go#L318
The column width calculation starts on line 372: https://github.com/Maldris/commonmarkDocgen/blob/master/render.go#L372

Maldris avatar Jan 21 '18 03:01 Maldris

@jung-kurt thanks for the input. I'll do the tweaks needed to give the user more control.

@Maldris great example! Since the blackfriday parser is event-driven and calls my code as "callbacks", it will take some work to accumulate the table cells into a 2D slice and then emit the table when the "leaving-table" callback comes. Doable, but tricky. I'll do some of the easier things first :-)
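A rough sketch of that accumulation pattern is shown below. It does not use blackfriday; the `nodeKind` values and the `visit` method are hypothetical stand-ins for an enter/leave style callback, and passing cell text directly is a simplification (in a real AST the text would likely arrive in child nodes).

```go
package main

import "fmt"

// nodeKind identifies the table-related nodes in this sketch.
type nodeKind int

const (
	kindTable nodeKind = iota
	kindRow
	kindCell
)

// tableCollector buffers cells while a table is being walked and hands
// the complete table to emit once the walker leaves the table node.
type tableCollector struct {
	rows [][]string
	emit func(rows [][]string)
}

// visit is shaped like a tree-walk callback: it is called once when a
// node is entered and once when it is left.
func (t *tableCollector) visit(kind nodeKind, text string, entering bool) {
	switch kind {
	case kindTable:
		if entering {
			t.rows = nil // start a fresh buffer
		} else {
			t.emit(t.rows) // whole table known: widths can be computed now
			t.rows = nil
		}
	case kindRow:
		if entering {
			t.rows = append(t.rows, nil) // open a new row
		}
	case kindCell:
		if entering && len(t.rows) > 0 {
			last := len(t.rows) - 1
			t.rows[last] = append(t.rows[last], text)
		}
	}
}

func main() {
	c := &tableCollector{emit: func(rows [][]string) {
		for _, r := range rows {
			fmt.Println(r)
		}
	}}

	// Simulated callback sequence for a 2x2 table.
	c.visit(kindTable, "", true)
	for _, row := range [][]string{{"Name", "Qty"}, {"Widget", "3"}} {
		c.visit(kindRow, "", true)
		for _, cell := range row {
			c.visit(kindCell, cell, true)
			c.visit(kindCell, cell, false)
		}
		c.visit(kindRow, "", false)
	}
	c.visit(kindTable, "", false)
}
```

Deferring all drawing until the table node is left is what makes column sizing possible, since no width can be chosen until every cell has been seen.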

mandolyte avatar Jan 26 '18 13:01 mandolyte

@mandolyte I haven't worked with blackfriday before and didn't realise it worked via a callback mechanic in this way. That does substantially complicate things, best of luck.

Maldris avatar Jan 27 '18 09:01 Maldris