
Adding ability to extract a template CSS from a given PDF or image file

Open document-intelligence opened this issue 3 years ago • 2 comments

Genalog is great at generating synthetic documents from a given template, but coming up with a template is still a pain.

Wouldn't it be great if I could just point Genalog at a PDF or image and ask it to synthesize more documents like it?

In other words, can we add the functionality of extracting a CSS template out of a given PDF/image, to complete the cycle?

Thanks!

Document Intelligence

document-intelligence avatar Aug 12 '21 22:08 document-intelligence

Hi Ben! Thank you for your suggestion! I totally agree. This would add great value and boost Genalog's utility!

For this feature, I think Layout Parser looks promising for doing most of the heavy lifting of extracting layouts. However, it currently does not support exporting layouts in HTML format (as of late, it exports layout information in JSON and CSV). So there are some feature gaps to fill before Genalog can consume its output as HTML files.

I am not aware of any existing document layout standard that we could reuse or adopt to make this JSON-to-HTML conversion easy. I would love to get some suggestions if anyone reading this has experience in this matter.
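To make the gap concrete, here is a rough sketch of what a JSON-to-HTML step could look like. The block schema (`x_1`, `y_1`, `x_2`, `y_2`, `type`, `text`) and the function name `layout_to_html` are assumptions made for illustration only; they are not confirmed to match what Layout Parser exports or what Genalog expects in a template.

```python
import json
from html import escape


def layout_to_html(layout_json: str, page_width: int, page_height: int) -> str:
    """Render layout blocks as absolutely positioned <div> elements.

    Assumes (hypothetically) that the JSON is a list of blocks with pixel
    coordinates x_1, y_1, x_2, y_2 and optional "type" and "text" fields.
    """
    blocks = json.loads(layout_json)
    divs = []
    for block in blocks:
        left, top = block["x_1"], block["y_1"]
        width = block["x_2"] - block["x_1"]
        height = block["y_2"] - block["y_1"]
        style = (
            f"position:absolute;left:{left}px;top:{top}px;"
            f"width:{width}px;height:{height}px"
        )
        # Use the block type as a CSS class so it can be styled separately.
        css_class = escape(str(block.get("type", "block")).lower(), quote=True)
        text = escape(block.get("text", ""))
        divs.append(f'<div class="{css_class}" style="{style}">{text}</div>')
    body = "\n".join(divs)
    return (
        f'<div class="page" style="position:relative;'
        f'width:{page_width}px;height:{page_height}px">\n{body}\n</div>'
    )
```

Absolute positioning keeps the sketch short, but it only reproduces box geometry. Fonts, reading order, and reflowable content would still need to be handled before the result is usable as a template.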

laserprec avatar Aug 16 '21 19:08 laserprec

I'm pretty sure I saw some papers on converting a sketch to HTML code. I'll have a look. It would be a great addition to Genalog.

jgc128 avatar Aug 17 '21 14:08 jgc128