knowledge-repo

Add support for HTML Knowledge Posts

Open bcipolli opened this issue 7 years ago • 9 comments

Auto-reviewers: @NiharikaRay @matthewwardrop @earthmancash @danfrankj

We have quite a few HTML posts that we'd need to include in our Knowledge Repo, were we to launch one (which I'm interested in doing). Since HTML is valid Markdown, I believe this could be done simply; we'd just need a way to emit the markdown document header.

If this sounds OK, I think I could tackle it. There's already an HTMLConverter class (and it's registered!); I believe this would just involve:

  • Documenting a method of embedding header metadata in HTML
  • Adding a simple from_file method to HTMLConverter
  • Adding dependencies (e.g. beautifulsoup) used within the from_file method.
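
To make the first two bullets concrete, here is a minimal sketch of reading header metadata from `<meta>` tags and emitting a `---`-delimited header block. It is not the project's implementation: it uses the standard library's `html.parser` rather than beautifulsoup so that it runs without extra dependencies, and the names `MetaHeaderParser`, `extract_header` and `render_header`, as well as the field names (`title`, `authors`, `tags`, `created_at`, `tldr`), are assumptions made for illustration.

```python
from html.parser import HTMLParser


class MetaHeaderParser(HTMLParser):
    """Collect <meta name="..." content="..."> pairs from an HTML document."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # Also invoked for self-closing tags such as <meta ... />.
        if tag != "meta":
            return
        attrs = dict(attrs)
        name, content = attrs.get("name"), attrs.get("content")
        if name and content is not None:
            self.meta[name.lower()] = content


def extract_header(html):
    """Return the metadata found in the document's meta tags as a dict."""
    parser = MetaHeaderParser()
    parser.feed(html)
    return parser.meta


def render_header(meta, fields=("title", "authors", "tags", "created_at", "tldr")):
    """Render the chosen fields (assumed names) as a '---'-delimited header."""
    lines = ["---"]
    for field in fields:
        if field in meta:
            # Values are written verbatim; real YAML output would need quoting.
            lines.append(f"{field}: {meta[field]}")
    lines.append("---")
    return "\n".join(lines)


html = """<html><head>
<meta name="title" content="Example post">
<meta name="authors" content="bcipolli">
<meta name="tldr" content="A short summary.">
</head><body><p>Hello</p></body></html>"""

print(render_header(extract_header(html)))
```

A `from_file` method could then read the file, call something like `extract_header`, and prepend the rendered header to the HTML body before handing it on.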

Thoughts?

bcipolli avatar Oct 15 '17 15:10 bcipolli

Another option for metadata would be to allow for specification on the command-line, when add is specified. The path (-p) option is already custom to that command, so I guess it wouldn't be too hard to add options for each required header metadata value.

Could be limited to html, or be propagated to other conversion types as well.
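
As a rough illustration of the command-line idea, the sketch below uses `argparse` to attach one option per header field to an `add` subcommand. Only `add` and `-p` come from the discussion; the program name, the `--path` long form, the `--title`, `--author` and `--tldr` flags, and the helper names are hypothetical and are not the tool's actual interface.

```python
import argparse


def build_parser():
    """Build a stand-in parser; this is not the real knowledge_repo CLI."""
    parser = argparse.ArgumentParser(prog="knowledge_repo")
    subparsers = parser.add_subparsers(dest="command")

    add = subparsers.add_parser("add", help="Add a post to the repository")
    add.add_argument("filename")
    add.add_argument("-p", "--path", help="Destination path of the post")

    # Hypothetical options, one per header metadata value.
    add.add_argument("--title")
    add.add_argument("--author", action="append", dest="authors", default=None)
    add.add_argument("--tldr")
    return parser


def header_overrides(args):
    """Return only the header fields that were given on the command line."""
    overrides = {}
    for field in ("title", "authors", "tldr"):
        value = getattr(args, field)
        if value is not None:
            overrides[field] = value
    return overrides


args = build_parser().parse_args(
    ["add", "post.html", "-p", "projects/example",
     "--title", "Example post", "--author", "bcipolli"]
)
print(header_overrides(args))
```

Because the options live on the subcommand rather than on a specific converter, the same values could be passed to any conversion type, not only HTML.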

Just an idea :)

bcipolli avatar Oct 15 '17 15:10 bcipolli

@bcipolli Thanks for reaching out about this.

Regarding HTML -> Markdown, have you looked at html2text or other similar libraries that already do this? Adding headers in an automated way will require some work. Would you want the headers to be in the HTML source (in meta tags or some such); or in a comment at the top of the document; or would you want to pass them in at runtime?

matthewwardrop avatar Oct 17 '17 21:10 matthewwardrop

@matthewwardrop I would use META tags, and would be happy to allow passing in at runtime as well... if that sounds good.
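
One way the two sources could be combined is sketched below: values passed at runtime take precedence over values found in META tags, and the result is checked for required fields. The function name `resolve_header`, the precedence order, and the list of required fields are all assumptions for illustration; nothing in this thread settles them.

```python
# Assumed set of required header fields, for illustration only.
REQUIRED_FIELDS = ("title", "authors", "tldr")


def resolve_header(meta_tags, runtime):
    """Merge header values, letting runtime values override META tag values."""
    header = dict(meta_tags)
    # Ignore runtime options that were not supplied (None).
    header.update({k: v for k, v in runtime.items() if v is not None})

    missing = [field for field in REQUIRED_FIELDS if field not in header]
    if missing:
        raise ValueError("Missing header fields: " + ", ".join(missing))
    return header


header = resolve_header(
    {"title": "From meta", "authors": "bcipolli"},
    {"title": "From CLI", "tldr": "Summary", "tags": None},
)
print(header)
```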

Would definitely use a library to insert the HTML, would not roll my own :)

bcipolli avatar Oct 18 '17 17:10 bcipolli

Happy to work on this as part of #hacktoberfest, if y'all are interested.

bcipolli avatar Oct 18 '17 17:10 bcipolli

@bcipolli We'd love to have your contributions! Feel free to submit patches! If possible, perhaps write the HTML support in a separate patch from the support for run-time specification of missing tags. The runtime specification of missing tags probably belongs as part of the retooling effort in #308 , so once the HTML patch is ready, drop a line and we can discuss the best way to extend the tooling!

matthewwardrop avatar Oct 18 '17 18:10 matthewwardrop

Is there any development in this space?

merkliopas avatar Feb 28 '19 09:02 merkliopas

@merkliopas We prototyped this, but didn't get to a place where we felt confident to push a PR. At this point, we've moved to using Blogdown / Hugo via R - I'm ok closing this out.

bcipolli avatar Mar 08 '19 13:03 bcipolli

It would be nice to support HTML format so that we can avoid rendering Rmarkdown at run time, which might require installing additional packages for rendering in the docker container/host.

jpzhangvincent avatar May 27 '20 07:05 jpzhangvincent

Agreed HTML support would be nice. I'll leave this issue open in case anyone decides to contribute with a patch.

bulam avatar May 27 '20 08:05 bulam