comics Support fetching related text with more formatting

Support fetching related text with more formatting

Open jodal opened this issue 13 years ago • 2 comments

E.g. Darths & Droids has a long, formatted text associated with each comic. Since these texts are often half the fun, comics should support fetching larger pieces of text and keeping a sane amount of their formatting, e.g. headers and bullet lists.

May 23 '11 17:05 jodal

I believe @xim has been looking into this a bit; see xim/comics@fdea7223f33b8bb510fdf17976cb52eb63b5b926.

Jun 07 '12 13:06 jodal

I don't remember what we ended up with as a preferred approach. I made a tiny, general converter on my local computer. The idea was:

- Get the formatted HTML
- Use a dict that transforms elements, something like {'p': lambda data: ' '.join(data.split()) + '\n\n', ...}
- Allow the individual crawler to override any element type in this dict

I only tested this with rom.ac and QC, but it should enable good results on any comic. Further suggestions? =)
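
For illustration, the idea above could be sketched roughly like this with the standard library's html.parser. This is not the converter mentioned in the comment, only a minimal sketch under assumptions: the names DEFAULT_TRANSFORMS, VOID_TAGS, HTMLToText, convert and overrides are all hypothetical, and only the 'p' entry is taken from the comment itself.

```python
from html.parser import HTMLParser

# Hypothetical defaults: tag name -> function(inner text) -> formatted text.
# Only the 'p' entry comes from the comment above; the rest are assumptions.
DEFAULT_TRANSFORMS = {
    "p": lambda data: " ".join(data.split()) + "\n\n",
    "h1": lambda data: "# " + " ".join(data.split()) + "\n\n",
    "li": lambda data: "* " + " ".join(data.split()) + "\n",
    # Drop stray whitespace-only lines between list items.
    "ul": lambda data: "\n".join(
        line for line in data.splitlines() if line.strip()
    ) + "\n\n",
    "br": lambda data: "\n",
}

# Elements that never get a closing tag, so they need no buffer of their own.
VOID_TAGS = {"br", "hr", "img"}


class HTMLToText(HTMLParser):
    """Collect text per open element and transform it when the element closes."""

    def __init__(self, overrides=None):
        super().__init__()
        # A crawler can override any element type by passing its own entries.
        self.transforms = dict(DEFAULT_TRANSFORMS)
        self.transforms.update(overrides or {})
        self.stack = [[]]  # stack[0] is the root buffer

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            transform = self.transforms.get(tag, lambda data: data)
            self.stack[-1].append(transform(""))
        else:
            self.stack.append([])

    def handle_endtag(self, tag):
        # Ignore void tags and unbalanced closing tags.
        if tag in VOID_TAGS or len(self.stack) == 1:
            return
        data = "".join(self.stack.pop())
        # Unknown elements pass their text through unchanged.
        transform = self.transforms.get(tag, lambda data: data)
        self.stack[-1].append(transform(data))

    def handle_data(self, data):
        self.stack[-1].append(data)


def convert(html, overrides=None):
    """Return the text of an HTML fragment with a little formatting kept."""
    parser = HTMLToText(overrides)
    parser.feed(html)
    parser.close()
    return "".join(parser.stack[0]).strip()


if __name__ == "__main__":
    sample = "<h1>Title</h1><p>Some   text\n here</p><ul><li>one</li><li>two</li></ul>"
    print(convert(sample))
    # A crawler-specific override for a single element type:
    print(convert(sample, overrides={"h1": lambda data: data.upper() + "\n\n"}))
```

Keeping one buffer per open element means each transform only ever sees the already-converted text of its children, so a crawler can replace a single entry without touching the rest.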

Jun 11 '12 13:06 xim
