node-html-to-text

ANSI colors and styles

Open IonicaBizau opened this issue 10 years ago • 5 comments

Is it possible to add ANSI styles to the output?

IonicaBizau, Apr 01 '15 18:04

You can try to overwrite existing formatters from the formatter.js file.

But I think it will not be that easy, because I normally ignore all styles inside a file.

mlegenhausen, Apr 07 '15 07:04

Actually, I keep this issue in mind for some possible future improvements.

KillyMXI, Jan 01 '24 14:01

OK, I will reopen it. I am just cleaning up old issues I opened a long time ago. 🙈

Happy new year! 🎆

IonicaBizau, Jan 01 '24 14:01

By the way, with some caveats, it currently works like this:

const { htmlToText } = require('html-to-text');

const html = '<b>Hello</b> <span style="color:red;"><u>World</u>!</span><br/>';
const options = {
  formatters: {
    'bold': function (elem, walk, builder, formatOptions) {
      builder.addLiteral('\x1b[1m');
      walk(elem.children, builder);
      builder.addLiteral('\x1b[22m');
    },
    'underline': function (elem, walk, builder, formatOptions) {
      builder.addLiteral('\x1b[4m');
      walk(elem.children, builder);
      builder.addLiteral('\x1b[24m');
    },
    'red': function (elem, walk, builder, formatOptions) {
      builder.addLiteral('\x1b[31m');
      walk(elem.children, builder);
      builder.addLiteral('\x1b[39m');
    }
  },
  selectors: [
    { selector: 'b', format: 'bold' },
    { selector: 'u', format: 'underline' },
    { selector: 'span[style*="color:red"i]', format: 'red' }
  ]
};

const text = htmlToText(html, options);
console.log(text);

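Result: image (https://github.com/html-to-text/node-html-to-text/assets/13851064/aa3b188f-b8f4-4f7f-bbb8-aa814faa0b8d)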
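
The three formatters in the snippet above differ only in the pair of SGR codes they emit, so they could probably be generated from a small table. This is only a sketch: `SGR_PAIRS` and `ansiFormatter` are hypothetical names, not part of html-to-text, and only the formatter signature and `builder.addLiteral` are taken from the snippet above.

```javascript
// SGR "on"/"off" code pairs; the values are the same ones used in the snippet above.
const SGR_PAIRS = {
  bold: [1, 22],      // bold on / normal intensity
  underline: [4, 24], // underline on / underline off
  red: [31, 39]       // red foreground / default foreground
};

// Hypothetical helper: builds a formatter that wraps an element's
// children in one on/off pair of escape sequences.
function ansiFormatter([on, off]) {
  return function (elem, walk, builder, formatOptions) {
    builder.addLiteral(`\x1b[${on}m`);
    walk(elem.children, builder);
    builder.addLiteral(`\x1b[${off}m`);
  };
}

// Same shape as the `formatters` option above: { bold, underline, red }.
const formatters = Object.fromEntries(
  Object.entries(SGR_PAIRS).map(([name, pair]) => [name, ansiFormatter(pair)])
);
```

The resulting object would be passed as `options.formatters` together with the same `selectors` list. Adding another style then only means adding one table row and one selector.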

Usable for crafted HTML. Not usable for arbitrary HTML:

  • completely unaware of CSS that lives outside of the style attribute and can't be captured with selectors - this is unlikely to ever change;
  • html-to-text won't combine different selectors when they happen to match the same tag, like <u style="color:red;">World</u>. Might be addressable to some extent, but would require significantly rethinking how formatters work;
  • literals still count towards the computed line length. Might be addressable if I allow literals to be marked as invisible and adjust the line length counting;
  • I'm not aware of a way to store https://en.wikipedia.org/wiki/ANSI_escape_code#SGR_(Select_Graphic_Rendition)_parameters on a stack and restore them. Any "restore default" command is unaware of the styles previously set by an outer tag (for example, it will break when nesting different colors). Such a stack can potentially be recreated at the level of formatters (maybe with some support from the block text builder, which also keeps a stack of tags).
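
The stack mentioned in the last caveat could be recreated at the formatter level roughly like this. This is a minimal sketch, not an html-to-text feature: `SgrStack`, `makeStyleFormatter`, and the grouping of codes into `weight` / `underline` / `color` are all assumptions made for illustration. Only the formatter signature and `builder.addLiteral` come from the snippet above.

```javascript
// Default ("off") SGR code for each attribute group (grouping is an assumption).
const SGR_DEFAULTS = { weight: 22, underline: 24, color: 39 };

// Hypothetical helper: remembers the active SGR code per attribute group,
// so a closing tag can restore the outer style instead of the terminal default.
class SgrStack {
  constructor() {
    this.stacks = { weight: [], underline: [], color: [] };
  }

  // Entering a styled element: remember the code, return the sequence to emit.
  open(group, code) {
    this.stacks[group].push(code);
    return `\x1b[${code}m`;
  }

  // Leaving a styled element: re-emit the outer code, or the group's default.
  close(group) {
    const stack = this.stacks[group];
    stack.pop();
    const restored = stack.length > 0 ? stack[stack.length - 1] : SGR_DEFAULTS[group];
    return `\x1b[${restored}m`;
  }
}

// Builds a formatter with the (elem, walk, builder, formatOptions) signature
// used above, routing its open/close sequences through a shared SgrStack.
function makeStyleFormatter(sgr, group, code) {
  return function (elem, walk, builder, formatOptions) {
    builder.addLiteral(sgr.open(group, code));
    walk(elem.children, builder);
    builder.addLiteral(sgr.close(group));
  };
}

// Nesting red (31) inside blue (34): closing the inner element
// restores blue rather than the default foreground.
const sgr = new SgrStack();
const demo = [
  sgr.open('color', 34), 'blue ',
  sgr.open('color', 31), 'red ',
  sgr.close('color'), 'blue again',
  sgr.close('color')
].join('');
console.log(JSON.stringify(demo));
```

Since the stack is shared mutable state, it would probably be safest to create a fresh `SgrStack` and a fresh set of formatters for each conversion. This does not address the other caveats: the literals still count towards line length, and selectors still don't combine.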

KillyMXI, Jan 03 '24 23:01

Wow! That is amazing! Thank you for this!

On Thu, Jan 4, 2024 at 1:17 AM KillyMXI wrote:

[...] Such stack can potentially be recreated at the level of formatters (maybe with some support from the block text builder that also keeps the stack of tags). But I'm not very invested in exploring it further for this niche use case, with all other limitations still in place.


IonicaBizau, Jan 04 '24 05:01