
Expose a way of retrieving lexbor tokenizer and tree errors

Open · nobodywasishere opened this issue 10 months ago · 4 comments

https://github.com/lexbor/lexbor/blob/91444fd2a4a5177a73a603944cc07a5c5a54a258/source/lexbor/html/tokenizer/error.h
https://github.com/lexbor/lexbor/blob/91444fd2a4a5177a73a603944cc07a5c5a54a258/source/lexbor/html/tree/error.h

nobodywasishere · Feb 21 '25 20:02

What is the use case for this? Maybe I just don't use it much, but I've never encountered these errors.

kostya · Feb 21 '25 20:02

I'm currently writing a linter for crinja and need a way to verify that HTML is syntactically and semantically correct without going too crazy. My thought was that lexbor would be the easiest thing to build on for this (that doesn't mean you need to add this functionality; I'm happy to open a PR if it would be accepted).

nobodywasishere · Feb 21 '25 21:02

lexbor auto-fixes HTML while parsing, so I think hitting an error is practically impossible. Maybe only in rare cases, like broken UTF-8 or something similar.

kostya · Feb 21 '25 21:02

It may auto-fix them, but those errors are still reported and stored on the document. They're not rare ones like UTF-8 encoding errors, either (see the two error enums linked above). Writing bindings for the document and all of the types it uses, though, is proving extremely tedious.

I wonder if it would be possible to add a small shim file, compiled together with lexbor-c, that adds two functions which take the document and simply return the errors found.

nobodywasishere · Feb 21 '25 22:02
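A minimal sketch of the proposed shim's shape in C, assuming the errors live in a contiguous array of small records attached to the document. Everything named `shim_*` here is hypothetical: the structs are stand-ins, NOT lexbor's real types, and a real shim would read the tokenizer and tree error lists from wherever lexbor actually keeps them (check the error.h headers linked above and the parser/document headers for the exact fields).

```c
#include <stddef.h>

/* Hypothetical stand-in for one parse error record.
   NOT lexbor's real definition; only illustrates the shape. */
typedef struct {
    int    id;      /* error code, e.g. a value from one of the error enums */
    size_t offset;  /* hypothetical: byte offset where the error was found */
} shim_error_t;

/* Hypothetical contiguous list of error records. */
typedef struct {
    shim_error_t *list;   /* first record (may be NULL when empty) */
    size_t        length; /* number of records in use */
} shim_error_list_t;

/* Hypothetical stand-in for the parsed document. */
typedef struct {
    shim_error_list_t tokenizer_errors;
    shim_error_list_t tree_errors;
} shim_document_t;

/* Expose a list as (pointer, count); returns NULL when there are no errors.
   `count` must be non-NULL. */
static const shim_error_t *
shim_list_view(const shim_error_list_t *errs, size_t *count)
{
    *count = errs->length;
    return errs->length > 0 ? errs->list : NULL;
}

/* Shim function 1: tokenizer errors collected for `doc`. */
const shim_error_t *
shim_tokenizer_errors(const shim_document_t *doc, size_t *count)
{
    if (doc == NULL) {
        *count = 0;
        return NULL;
    }
    return shim_list_view(&doc->tokenizer_errors, count);
}

/* Shim function 2: tree-construction errors collected for `doc`. */
const shim_error_t *
shim_tree_errors(const shim_document_t *doc, size_t *count)
{
    if (doc == NULL) {
        *count = 0;
        return NULL;
    }
    return shim_list_view(&doc->tree_errors, count);
}
```

The point of this shape is that the binding side only has to mirror one small error struct and two function signatures, passing the document around as an opaque pointer, instead of declaring the document and every type it references.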