Parsing error

Open bheberlein opened this issue 2 months ago • 0 comments

Bug Report

The following simple Markdown example does not parse:

# Title

- a

  - b

    - c

      - d

        > - e

It seems that the error only occurs when

  • There is a blockquote nested at least 4 levels deep in a nested list, and
  • The blockquote contains a list.

Parsing succeeds when the blockquote is nested fewer than 4 levels deep and fails at 4 levels or deeper. Parsing also fails when numbered lists are used inside or outside the blockquote.
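To check the depth threshold described above, a small generator for test documents may help. This is only a sketch: `build_document` is a hypothetical helper name, and it assumes 2 spaces of indentation per list level, as in the example above.

```python
def build_document(depth: int) -> str:
    """Build a document with `depth` nested list levels, followed by a
    blockquote containing a list item at the next indentation level."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    lines = ["# Title", ""]
    for level in range(depth):
        # Each list level is indented by 2 more spaces (assumption from the example).
        lines.append(f"{'  ' * level}- {letters[level]}")
        lines.append("")
    # The blockquote sits where the next nested list item would go.
    lines.append(f"{'  ' * depth}> - {letters[depth]}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # depth=4 reproduces the failing example from this report.
    print(build_document(4), end="")
```

Writing `build_document(3)` and `build_document(4)` to separate files and scanning each with the command given in this report should show whether the failure starts at depth 4.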

Bug Type

  • [ ] Assertion Failure
  • [ ] Documentation
  • [x] Scan/Rule not working as expected
  • [ ] Fix/Rule not working as expected
  • [ ] Other

Description

I run pymarkdown --stack-trace --continue-on-error scan test.md, and I get

pymarkdown.general.bad_tokenization_error.BadTokenizationError: An unhandled error occurred processing the document.

The original error occurs in pymarkdown/container_blocks/container_block_leaf_processor.py, in ContainerBlockLeafProcessor.__parse_line_for_leaf_blocks().

Digging in a bit, I found that position_marker.text_to_parse is "  - e" (with 2 leading spaces) whereas detabified_original_line is " > - e".

This means that detabified_original_line.find(position_marker.text_to_parse) returns -1, and assert detabified_original_start_index != -1 raises an AssertionError, because there is only 1 space between > and - e in the original text. It seems that an extra space is getting prepended to the contents of the blockquote at some point during execution of ContainerBlockLeafProcessor.__process_leaf_tokens.
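The failed lookup can be illustrated in isolation with plain strings. This is not pymarkdown code, just a sketch of the `str.find` behavior involved; the 8 spaces of indentation in the original line are an assumption taken from the example document, and the variable names simply mirror the ones mentioned above.

```python
# Assumed values: the last line of test.md (8 spaces of list indentation
# before the blockquote marker) and the text the parser reportedly holds.
detabified_original_line = "        > - e"
text_to_parse = "  - e"  # two leading spaces, as observed

# str.find returns -1 when the substring is absent; the original line has
# only one space between ">" and "-", so the two-space text is not found.
start_index = detabified_original_line.find(text_to_parse)

# With a single leading space the lookup succeeds, starting just past ">".
expected_index = detabified_original_line.find(" - e")

print(start_index, expected_index)
```

The first lookup is the one that trips the assertion; the second shows the index the lookup would give if the extra space were not prepended.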

I wonder if this bug may be related to #830 or #831.

Specifics

What operating system and version are you running into this behavior on?

macOS 15.2

What version are you seeing this behavior in? (Run pip list or pipenv run pip list and look for the entry beside pymarkdownlnt.)

0.9.32

Are there any extra steps that need to be taken before executing the application?

No

What is the command line you invoke to get this behavior?

pymarkdown --stack-trace --continue-on-error scan test.md

Are you using a configuration file? Either on the command line or one of the implicit configuration files? If so, attach that file to this issue.

Default configuration

What Markdown document causes this behavior to manifest? Attach that file to this issue.

test.md

Actual Behavior

Parse failure due to unhandled error — BadTokenizationError

See traceback.txt for the full traceback.

Expected Behavior

I expect that with --continue-on-error I should at least be able to parse the rest of the document & see other linter errors that may have been found, rather than having the linter crash out.

But I would also expect this document to parse just fine, since I believe it is valid Markdown.

bheberlein avatar Oct 22 '25 16:10 bheberlein