
bug: use case where text after header is omitted

Open glimchb opened this issue 7 months ago • 7 comments

this value works

value1 = """
# abc

text1

# def

text2
"""

producing

import markdown_to_json
print(markdown_to_json.dictify(value1))
> OrderedDict([('abc', 'text1'), ('def', 'text2')])

whereas this one loses text

value2 = """
# abc

text1

## xyz

text3

# def

text2
"""

producing

import markdown_to_json
print(markdown_to_json.dictify(value2))
> OrderedDict([('abc', OrderedDict([('xyz', 'text3')])), ('def', 'text2')])

note that text1 is missing from the output

glimchb avatar May 29 '25 15:05 glimchb

what structure do you expect from your second example?

njvack avatar May 29 '25 17:05 njvack

good question... maybe an extra special field for the text that sits directly under the header?

import markdown_to_json
print(markdown_to_json.dictify(value2))
> OrderedDict([('abc', OrderedDict([('_body', 'text1'), ('xyz', 'text3')])), ('def', 'text2')])

?
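
To make the proposed shape concrete, here is a minimal standalone sketch. It does not use or modify markdown_to_json; nest_with_body and the _body key name are hypothetical, and it assumes headers are plain ATX lines ("#", "##", ...) with no code fences or other markdown structures to worry about.

```python
import json
import re
from collections import OrderedDict

# Matches ATX headers such as "# abc" or "## xyz".
HEADER = re.compile(r"^(#+)\s+(.*)$")


def nest_with_body(markdown, body_key="_body"):
    """Hypothetical helper (not part of markdown_to_json).

    Builds a nested OrderedDict from headers and keeps text that sits
    directly under a header with sub-headers in `body_key`.
    """
    root = {"level": 0, "text": [], "children": OrderedDict()}
    stack = [root]
    for line in markdown.splitlines():
        match = HEADER.match(line)
        if match:
            level = len(match.group(1))
            # Close every open section at the same or a deeper level.
            while stack[-1]["level"] >= level:
                stack.pop()
            node = {"level": level, "text": [], "children": OrderedDict()}
            stack[-1]["children"][match.group(2).strip()] = node
            stack.append(node)
        elif line.strip():
            # Non-blank text belongs to the most recently opened section.
            stack[-1]["text"].append(line.strip())
    return _collapse(root, body_key)


def _collapse(node, body_key):
    text = "\n".join(node["text"])
    if not node["children"]:
        # Leaf section: the value is just its text.
        return text
    result = OrderedDict()
    if text:
        result[body_key] = text
    for title, child in node["children"].items():
        result[title] = _collapse(child, body_key)
    return result


value2 = "# abc\n\ntext1\n\n## xyz\n\ntext3\n\n# def\n\ntext2\n"
print(json.dumps(nest_with_body(value2), indent=2))
```

With value2 this keeps text1 under abc next to the xyz entry. The obvious downside is that a real header titled _body would collide with the special key.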

glimchb avatar May 29 '25 20:05 glimchb

It's been a while, so I don't remember exactly how this works. There is a strict mode that just refuses to deal with hierarchies that don't map nicely to a dict, and the default mode that makes a best effort but is lossy. Or it is a bug; I'll try to take a look.

matthewdeanmartin avatar May 29 '25 21:05 matthewdeanmartin

appreciate your time

glimchb avatar May 29 '25 21:05 glimchb

If you want to follow the original design, dropping text1 is correct behavior. You might want something like

{
  'abc': [
    'text1',
    {
      'xyz': 'text3'
    }],
  'def': 'text2'
}

but a) you'd use super weird nested lists to achieve that and b) it doesn't actually work with the current implementation

The behavior I would recommend is to have basically three modes:

  1. Normal: Warns when it drops non-blank content
  2. Strict: Errors when the structure won't parse (maybe it round-trips the markdown?)
  3. Quiet: Does not warn unless it absolutely cannot make a structure at all
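
A rough sketch of how those three modes could behave, written as a standalone pre-check rather than as a change to the library. find_dropped_text, check_structure, and the mode names as string values are hypothetical, and the sketch assumes the only lossy case is the one reported in this issue: text directly under a header that is followed by a deeper header.

```python
import re
import warnings

# Matches ATX headers such as "# abc" or "## xyz".
HEADER = re.compile(r"^(#+)\s+(.*)$")


def find_dropped_text(markdown):
    """Return (header, text) pairs for text that sits directly under a
    header which is then followed by a deeper header."""
    sections = []  # each entry: [level, title, lines]
    for line in markdown.splitlines():
        match = HEADER.match(line)
        if match:
            sections.append([len(match.group(1)), match.group(2).strip(), []])
        elif line.strip() and sections:
            sections[-1][2].append(line.strip())
    dropped = []
    for (level, title, lines), following in zip(sections, sections[1:]):
        if following[0] > level and lines:
            dropped.append((title, "\n".join(lines)))
    return dropped


def check_structure(markdown, mode="normal"):
    """Hypothetical pre-check: 'normal' warns, 'strict' raises, 'quiet' is silent."""
    if mode not in ("normal", "strict", "quiet"):
        raise ValueError(f"unknown mode: {mode!r}")
    dropped = find_dropped_text(markdown)
    if dropped and mode == "strict":
        titles = ", ".join(title for title, _ in dropped)
        raise ValueError(f"text under these headers would be dropped: {titles}")
    if mode == "normal":
        for title, text in dropped:
            warnings.warn(f"text under {title!r} would be dropped: {text!r}")
    return dropped


value2 = "# abc\n\ntext1\n\n## xyz\n\ntext3\n\n# def\n\ntext2\n"
print(check_structure(value2, mode="quiet"))
```

Running the check before converting keeps the output shape unchanged, so no unnamed keys are added; the caller only learns which text would be lost.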

Adding unnamed keys to the output seems wrong to me.

Anyhow -- the current behavior is what I originally intended, though it would be nice to warn about this situation.

@matthewdeanmartin -- if you want to do something about this, feel free; otherwise, I would not consider this a bug.

njvack avatar May 30 '25 23:05 njvack

FWIW I added a section to the README:

https://github.com/njvack/markdown-to-json/blob/master/README.md#what-happens-with-other-markdown-structures

I don't see strict mode here...

njvack avatar May 30 '25 23:05 njvack

thank you @njvack and @matthewdeanmartin for checking this. I think adding a warning is a good idea

glimchb avatar Jun 02 '25 19:06 glimchb