notedown lists with paragraphs

Related to https://github.com/aaren/notedown/issues/40: notedown now works with arbitrary language blocks, which is really great! However, I have noticed that conversion is more fragile in that case. For example, this markdown file (saved as example.md)

A markdown file
=================

This markdown file has lists with newlines between the items

-   Item 1
    -  item 1a

    -  item 1b

converts just fine with notedown example.md, but notedown --match=stata example.md produces

Traceback (most recent call last):
  File "/home/izahn/.local/bin/notedown", line 9, in <module>
    load_entry_point('notedown==1.5.0', 'console_scripts', 'notedown')()
  File "/home/izahn/.local/lib/python3.5/site-packages/notedown/main.py", line 312, in app
    main(args, help=parser.format_help())
  File "/home/izahn/.local/lib/python3.5/site-packages/notedown/main.py", line 275, in main
    notebook = reader.read(ip, as_version=4)
  File "/usr/lib/python3.5/site-packages/nbformat/v4/rwbase.py", line 89, in read
    return self.reads(nbs, **kwargs)
  File "/home/izahn/.local/lib/python3.5/site-packages/notedown/notedown.py", line 380, in reads
    return self.to_notebook(s, **kwargs)
  File "/home/izahn/.local/lib/python3.5/site-packages/notedown/notedown.py", line 372, in to_notebook
    cells = self.create_cells(blocks)
  File "/home/izahn/.local/lib/python3.5/site-packages/notedown/notedown.py", line 351, in create_cells
    markdown_cell = self.create_markdown_cell(block)
  File "/home/izahn/.local/lib/python3.5/site-packages/notedown/notedown.py", line 326, in create_markdown_cell
    markdown_cell = nbbase.new_markdown_cell(**kwargs)
  File "/usr/lib/python3.5/site-packages/nbformat/v4/nbbase.py", line 112, in new_markdown_cell
    validate(cell, 'markdown_cell')
  File "/usr/lib/python3.5/site-packages/nbformat/v4/nbbase.py", line 23, in validate
    return validate(node, ref=ref, version=nbformat)
  File "/usr/lib/python3.5/site-packages/nbformat/validator.py", line 156, in validate
    raise better_validation_error(e, version, version_minor)
  File "/usr/lib/python3.5/site-packages/nbformat/validator.py", line 152, in validate
    return validator.validate(nbjson, {'$ref' : '#/definitions/%s' % ref})
  File "/usr/lib/python3.5/site-packages/jsonschema/validators.py", line 123, in validate
    raise error
jsonschema.exceptions.ValidationError: None is not valid under any of the given schemas

Failed validating 'oneOf' in schema['properties']['source']:
    {'oneOf': [{'type': 'string'},
               {'items': {'type': 'string'}, 'type': 'array'}]}

On instance['source']:
    None

Mar 24 '16 17:03 izahn

Thanks for reporting again.

This is because notedown isn't very clever about parsing markdown and it is parsing item 1b as an indented code block. It does this regardless of the match setting.

With --match=all, item 1b ends up as a code block.

With --match=fenced, item 1b is converted back into a markdown cell.

With any other --match value we get the error above. This happens because None is used in place of the block content, which comes down to the difference in how fenced and indented blocks are parsed. Before #41 was merged, indented blocks could not reach this bit of code.
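To illustrate the misparse, here is a minimal sketch assuming a naive rule that treats any four-space-indented lines following a blank line as code. The pattern `INDENTED_CODE` and the helper `find_indented_blocks` are hypothetical and are not notedown's actual regex or API.

```python
import re

# Hypothetical rule, not notedown's real pattern: a blank line followed by
# one or more lines indented by four spaces counts as an indented code block.
INDENTED_CODE = re.compile(r"\n\n((?: {4}.*(?:\n|$))+)")


def find_indented_blocks(text):
    """Return the content of every span the naive rule treats as code."""
    return [m.group(1) for m in INDENTED_CODE.finditer(text)]


# The list from the report: a nested item separated by a blank line.
example = (
    "-   Item 1\n"
    "    -  item 1a\n"
    "\n"
    "    -  item 1b\n"
)

# The same list without the blank line between the nested items.
tight = (
    "-   Item 1\n"
    "    -  item 1a\n"
    "    -  item 1b\n"
)

print(find_indented_blocks(example))
print(find_indented_blocks(tight))
```

Under this assumed rule only the list with the blank line yields a "code" candidate, which matches the behaviour described above: item 1b is picked up while item 1a is left alone.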

I've been thinking about switching to a proper parser for a bit but haven't got round to it. I could maybe hack another patch onto the homegrown parser so that it excludes list items from being code blocks.
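A rough sketch of what such a patch could look like, assuming the parser already produces a list of candidate indented blocks as strings. The names `LIST_MARKER`, `looks_like_list_item` and `filter_code_blocks` are made up for illustration and do not exist in notedown.

```python
import re

# Hypothetical helper pattern: a bullet ("-", "*", "+") or an ordered
# marker ("1.", "2)") followed by whitespace at the start of a line.
LIST_MARKER = re.compile(r"^\s*(?:[-*+]|\d+[.)])\s+")


def looks_like_list_item(block):
    """True if the first non-blank line of the block starts with a list marker."""
    for line in block.splitlines():
        if line.strip():
            return bool(LIST_MARKER.match(line))
    return False


def filter_code_blocks(candidates):
    """Drop candidate indented blocks that look like nested list items."""
    return [b for b in candidates if not looks_like_list_item(b)]


candidates = ["    -  item 1b\n", "    x = 1\n"]
print(filter_code_blocks(candidates))
```

This is only a heuristic: genuine indented code whose first line happens to start with a list marker would be excluded too, which is one reason a proper parser is the more robust route.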

Note on parsers: I've looked at using mistune before (used by jupyter). Mistune doesn't parse attributed code blocks by default but it is easy enough to make a new grammar rule that does. Mistune then does a great job of parsing all of the markdown correctly and giving back an AST.

Problems with mistune:

1. It only offers an HTML renderer, when we need a markdown renderer.
2. It doesn't report where in the text each match occurs. With those positions we could chop out the code blocks like notedown does and avoid needing to know anything else about markdown.

Fixing (1) by writing a markdown renderer seems a bit full on. However, it would be a good contribution to mistune. If there were also a way to convert mistune's AST into pandoc's json AST then you could do a lot of powerful things, leveraging pandoc filters.

(2) is more addressable. It would require monkey patching mistune.BlockLexer.parse to return the regex match group, from which line numbers can be determined, as well as tweaking the fenced code grammar rule.
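The idea of recovering line numbers from a regex match can be sketched without mistune at all. The following assumes a simplified fenced-code pattern; `FENCED_CODE` and `locate_fenced_blocks` are illustrative names, not mistune's grammar rule or any part of its API.

```python
import re

# Illustrative pattern for fenced code blocks with an optional info string.
# The closing fence must repeat the opening fence exactly.
FENCED_CODE = re.compile(
    r"^(?P<fence>`{3,}|~{3,})[ ]*(?P<info>[^\n]*)\n"
    r"(?P<body>.*?)"
    r"^(?P=fence)[ ]*$",
    re.MULTILINE | re.DOTALL,
)


def locate_fenced_blocks(text):
    """Yield (first_line, last_line, info, body) for each fenced block.

    Line numbers are 1-based and include the fence lines themselves.
    """
    for m in FENCED_CODE.finditer(text):
        # Count the newlines before the match offsets to get line numbers.
        first = text.count("\n", 0, m.start()) + 1
        last = text.count("\n", 0, m.end()) + 1
        yield first, last, m.group("info").strip(), m.group("body")


doc = (
    "Intro\n"
    "\n"
    "~~~stata\n"
    "summarize x\n"
    "~~~\n"
    "\n"
    "Outro\n"
)

for block in locate_fenced_blocks(doc):
    print(block)
```

With the line range of each block known, everything outside those ranges can be passed through untouched as markdown, so nothing else about the markdown needs to be understood.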

Mar 24 '16 22:03 aaren

OK, makes sense, I'll just edit the input documents by hand for now. notedown saved me hours of tedious work already, thank you!

Mar 24 '16 23:03 izahn