lists with paragraphs
Related to https://github.com/aaren/notedown/issues/40, notedown now works with arbitrary language blocks, which is really great! However, I have noticed that the conversion is more fragile in that case. For example, this markdown file (saved as example.md)
A markdown file
=================
This markdown file has lists with newlines between the items
- Item 1
- item 1a
- item 1b
converts just fine with notedown example.md, but notedown --match=stata example.md produces
Traceback (most recent call last):
File "/home/izahn/.local/bin/notedown", line 9, in <module>
load_entry_point('notedown==1.5.0', 'console_scripts', 'notedown')()
File "/home/izahn/.local/lib/python3.5/site-packages/notedown/main.py", line 312, in app
main(args, help=parser.format_help())
File "/home/izahn/.local/lib/python3.5/site-packages/notedown/main.py", line 275, in main
notebook = reader.read(ip, as_version=4)
File "/usr/lib/python3.5/site-packages/nbformat/v4/rwbase.py", line 89, in read
return self.reads(nbs, **kwargs)
File "/home/izahn/.local/lib/python3.5/site-packages/notedown/notedown.py", line 380, in reads
return self.to_notebook(s, **kwargs)
File "/home/izahn/.local/lib/python3.5/site-packages/notedown/notedown.py", line 372, in to_notebook
cells = self.create_cells(blocks)
File "/home/izahn/.local/lib/python3.5/site-packages/notedown/notedown.py", line 351, in create_cells
markdown_cell = self.create_markdown_cell(block)
File "/home/izahn/.local/lib/python3.5/site-packages/notedown/notedown.py", line 326, in create_markdown_cell
markdown_cell = nbbase.new_markdown_cell(**kwargs)
File "/usr/lib/python3.5/site-packages/nbformat/v4/nbbase.py", line 112, in new_markdown_cell
validate(cell, 'markdown_cell')
File "/usr/lib/python3.5/site-packages/nbformat/v4/nbbase.py", line 23, in validate
return validate(node, ref=ref, version=nbformat)
File "/usr/lib/python3.5/site-packages/nbformat/validator.py", line 156, in validate
raise better_validation_error(e, version, version_minor)
File "/usr/lib/python3.5/site-packages/nbformat/validator.py", line 152, in validate
return validator.validate(nbjson, {'$ref' : '#/definitions/%s' % ref})
File "/usr/lib/python3.5/site-packages/jsonschema/validators.py", line 123, in validate
raise error
jsonschema.exceptions.ValidationError: None is not valid under any of the given schemas
Failed validating 'oneOf' in schema['properties']['source']:
{'oneOf': [{'type': 'string'},
{'items': {'type': 'string'}, 'type': 'array'}]}
On instance['source']:
None
Thanks for reporting again.
This is because notedown isn't very clever about parsing markdown and it is parsing item 1b as an indented code block. It does this regardless of the match setting.
With --match=all, item 1b ends up as a code block.
With --match=fenced, item 1b is converted back into a markdown cell.
With any other --match value we get the error above. This happens because None is used in place of the block content, which comes down to the parsing difference between fenced and indented blocks. Before #41 was merged, this bit of code could not be reached by indented blocks.
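To make the failure concrete, here is a rough sketch of the rule the validator is enforcing, taken from the 'oneOf' schema in the traceback: a cell's source must be a string or a list of strings, so None fails both branches. The names is_valid_source and safe_source are made up for illustration and are not part of notedown or nbformat; the fallback to the raw text is only one possible guard, not the actual fix.

```python
def is_valid_source(value):
    """Mirror the 'oneOf' rule from the traceback: a string, or a list of strings."""
    if isinstance(value, str):
        return True
    return isinstance(value, list) and all(isinstance(item, str) for item in value)


def safe_source(content, raw_text=""):
    """Hypothetical guard: fall back to the block's raw text when content is None."""
    return content if content is not None else raw_text


# None is rejected, which is what the ValidationError reports.
print(is_valid_source(None))
# With the guard, the indented list item survives as plain markdown text.
print(is_valid_source(safe_source(None, "    - item 1b")))
```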
I've been thinking about switching to a proper parser for a bit but haven't got round to it. I could maybe hack another patch onto the homegrown parser so that it excludes list items from being code blocks.
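A minimal sketch of what such a patch could look like, assuming a simple line-by-line scan rather than notedown's real regex-based parser. The patterns, the function name indented_code_lines, and the indentation used in the sample are all assumptions for illustration. The idea is that indented lines only count as code when we are not inside a list.

```python
import re

# Hypothetical patterns for illustration; not notedown's actual grammar.
LIST_ITEM = re.compile(r"^\s*(?:[-*+]|\d+[.)])\s+")
INDENTED = re.compile(r"^(?: {4}|\t)")


def indented_code_lines(text):
    """Return 1-based numbers of lines treated as indented code, skipping list content."""
    code_lines = []
    in_list = False
    in_code = False
    prev_blank = True  # the start of the document behaves like a blank line
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            prev_blank = True
            continue  # blank lines do not end a list or a code block
        indented = bool(INDENTED.match(line))
        if LIST_ITEM.match(line) and (in_list or not indented):
            # A list marker starts or continues a list, even when indented.
            in_list = True
            in_code = False
        elif indented:
            # Indented text is code only outside lists, after a blank line
            # or as a continuation of an existing code block.
            if not in_list and (prev_blank or in_code):
                in_code = True
                code_lines.append(number)
        else:
            # Any other unindented text ends both the list and the code block.
            in_list = False
            in_code = False
        prev_blank = False
    return code_lines


sample = "\n".join([
    "A markdown file",
    "=================",
    "",
    "This markdown file has lists with newlines between the items",
    "",
    "- Item 1",
    "",
    "    - item 1a",
    "",
    "    - item 1b",
    "",
    "Some text",
    "",
    "    x = 1",
])
print(indented_code_lines(sample))
```

With this heuristic the nested list items are left alone and only the final indented line is reported as code. It is deliberately crude: it does not handle code blocks nested inside list items, for example.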
Note on parsers: I've looked at using mistune before (used by jupyter). Mistune doesn't parse attributed code blocks by default but it is easy enough to make a new grammar rule that does. Mistune then does a great job of parsing all of the markdown correctly and giving back an AST.
Problems with mistune:
1. It only offers an HTML renderer, when we need a markdown renderer.
2. It doesn't report where in the text each match occurs. With that information we could chop out the code blocks like notedown does and avoid needing to know anything about markdown.
Fixing (1) by writing a markdown renderer seems a bit full on. However, it would be a good contribution to mistune. If there were also a way to convert mistune's AST into pandoc's JSON AST, then you could do a lot of powerful things by leveraging pandoc filters.
(2) is more addressable. It would require monkey patching mistune.BlockLexer.parse to return the regex match group, from which line numbers can be determined, as well as tweaking the fenced code grammar rule.
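As a rough sketch of the line-number part of (2), the snippet below recovers line numbers from a regex match object by counting newlines before the match offsets. It does not call mistune at all. The FENCED_BLOCK pattern and the function names are hypothetical stand-ins, not mistune's or notedown's actual grammar rule.

```python
import re

# Built this way so the example can itself sit inside a fenced block.
FENCE_MARK = "`" * 3

# Hypothetical fenced-code pattern for illustration only.
FENCED_BLOCK = re.compile(
    r"^{fence}(?P<info>[^\n]*)\n(?P<body>.*?)^{fence}[ \t]*$".replace(
        "{fence}", re.escape(FENCE_MARK)
    ),
    re.MULTILINE | re.DOTALL,
)


def line_of(text, offset):
    """Return the 1-based line number containing the given character offset."""
    return text.count("\n", 0, offset) + 1


def locate_fenced_blocks(text):
    """Return each fenced block with its info string, body, and line span."""
    blocks = []
    for match in FENCED_BLOCK.finditer(text):
        blocks.append({
            "info": match.group("info").strip(),
            "body": match.group("body"),
            "start_line": line_of(text, match.start()),
            "end_line": line_of(text, match.end()),
        })
    return blocks


text = "\n".join([
    "Intro",
    "",
    FENCE_MARK + "stata",
    "display 1",
    FENCE_MARK,
    "",
    "Outro",
])
for block in locate_fenced_blocks(text):
    print(block["info"], block["start_line"], block["end_line"])
```

Once the start and end lines are known, everything outside those spans can be passed through as markdown untouched, which is the appeal of this approach.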
OK, makes sense. I'll just edit the input documents by hand for now. notedown has saved me hours of tedious work already, thank you!