problem-specifications
problem-specifications copied to clipboard
markdown: add test for multiple lists
I was just going to suggest adding an additional test to ensure that the code handles multiple lists properly. In my solution to the problem, I used a regex to insert the <ul> tags at the appropriate locations. Originally I had just (<li>.*</li>) which passed all the tests, but which will work ONLY WHEN YOU HAVE A SINGLE LIST in your markdown text. If you have more than one, it'll put a single pair of <ul> tags around all of them (beginning of the first, end of the last) and whatever headers and paragraphs are found between.
My current regex (<li>.*?</li>)(<[ph])|(<li>.*</li>) handles multiple lists correctly, as far as I can tell. The first half (before the '|') catches string-initial or internal lists, while the latter half will kick in to handle string-terminal lists.
I don't understand why you're parsing html (with regex!) at all. Aren't you inserting the opening ul tag when you start a list?
I don't know the exercise, but as a reference [this is example.py] file in the repo: https://github.com/exercism/python/blob/master/exercises/markdown/example.py#L52-L88
I'm going to move this to the python repo for discussion there.
@floydbj As @glennj indicated, parsing HTML into markdown is not the goal of the exercise. The exercise goal is to parse markdown into HTML. I'm not sure I understand what you're trying to accomplish here.
Sorry, I should have been clearer. I first used re.sub in multiline mode to place the <p>, <h#>, and <li> tags, then removed all newline characters, then used re.sub to insert the <ul> tags around the list items.
As you've discovered, that's not a reliable way to process the file. I'd recommend adding the start/end ul tags when needed while you're detecting list items.
I have also completed the exercise as you suggest, using a loop with an in_list flag to place the <ul> tags, but completed it via this approach for a more concise solution. As for reliability, that comes down to whether you write the regular expressions correctly or not--which of course is not necessarily trivial to do. I realize that very few to none will choose to go down the same path, but nevertheless, the test file as it is currently has a tiny blind spot, which I wanted to share just in case.
The test I propose adding is along the lines of the following:
def test_handles_multiple_lists_properly(self):
self.assertEqual(
parse("* List 1 Item 1\n* List 1 Item 2\nThis is a paragraph\n* List 2 Item 1\n* List 2 Item 2\nThis is another paragraph\n* List 3 Item 1\n* List 3 Item 2"),
"<ul><li>List 1 Item 1</li><li>List 1 Item 2</li></ul><p>This is a paragraph</p><ul><li>List 2 Item 1</li><li>List 2 Item 2</li></ul><p>This is another paragraph</p><ul><li>List 3 Item 1</li><li>List 3 Item 2</li></ul>",
)
This should ensure that the code handles initial, internal, and terminal lists properly, when they are presented together.
I don't have strong feelings one way or the other on adding the suggested test, but the tests in that exercise (and most others) are inherited from the central problem-specifications repository. I would be interested to hear feedback from the maintainers over there before adding another test here.
@iHiD , I know Python was mentioned in the title, but this is really a problem-specifications issue. Would you mind moving it over there?
@floydbj , if your suggested test is added in the spec, it will find its way to the Python track as well.
@floydbj Are you still interested in working on this issue?