problem-specifications icon indicating copy to clipboard operation
problem-specifications copied to clipboard

markdown: add test for multiple lists

Open floydbj opened this issue 5 years ago • 8 comments

I was just going to suggest adding an additional test to ensure that the code handles multiple lists properly. In my solution to the problem, I used a regex to insert the <ul> tags at the appropriate locations. Originally I had just (<li>.*</li>) which passed all the tests, but which will work ONLY WHEN YOU HAVE A SINGLE LIST in your markdown text. If you have more than one, it'll put a single pair of <ul> tags around all of them (beginning of the first, end of the last) and whatever headers and paragraphs are found between.

My current regex (<li>.*?</li>)(<[ph])|(<li>.*</li>) handles multiple lists correctly, as far as I can tell. The first half (before the '|') catches string-initial or internal lists, while the latter half will kick in to handle string-terminal lists.

floydbj avatar Jul 12 '20 21:07 floydbj

I don't understand why you're parsing html (with regex!) at all. Aren't you inserting the opening ul tag when you start a list?

glennj avatar Jul 13 '20 15:07 glennj

I don't know the exercise, but as a reference [this is example.py] file in the repo: https://github.com/exercism/python/blob/master/exercises/markdown/example.py#L52-L88

I'm going to move this to the python repo for discussion there.

iHiD avatar Jul 13 '20 16:07 iHiD

@floydbj As @glennj indicated, parsing HTML into markdown is not the goal of the exercise. The exercise goal is to parse markdown into HTML. I'm not sure I understand what you're trying to accomplish here.

cmccandless avatar Jul 14 '20 17:07 cmccandless

Sorry, I should have been clearer. I first used re.sub in multiline mode to place the <p>, <h#>, and <li> tags, then removed all newline characters, then used re.sub to insert the <ul> tags around the list items.

floydbj avatar Jul 14 '20 18:07 floydbj

As you've discovered, that's not a reliable way to process the file. I'd recommend adding the start/end ul tags when needed while you're detecting list items.

glennj avatar Jul 14 '20 19:07 glennj

I have also completed the exercise as you suggest, using a loop with an in_list flag to place the <ul> tags, but completed it via this approach for a more concise solution. As for reliability, that comes down to whether you write the regular expressions correctly or not--which of course is not necessarily trivial to do. I realize that very few to none will choose to go down the same path, but nevertheless, the test file as it is currently has a tiny blind spot, which I wanted to share just in case.

The test I propose adding is along the lines of the following:

def test_handles_multiple_lists_properly(self):
        self.assertEqual(
            parse("* List 1 Item 1\n* List 1 Item 2\nThis is a paragraph\n* List 2 Item 1\n* List 2 Item 2\nThis is another paragraph\n* List 3 Item 1\n* List 3 Item 2"), 
            "<ul><li>List 1 Item 1</li><li>List 1 Item 2</li></ul><p>This is a paragraph</p><ul><li>List 2 Item 1</li><li>List 2 Item 2</li></ul><p>This is another paragraph</p><ul><li>List 3 Item 1</li><li>List 3 Item 2</li></ul>",
        )

This should ensure that the code handles initial, internal, and terminal lists properly, when they are presented together.

floydbj avatar Jul 14 '20 20:07 floydbj

I don't have strong feelings one way or the other on adding the suggested test, but the tests in that exercise (and most others) are inherited from the central problem-specifications repository. I would be interested to hear feedback from the maintainers over there before adding another test here.

@iHiD , I know Python was mentioned in the title, but this is really a problem-specifications issue. Would you mind moving it over there?

@floydbj , if your suggested test is added in the spec, it will find its way to the Python track as well.

cmccandless avatar Jul 15 '20 13:07 cmccandless

@floydbj Are you still interested in working on this issue?

ErikSchierboom avatar Jan 19 '22 15:01 ErikSchierboom