scc icon indicating copy to clipboard operation
scc copied to clipboard

Python counting is wrong with "" or ''

Open geohot opened this issue 2 years ago • 1 comments

Describe the bug Python parser gets stuck in string state if it sees '' or ""

To Reproduce

Create tmp.py with this content and run scc tmp.py

a = ''

# comment
for i in range(10):
  print(i)

Expected behavior

This is the (wrong) output

───────────────────────────────────────────────────────────────────────────────
Language                 Files     Lines   Blanks  Comments     Code Complexity
───────────────────────────────────────────────────────────────────────────────
Python                       1         6        0         0        6          0
───────────────────────────────────────────────────────────────────────────────
Total                        1         6        0         0        6          0
───────────────────────────────────────────────────────────────────────────────

You get this if you change '' to str()

───────────────────────────────────────────────────────────────────────────────
Language                 Files     Lines   Blanks  Comments     Code Complexity
───────────────────────────────────────────────────────────────────────────────
Python                       1         6        2         1        3          1
───────────────────────────────────────────────────────────────────────────────
Total                        1         6        2         1        3          1
───────────────────────────────────────────────────────────────────────────────

Desktop (please complete the following information): Mac OS, tried version in brew and master

geohot avatar Mar 01 '23 15:03 geohot

That's an interesting one!

boyter avatar Mar 02 '23 23:03 boyter

@boyter The reason for this bug is very interesting.

The current Trie will try to match the longest prefix. Suppose we have a single quote and three single quotes as two different prefixes, then in theory '' should match the prefix '.

However, because the Trie will try to match the longest prefix, it will check whether '' can be matched. Unfortunately, because of the existence of the prefix ''', when the Trie tries to match the second single quote, it successed. There are no more chars to match, so the Trie will return the node of '' as the result. But, the node of '' is a part of the node of ''', it is not closed and does not have a nodeType (only the default 0), this makes all the things after a = '' will be treated as strings (the first single quote has unknown type and was skipped, then the second one will be treated as the beginning of a string) and breaks the code stats.

To fix this issue, we can record a previous closed node, when we reach the end of matching, we check if the node is closed. If it is we return the current node otherwise we return the previous closed node (If there is no previous closed node, just return the current node).

I'll submit a patch later and add some tests to prevent regressions.

If there's a better way, be free to point it out here or in the pull request.

apocelipes avatar Jul 22 '24 11:07 apocelipes

Thanks for looking into this. Its one of those ones I had been meaning to get around to but never did. PR merged.

boyter avatar Jul 22 '24 22:07 boyter