language-python icon indicating copy to clipboard operation
language-python copied to clipboard

Incorrect docstring handling

Open graingert opened this issue 9 years ago • 6 comments

The syntax highlighting for https://github.com/farcepest/MySQLdb1/blob/9b0b59f1bbe0029757b9b15d1c71586eaca1109c/MySQLdb/cursors.py

Seems to be incorrect, it marks most of the file as a string.

graingert avatar Feb 11 '15 14:02 graingert

See also e.g. https://github.com/cobrateam/splinter/blob/master/splinter/browser.py#L46. Everything between the """s should be quoted.

erbridge avatar Mar 18 '15 12:03 erbridge

Seems to be that the issue relates to the following corner case:

  • raw string prefix
  • triple-quoted string
  • containing backslashes
  • related to brackets in the string

Here are three instances of the bug that result in slightly different syntax:

Break 1:

before()
r'''(\)'''
after()

Break 2:

before()
r'''(\'''
after()

Break 3:

before()
r'''[\'''
after()

The last example shows different highlighting for after() -- I think in the original case it's not that the docstring fails to end, but that the active "highlighting mode" is affected by characters in the string somehow.

Some working examples:

Works 1:

before()
r'(\)'
after()

Works 2:

before()
r'''\''''
after()

Works 3:

before()
r'''\]'''
after()

alexjurkiewicz avatar Apr 13 '15 03:04 alexjurkiewicz

Looks like the grammar is possibly trying to greedily highlight regex and hence not bailing when the regex is invalid but the string is closed.

This doesn't quite explain what's going on in the MySQLdb1 file linked in the OP though, though that might just be the regex matching gone wrong as well.

auscompgeek avatar Sep 13 '15 12:09 auscompgeek

@alexjurkiewicz actually, after testing your snippets in a Python REPL, your second snippet is semi-intended behaviour. Python considers r'''(\''' to be an unclosed string, although it seems to be stuck in regex mode after the open parenthesis:

r'''(\'''
'''
hello()

Similar issue here, except we're dealing with regex classes:

r'''[\'''
'''
hello()

This works as expected/intended:

r'''(\'''
)'''
test()

r'''[\'''
]'''
test()

auscompgeek avatar Oct 04 '15 00:10 auscompgeek

Guess this is just #39 then?

auscompgeek avatar Oct 04 '15 00:10 auscompgeek

yes, confusing!

alexjurkiewicz avatar Oct 06 '15 03:10 alexjurkiewicz