language-python

Raw string format seems to not have grammar support

[Open] lichaoir opened this issue 11 years ago • 23 comments

When I use r'...' in my code, the colours are displayed incorrectly, as if the raw string format isn't recognised.

lichaoir, May 15 '14 01:05

It's fine here: with or without the 'r', the string is highlighted the same way.

rougeth, May 15 '14 03:05

Sorry, I missed a key piece of information: it's the escape sequences that are highlighted wrongly. In a raw string, escape characters should be displayed as regular string content, but they're not.

Thanks

lichaoir, May 15 '14 03:05

Could you show an example?

rougeth, May 15 '14 03:05

Sure, like below.

lichaoir, May 15 '14 05:05

Sorry, the image might not have attached correctly... here it is.

lichaoir, May 15 '14 05:05

I cannot see it.

rougeth, May 16 '14 12:05

I can't see the image either. @lichaoir, would you mind trying to attach it again to a comment on this issue? Thanks

kevinsawicki, May 19 '14 20:05

Hi Kevin & Marco,

I don't know why you can't see the attached image. Here are the steps to reproduce it: type r'\nabc' in Atom with the default theme. You will see that the escape sequence \n is darkened while abc is displayed as a plain string. Since raw string content is taken as-is in Python, the \n sequence should also be displayed as plain string content, i.e. not darkened. Does this make sense?

Thanks
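The reproduction above can be confirmed in plain Python, independent of Atom: the raw literal keeps the backslash and the n as two ordinary characters, while the regular literal turns them into a single newline.

```python
# Compare a raw string literal with the equivalent regular literal.
raw = r'\nabc'
regular = '\nabc'

print(len(raw))        # 5: backslash, 'n', 'a', 'b', 'c'
print(len(regular))    # 4: newline, 'a', 'b', 'c'
print(raw[0] == '\\')  # True: the first character is a literal backslash
print('\n' in raw)     # False: the raw string contains no newline at all
```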

lichaoir, May 20 '14 01:05

Just so that everyone is clear, I think this is what he is talking about:

[Screenshot, 2014-06-06]

The first line has \n in purple, but it should be yellow. In a Python raw string, special/escape characters should be treated like any other character, i.e. shown in yellow.

I think I agree with @lichaoir.

warunsl, Jun 07 '14 05:06

Since \n is a special character, I believe it is better to highlight it inside the string.

rougeth, Jun 08 '14 12:06

@rougeth In this case \n is not special! From the docs:

```python
>>> # Unless an 'r' or 'R' prefix is present, escape sequences in strings are interpreted
>>> r'\n' == '\\n' and r'\\' == '\\\\'
True
>>> # string quotes can be escaped with a backslash, but the backslash remains in the string
>>> r"\'" == r'\'' == "\\'"
True
```

ThinkChaos, Jun 15 '14 16:06

As far as I can tell, the package assumes all raw strings are regexes: other regex syntax inside raw strings, such as [], + and *, gets highlighted too.

michaelstephendavies, Jul 15 '14 11:07

@lichaoir, @warunsl and @ThinkChaos are absolutely correct. The whole point of a raw string is to ignore the "specialness" of escape sequences. For example, I often use raw strings to set Windows-style paths (e.g. r'C:\Users\nmpeterson'), and I do not want the \n in that path incorrectly highlighted as a "special" character.

nmpeterson, Jul 23 '14 14:07

Yes, this is a feature of the language grammar. It's trying to be helpful by highlighting raw strings as if they were regexes, since that is typically what they are used for.

aroben, Nov 21 '14 21:11

@aroben That is a good feature for "regular" strings (i.e. u"..." and "..."), but it is not appropriate to highlight \n in a raw byte string as a newline.

Note that these two are equivalent single-byte¹ strings:

```python
>>> ord("""
... """)
10
>>> ord("\n")
10
```

However, this is clearly NOT the same thing:

```python
>>> ord(r"\n")
TypeError: ord() expected a character, but string of length 2 found
```

We can see that it is actually a two-byte¹ string:

```python
>>> [ord(i) for i in r'\n']
[92, 110]
```

So I think this makes it pretty clear that highlighting character sequences in raw strings that would normally be considered "escape sequences" is an error that should be corrected.

¹ They're actually more bytes than this in memory (compare sys.getsizeof(r'\n') with sys.getsizeof('\n')), but you get my point.
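Since the comment talks about raw byte strings, the same distinction can be shown with actual bytes literals: a minimal check comparing b'\n' with the rb prefix.

```python
# b'\n' is a single byte (10, newline);
# rb'\n' is two bytes: backslash (92) followed by 'n' (110).
print(list(b'\n'))   # [10]
print(list(rb'\n'))  # [92, 110]
print(len(rb'\n'))   # 2
```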

mattdeboard, Apr 24 '15 22:04

Also, knowing nothing about the grammar mechanisms in Atom (though I'm trying to learn), what's this:

https://github.com/atom/language-python/blob/master/grammars/python.cson#L869-L870

Can this be exposed as a user preference?

mattdeboard, Apr 24 '15 22:04

@mattdeboard You're right that r"\n" is a two-byte (and two-character) string. But I do think it is useful for the grammar to highlight \n specially in this case. In regular expressions, the two-character sequence \n means "a newline character", just like \* means "an asterisk character", etc. Since raw strings are so commonly used for defining regular expressions, the grammar highlights them as such.
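The regex argument can be checked with the standard re module: as a Python string, r'\n' is two characters, but when used as a pattern, re reads that sequence as "newline", and r'\*' matches a literal asterisk rather than acting as a quantifier.

```python
import re

# As a Python string, r'\n' is two characters: a backslash and 'n'.
pattern = r'\n'
print(len(pattern))                        # 2

# As a regular expression, re interprets that sequence as a newline.
print(bool(re.fullmatch(pattern, '\n')))   # True
print(bool(re.fullmatch(pattern, '\\n')))  # False: it does not match backslash + 'n'

# Likewise, r'\*' matches a literal asterisk.
print(bool(re.fullmatch(r'\*', '*')))      # True
```

This is why the same literal can reasonably be read two ways: Python stores two plain characters, while the re engine sees one escape.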

aroben, Apr 27 '15 13:04

Ultimately it doesn't even matter, since the broken indentation of this language mode makes Atom unusable for Python. This renders any debate about syntax highlighting moot.

mattdeboard, Apr 27 '15 13:04

Doesn't it make more sense to have this package highlight according to the grammar of Python instead of guessing what people are using raw strings for? I would vote for changing the syntax highlighting in accordance with this bug report and breaking the regex highlighting out into a separate package, which could override the highlighting for language-python if the user so desires.
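The rule proposed here can be stated simply: decide whether to mark escape sequences from the literal's prefix, the way Python itself does. Below is a hypothetical sketch of that rule in Python (escape_spans is an illustrative name, not part of Atom or language-python). It simplifies by treating a backslash plus the next character as one escape, although real escapes such as \x41 or \u00e9 are longer.

```python
import re

# Optional prefix letters (r, b, u, f, ...) followed by the opening quote.
_LITERAL = re.compile("(?P<prefix>[A-Za-z]*)(?P<quote>'''|\"\"\"|'|\")")


def escape_spans(literal: str) -> list[tuple[int, int]]:
    """Hypothetical helper: (start, end) offsets of escapes in a string literal."""
    m = _LITERAL.match(literal)
    if m is None:
        raise ValueError("not a string literal")
    if 'r' in m.group('prefix').lower():
        return []  # raw literal: backslashes are ordinary characters
    start = m.end()                             # first character after the opening quote
    end = len(literal) - len(m.group('quote'))  # index of the closing quote
    spans = []
    i = start
    while i < end:
        if literal[i] == '\\' and i + 1 < end:
            spans.append((i, i + 2))  # simplified: backslash plus one character
            i += 2
        else:
            i += 1
    return spans


print(escape_spans(r"'\nabc'"))   # [(1, 3)]
print(escape_spans(r"r'\nabc'"))  # []
```

A real grammar would encode this decision in its string patterns rather than in Python code; the sketch only illustrates the rule.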

kbrose, Feb 22 '16 20:02

@mattdeboard Have you seen https://atom.io/packages/python-indent?

kbrose, Mar 18 '16 20:03

I see that the reasoning behind this broken highlighting is the belief that raw strings are only (or mainly) used for regular expressions.

I'd like to offer evidence to the contrary. The codebase I'm currently working on has lots of strings of the form r"c:\Program Files\Some Program\Bin\Program.exe". As you might guess, these are not regular expressions but Windows file paths. Currently Atom highlights \P, \S and \B (in different colors, too). This is confusing, because at first glance it looks like an indication that Python treats \ as an escape and that a double backslash \\ is needed instead. But double backslashes, although Atom colors them as if they were correct, are wrong and lead to errors: errors that are easy to overlook, which is never a good thing.

Please correct the highlighting by removing it from raw strings.
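To make the doubling problem concrete, here is a plain-Python check using the path from this comment. In a raw string each backslash is stored once, so writing \\ as the highlighting suggests stores two backslashes per separator.

```python
# In a raw string, each backslash in the source is stored once.
path = r"c:\Program Files\Some Program\Bin\Program.exe"
print(path.count("\\"))  # 4
print(path == "c:\\Program Files\\Some Program\\Bin\\Program.exe")  # True

# "Fixing" the raw string by doubling the backslashes stores two per separator.
doubled = r"c:\\Program Files\\Some Program\\Bin\\Program.exe"
print(doubled.count("\\"))  # 8
print(doubled == path)      # False
```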

magv, Jul 02 '16 09:07

An r'''doc string''' breaks the grammar for the whole file... any fix?
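For reference, a raw triple-quoted docstring is valid Python. It is a common way to keep backslashes in docstrings (for example, regex patterns) without escaping them. Without the r prefix, \d would be an invalid escape sequence, which Python 3.12 and later report with a SyntaxWarning. Here is a minimal example using a hypothetical match_digits function; the grammar should be able to highlight it without affecting the rest of the file.

```python
import re


def match_digits(text):
    r'''Return True if text consists only of digits (pattern: \d+).'''
    # The r prefix keeps "\d" as a backslash followed by "d" in the docstring.
    return re.fullmatch(r'\d+', text) is not None


print(match_digits.__doc__)  # Return True if text consists only of digits (pattern: \d+).
print(match_digits("2017"))  # True
```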

lsabiao, Jun 20 '17 02:06

Would it be these lines that are causing the incorrect highlighting?

What complicates matters further is that \" is still special in a double-quoted raw string, particularly at the end. In fact, raw strings seem to have quite a few intricacies.

For example (quite normally):

```python
print(r"hello\"wo\rld")
# hello\"wo\rld
```

But this gives:

```python
print(r"Hello\")
# SyntaxError
```

And this:

```python
print(r"Hello\\", r"Hello\"")
# Hello\\ Hello\"
```

So how does one make a raw string with a single backslash at the end?

P.S. Funnily, even GitHub's syntax highlighter gets it wrong!
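On the closing question: a raw string literal cannot end with an odd number of backslashes. Even in a raw literal, the backslash still prevents the following quote from closing the string; the raw prefix only means the backslash itself is kept. The usual workarounds are sketched below, and the variable names are illustrative only.

```python
# r"Hello\" is a SyntaxError, so build the trailing backslash another way.
a = r"Hello" "\\"    # adjacent literals are concatenated at compile time
b = r"Hello" + "\\"  # explicit concatenation
c = "Hello\\"        # regular string with an escaped backslash

print(a, b, c)       # Hello\ Hello\ Hello\
print(a == b == c)   # True
```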

VinayGupta23, Aug 12 '17 08:08