language-python Raw string format seems to not have grammar support

When I use r'...' in my code, the colours are displayed incorrectly, as if the raw string format weren't recognised.

May 15 '14 01:05 lichaoir

It's ok here. With or without the 'r' the string is highlighted in the same way.

May 15 '14 03:05 rougeth

Sorry, I left out a key piece of info. It's the escape sequences that are highlighted wrongly. In a raw string, escape sequences should be displayed as regular string text, but they're not.

Thanks

May 15 '14 03:05 lichaoir

Could you show an example?

May 15 '14 03:05 rougeth

Sure, like below.

May 15 '14 05:05 lichaoir

Sorry, the image might not have attached correctly... here it is.

May 15 '14 05:05 lichaoir

I cannot see it.

May 16 '14 12:05 rougeth

I can't see the image either, @lichaoir would you mind trying to attach it again to a comment in this issue? Thanks

May 19 '14 20:05 kevinsawicki

Hi Kevin & Marco,

Don't know why you guys cannot see the attached image. Here are the steps to reproduce it: type r'\nabc' in Atom with the default theme. You will see that the escape sequence \n is darkened while abc is displayed as plain string text. Since raw string content is treated as-is in Python, the \n sequence should also be displayed as plain string text, i.e. not darkened. Does this make sense?

Thanks

May 20 '14 01:05 lichaoir

Just so that everyone is clear, I think this is what he is talking about

[Screenshot: screen shot 2014-06-06 at 10 01 33 pm]

The first line has \n in purple, when it should be yellow. In a raw string in Python, special/escape characters should be treated like any other character, i.e. shown in yellow.

I think I agree with @lichaoir

Jun 07 '14 05:06 warunsl

As the \n is a special character, I believe it is better to highlight it inside the string.

Jun 08 '14 12:06 rougeth

@rougeth In this case \n is not special! From the docs:

>>> # Unless an 'r' or 'R' prefix is present, escape sequences in strings are interpreted
>>> r'\n' == '\\n' and r'\\' == '\\\\'
True
>>> # string quotes can be escaped with a backslash, but the backslash remains in the string
>>> r"\'" == r'\'' == "\\'"
True

Jun 15 '14 16:06 ThinkChaos

As far as I can tell, the package assumes all raw strings are regexes (other regex syntax like []+* etc inside raw strings gets highlighted).

Jul 15 '14 11:07 michaelstephendavies

@lichaoir, @warunsl and @ThinkChaos are absolutely correct. The whole point of a raw string is to ignore the "specialness" of escape sequences. For example, I often use raw strings to set Windows-style paths (e.g. r'C:\Users\nmpeterson'), and I do not want the \n within that to be incorrectly highlighted as a "special" character.
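
To illustrate the Windows-path case, here is a minimal Python 3 sketch (the variable names are illustrative assumptions, not from the thread). It suggests that the raw literal keeps the backslash and the letter n as two ordinary characters, so no newline ends up in the path:

```python
# Illustrative only: variable names are assumptions, not from the thread.
raw_path = r'C:\Users\nmpeterson'

# The raw literal is identical to the version with every backslash doubled.
escaped_path = 'C:\\Users\\nmpeterson'
same = raw_path == escaped_path

# No newline character is present, even though the text contains "\n".
has_newline = '\n' in raw_path

print(same, has_newline)  # True False
```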

Jul 23 '14 14:07 nmpeterson

Yes, this is a feature of the language grammar. It's trying to be helpful by highlighting raw strings as if they were regexes, since that is typically what they are used for.

Nov 21 '14 21:11 aroben

@aroben That is a good feature for "regular" strings (i.e. u"..." & "...") but it is not appropriate to highlight \n in a raw byte string as a newline.

Note that these two are equivalent single-byte¹ strings:

>>> ord("""
... """)
10

>>> ord("\n")
10

However, this is clearly NOT the same thing:

>>> ord(r"\n")
TypeError: ord() expected a character, but string of length 2 found

We can see that it is actually a 2-byte¹ string:

>>> [ord(i) for i in r'\n']
[92, 110]

So I think this makes it pretty clear that highlighting character sequences in raw strings which would normally be considered "escape sequences" is an error that should be corrected.

¹ They're actually more bytes than this (sys.getsizeof(r'\n') vs. sys.getsizeof('\n')), but you get my point.

Apr 24 '15 22:04 mattdeboard

Also, knowing nothing about the grammar mechanisms in Atom (though I'm trying to learn), what's this:

https://github.com/atom/language-python/blob/master/grammars/python.cson#L869-L870

Can this be exposed as a user preference?

Apr 24 '15 22:04 mattdeboard

@mattdeboard You're right that r"\n" is a two-byte (and two-character) string. But I do think it is useful for the grammar to highlight \n specially in this case. In regular expressions, the two-character sequence \n means "a newline character", just like \* means "an asterisk character", etc. Since raw strings are so commonly used for defining regular expressions, the grammar highlights them as such.
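
A small sketch of the regex point, assuming Python 3 and the standard `re` module (the variable names are illustrative). The raw string holds two characters, and it is the regex engine, not the Python parser, that gives the pair its "newline" meaning:

```python
import re

# Illustrative only: names are assumptions, not from the thread.
pattern = r"\n"  # two characters: a backslash and the letter n
pattern_length = len(pattern)

# The regex engine interprets the two-character sequence as "match a newline".
matches_newline = re.fullmatch(pattern, "\n") is not None

# Likewise, backslash-asterisk matches a literal asterisk.
matches_asterisk = re.fullmatch(r"\*", "*") is not None

# Used as a plain string, the same raw text is not a newline at all.
is_newline = pattern == "\n"

print(pattern_length, matches_newline, matches_asterisk, is_newline)
# 2 True True False
```

So the special meaning exists only when the string is actually passed to `re`, which is the crux of the disagreement in this thread.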

Apr 27 '15 13:04 aroben

Ultimately it doesn't even matter, since the broken indentation of this language mode makes Atom unusable for Python. This renders any debate about syntax highlighting moot.

Apr 27 '15 13:04 mattdeboard

Doesn't it make more sense to have this package highlight according to the grammar of python instead of guessing what people are using raw strings for? I would vote for changing the syntax highlighting in accordance with this bug report and breaking out the regex highlighting into a separate package which can over-ride the highlighting for language-python if the user so desires.

Feb 22 '16 20:02 kbrose

@mattdeboard Have you seen https://atom.io/packages/python-indent?

Mar 18 '16 20:03 kbrose

I see that the reasoning behind this broken highlight is the belief that raw strings are only (or mainly) used for regular expressions.

I'd like to offer evidence to the contrary. The codebase I'm currently working on has lots of strings of the form r"c:\Program Files\Some Program\Bin\Program.exe". As you might guess, these are not regular expressions, but Windows file paths. Currently Atom highlights \P, \S and \B (with different colors too). This leads to confusion, because at first glance it looks like an indication that Python would recognize the \ symbol as an escape, and that a double backslash \\ is needed instead. But double backslashes, although colored by Atom as if they were the correct thing, are wrong and lead to errors. Such errors are easy to overlook, which is never a good thing.
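
As a rough sketch of why the doubled form is wrong (Python 3, with illustrative variable names that are not from the codebase described above): inside a raw string each backslash is kept literally, so doubling them produces a different path.

```python
# Illustrative only: names are assumptions, not from the thread.
single = r"c:\Program Files\Some Program\Bin\Program.exe"
doubled = r"c:\\Program Files\\Some Program\\Bin\\Program.exe"

# Both backslashes of each pair survive in the raw string.
different = single != doubled
single_count = single.count("\\")
doubled_count = doubled.count("\\")

print(different, single_count, doubled_count)  # True 4 8
```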

Please correct the highlighting by removing it from raw strings.

Jul 02 '16 09:07 magv

r''' doc string ''' breaks the grammar in the whole file... any fix?

Jun 20 '17 02:06 lsabiao

Would it be these lines that are causing the incorrect highlighting?

What complicates the matter further is that \" is special in a double-quoted raw string when it appears at the end. In fact, raw strings have a lot of intricacies:

For example (quite normally):

print r"hello\"wo\rld"
# hello\"wo\rld

But this gives:

print r"Hello\"
# SyntaxError

And this:

print r"Hello\\", r"Hello\""
# Hello\\ Hello\"

So how does one make a raw string with a single backslash at the end?
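
One possible answer, sketched for Python 3 (the examples above use the Python 2 print statement). A raw string literal cannot end in an odd number of backslashes, so the usual workarounds are to append a normal string or to slice off a doubled backslash. The variable names here are illustrative:

```python
# The source text r"Hello\" is held in a normal string so this file still parses.
bad_source = 'r"Hello\\"'

try:
    compile(bad_source, "<example>", "eval")
    parses = True
except SyntaxError:
    parses = False

# Two common workarounds for a raw string ending in a single backslash:
joined = r"Hello" "\\"    # adjacent string literals are concatenated
sliced = r"Hello\\"[:-1]  # write two backslashes, then drop the last one

print(parses, joined, sliced)  # False Hello\ Hello\
```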

P.S. Funnily, even Github's syntax highlighter gets it wrong!

Aug 12 '17 08:08 VinayGupta23