csharpstandard icon indicating copy to clipboard operation
csharpstandard copied to clipboard

#if preprocessor directive incorrectly handles enclosed C# verbatim string

Open gafter opened this issue 6 years ago • 23 comments

@gdivis commented on Fri Aug 21 2015

To reproduce, just use this code and make sure that DEBUG is set:

#if !DEBUG
    [Description(@"
    # text")]
    class Test
    {
    }
#endif

Actually, this produces a CS1024 error because the compiler seems to interpret the "# text" portion of the string as a preprocessor directive.


@CyrusNajmabadi commented on Fri Aug 21 2015

I'm pretty sure this is by design. Once the compiler sees the #if !DEBUG it's going to be in an inactive region. Once in that state it will continue scanning through the inactive region handling other preprocessor directives, until it finally ends that region (with the #endif). Inside that region, the compiler has no concept of things a multi-line string literal.


@gafter commented on Sun Aug 23 2015

We should investigate whether this is a bug or per spec.


@CyrusNajmabadi commented on Sun Aug 23 2015

From my reading, this is 'per spec'. The relevant sections are:

pp-if-section:
    whitespaceopt  #  whitespaceopt  if  whitespace  pp-expression  pp-new-line  conditional-sectionopt

conditional-section:
    input-section
    skipped-section

Because 'DEBUG' is defined, then the pp-expression evaluates to false, meaning we'll be in a skipped-section.

The spec then says:

The remaining conditional-sections, if any, are processed as skipped-sections: except for pre-processing directives, the source code in the section need not adhere to the lexical grammar; no tokens are generated from the source code in the section; and pre-processing directives in the section must be lexically correct but are not otherwise processed.

As such, it is 'per spec' that an error is produced as the compiler is checking that the #text "pre-processing directives in the section must be lexically correct". As #text is not a legal pp-directive, the error is appropriate.


@gdivis commented on Mon Aug 24 2015

Fair enough; per the spec I agree that this can't be considered a bug. Nonetheless it is rather unintuitive behavior - requiring pre-processing directives to be "lexically correct" while declaring that they are unprocessed goes contrary to what one would expect - that everything is skipped until the #endif directive is encountered at the beginning of a line. Sounds like my problem is actually with the spec and not the implementation :)

All that being said, this was a rather odd scenario that popped up and was easy to work around. Thanks for looking into it.


@gafter commented on Mon Aug 24 2015

When commenting out a block of code, I recommend using line-ending comments. In VS, select the code to be commented out and hit control-K control-C. To uncomment a block of code select it and hit control-K control-U.

I acknowledge that this doesn't help in your scenario, where you want the code commented or not depending on a PP-symbol.


@gdivis commented on Mon Aug 24 2015

Agreed on the line-ending comments; I'm a big fan of ctrl-K,C. This was literally a case of production code having a C# verbatim string inside a skipped block with a line that happened to begin with #. I can't imagine that this is a very common use case, but I wanted to at least report this behavior in case it wasn't intended. Again, thanks for taking the time to check this out and explain it to me.


@CyrusNajmabadi commented on Mon Aug 24 2015

I think it's so you don't end up accident writing something like:

#els

in the middle of your skipped section. With the current spec approach, you'll get a nice error here and you can fix the directive.

-- Cyrus

From: gdivismailto:[email protected] Sent: ‎8/‎24/‎2015 10:21 AM To: dotnet/roslynmailto:[email protected] Cc: Cyrus Najmabadimailto:[email protected] Subject: Re: [roslyn] #if preprocessor directive incorrectly handles enclosed C# verbatim string (#4726)

Fair enough; per the spec I agree that this can't be considered a bug. Nonetheless it is rather unintuitive behavior - requiring pre-processing directives to be "lexically correct" while declaring that they are unprocessed goes contrary to what one would expect - that everything is skipped until the #endif directive is encountered at the beginning of a line. Sounds like my problem is actually with the spec and not the implementation :)

All that being said, this was a rather odd scenario that popped up and was easy to work around. Thanks for looking into it.

— Reply to this email directly or view it on GitHubhttps://na01.safelinks.protection.outlook.com/?url=https%3a%2f%2fgithub.com%2fdotnet%2froslyn%2fissues%2f4726%23issuecomment-134306795&data=01%7c01%7ccyrusn%40microsoft.com%7c3193db7629df45631f8708d2aca87b02%7c72f988bf86f141af91ab2d7cd011db47%7c1&sdata=WtqC2fjQ9vwmrBR%2fahpVbLhdxiJ%2fGUWeS1eqW%2fANQ9w%3d.


@diryboy commented on Tue Aug 25 2015

This is similar to

/*
var s = "*/";
*/

Ops


@NickCraver commented on Mon Nov 02 2015

Can we open this back up to see if it should be "by-design" (on the spec side)? This bit us when building Stack Overflow when a debug section had some markdown inside a string. I don't really think markdown in a string is insanely uncommon - I'd imagine quite a few people will hit this over time for that and other use cases...and it's not at all intuitive as to why Roslyn suddenly breaks here.


@CyrusNajmabadi commented on Mon Nov 02 2015

@NickCraver

and it's not at all intuitive as to why Roslyn suddenly breaks here.

Could you explain what you mean by this? It sounds like you're saying one of two things here:

  1. It's not intuitive why Roslyn breaks versus the pre-Roslyn compiler.
  2. It's not intuitive why C# has this behavior at all.

If the former, this should not be the case. We should have the same behavior as the pre-Roslyn compiler.

If the latter... well, it's in the eye of the beholder. It's intuitive for me. The compiler is in preprocessor-translation mode. And it sees an invalid preprocessor directive. So it dies. Makes sense to me :)


@svick commented on Tue Nov 03 2015

@CyrusNajmabadi I'm not sure "is intuitive to compiler implementer" is the right goal. I think that normal developers are not going to think like that.

To me the intuitive behavior is "surrounding any piece of code with #if will just work". Maybe it's not possible to reach that (e.g. what if there's #endif inside the string literal?), but getting closer sounds like a worthwhile goal to me.


@NickCraver commented on Tue Nov 03 2015

Sorry, busy days here lately :) It's definitely the latter - surrounding a chunk of code with #if <directive>/#endif for example should just work. Most developers won't understand why this broke without digging a little.

This gets further complicated in our case because we do compile-time replacement of strings for localization. That wasn't the case hit here (it was simple markdown), but the fact that a string our translators give us can just break builds one day is a little scary. I simply think we can do better here.


@CyrusNajmabadi commented on Tue Nov 03 2015

It would be unintuitive to me to try to understand why a #elseif was not working. Today the compiler will helpfully tell me that #elseif is wrong (and I should use #elif instead).

Either way has pros and cons. The current approach has the benefit of telling when I do things wrong, but impeding you if you use pp directives in multi-line string literals. The approach you suggest allows for multi-line string literals, but means I don't get good feedback about when i'm doing things wrong.


@gafter commented on Tue Nov 03 2015

@NickCraver Do you have a specific proposal for changes to the language specification? Absent a proposed alternative, I don't see what the value would be in reopening the issue. The current spec and all implementations agree, and we would not be likely to take a change that could break existing programs.


@NickCraver commented on Tue Nov 03 2015

@CyrusNajmabadi Indeed, that's not the error we're talking about here though. Any multi-line string that has a line starting with # will result in this:

error CS1024: Preprocessor directive expected

That's far less intuitive than information errors from actually using a directive incorrectly. The surface area for hitting such a thing is much larger. In our case it's markdown, for others it may be a PowerShell comment, etc. I'm not sure how excluding multi-line strings (which admittedly may be very non-trivial) would prevent your case of good informational errors with misused directives. I think the problem here is identifying what is a directive, a step before that.


@diryboy commented on Tue Nov 03 2015

If verbatim strings are accounted for, what about normal and interpolated strings? What about multiline comments?

It isn't uncommon to have s = "@"; or other kinds of commented codes that confuses the "degraded parser" which only account for multiline strings right?


@agocke commented on Tue Nov 03 2015

@NickCraver @diryboy Let's be clear here -- there is no source code inside inactive #if directives. Nothing is parsed, compiled, analyzed -- anything. Excluding strings inside directives has no meaning because there's no such thing as a string, just a block of text that may or may not contain preprocessor directives.

I think any proposal needs to start from there and any assumptions it implies, including that people could have entire files containing inactive #if blocks consisting of nothing but kilobytes of line noise.


@paul1956 commented on Tue Nov 03 2015

VB seems to require the # start in the first column (at least by default it auto formats it that way removing all the whitespace), but then as @agocke says the false clause is just a blob of text. What seems confusing to me is if I change True to False then the error appears at the #Else instead of the #End If. Since you are forced to have the # as the first character of the line why not change the grammar to require it and then most/all of the issues with # appearing in strings goes away except when the # is the first character. Alternatively there should be some kind of escape for # that allows it to be ignored when on the false side and treated as a # when on the true side. If I have to edit one side or the other every time the condition changes the preprocessor directive is less useful. There could then be a Analyzer and Fix to detect and fix these corner cases.

#If TRUE Then
        Dim x As String = "
         #End If"
#Else
        Dim x As String = "
#End If"
#End If ' Error here because previous part of line is taken as a Preprocessor string 

@gafter commented on Tue Nov 03 2015

If the # were "ignored" in the false side, how would the false side ever end?


@agocke commented on Tue Nov 03 2015

Since you are forced to have the # as the first character of the line

C# Spec Section 2.5:

White space may occur before the # character and between the # character and the directive name.


@paul1956 commented on Tue Nov 03 2015

@gagter I was proposing the use of an escape character before the # so that it could be ignored in the cases sited.

@agocke I was talking about VB which has the same issues and Visual Studio removes any leading white space so it ignored the spec assuming it shares that grammar with C#.


@agocke commented on Tue Nov 03 2015

@paul1956 Pick any escape character you want that's not whitespace. As long as there is a non-whitespace character before # it isn't regarded as the start of a directive.


@paul1956 commented on Tue Nov 03 2015

@agocke how does that help. The compiler doesn't understand it. I would have to remove it every time the condition changed. In the True clause I need # and in the False clause I need escape# for everything to work correctly in both clauses. When the condition changes I need to move the Escape so it is always in the False clause only.


@agocke commented on Tue Nov 03 2015

@paul1956 In the True case no escape could possibly help because the example given is nested inside a verbatim string literal (where the entire point is not having to escape things).


@paul1956 commented on Tue Nov 03 2015

@agocke I am not trying to engineer a solution just saying the is a real issue that should not be so easily dismissed.


@AraHaan commented on Fri Dec 29 2017

I got this issue on the C# Compiler in VS2017 when debug is not set too.

Also it can also be an python style comment on the string (where the string embeds python code like pytocs does to test on debug only)

https://github.com/uxmal/pytocs

gafter avatar Dec 29 '17 22:12 gafter