codemaid
Characters change into codes on clean
Environment
- Visual Studio version: Visual Studio 2017 Community
- CodeMaid version: 10.4
- Code language: C#
Description
Greater-than and less-than symbols are returned as character codes on cleanup
Steps to recreate
- Use ">" or "<" in comments
- Clean code
Current behavior
Currently the comments come back formatted like this "-&gt;" after being cleaned, using the character codes "&gt;" or "&lt;" instead of the characters ">" or "<".
Expected behavior
It should return the normal character, not the code.
Thanks for reporting the issue. I wasn't able to immediately reproduce it. Can you please provide a code example? Here's the one I tried:
// This comment has < less than and > greater than comments.
Original:
/// <summary>
/// Main Function, Takes error -> Prepares it for the format processor -> String gets
/// processed by class -> Writes string.
/// </summary>
Post Cleanup:
/// <summary>
/// Main Function, Takes error -&gt; Prepares it for the format processor -&gt; String gets
/// processed by class -&gt; Writes string.
/// </summary>
/// <param name="Error">The exception object</param>
Thanks for the code sample, I can reproduce it now. It looks like it is affecting XML comments but not standard comments. Did you happen to notice if this started happening with a particular release (I see it was reported with 10.4)?
@willemduncan can you please take a look?
I am aware of this issue and have run into it a few times myself. The thing is, it makes sense. The < and > characters have a special meaning in XML and should be escaped. Basically the input XML is malformed. The XML parser is lenient enough to overlook it, but when writing the reformatted comment it will not write malformed XML.
So much for the cause.
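The cause can be reproduced outside CodeMaid with a minimal sketch. It assumes the formatter goes through the .NET LINQ to XML classes (the thread only says ".NET built in classes"), and the class and method names here are hypothetical. The parser accepts a raw > in text, and the writer escapes it on the way out:

```csharp
using System;
using System.Xml.Linq;

public static class RoundTripSketch
{
    // Hypothetical helper: parse a comment body, then write it back out
    // through the XML writer without re-indenting it.
    public static string RoundTrip(string xml)
    {
        XElement element = XElement.Parse(xml, LoadOptions.PreserveWhitespace);
        return element.ToString(SaveOptions.DisableFormatting);
    }

    public static void Main()
    {
        // The raw '>' is read without complaint but written back escaped.
        Console.WriteLine(RoundTrip("<summary>Takes error -> Writes string.</summary>"));
        // prints: <summary>Takes error -&gt; Writes string.</summary>
    }
}
```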
I wouldn't mind deviating from XML standards here, how do you feel about it?
Ahh that makes sense.. thanks for the info! Would this apply to all XML special characters? i.e.
" &quot;
' &apos;
< &lt;
> &gt;
& &amp;
Since we're writing back out something the user originally wrote, I'm fine with leaving it as the user wrote it vs. incidentally fixing it for them to be XML compliant.
First off, it seems " and ' are not affected; they are simply parsed and written as-is.
Of the other three characters, only > is affected, since the others (<, &) cause an XML read error, which makes the formatter fall back to plain text handling instead.
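A rough way to check which characters trigger the read error, again assuming LINQ to XML as the parser (ParsesAsXml is a hypothetical helper, not CodeMaid code):

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

public static class ParseCheckSketch
{
    // Hypothetical helper: true when the text parses as XML,
    // false when the parser throws an XmlException.
    public static bool ParsesAsXml(string xml)
    {
        try
        {
            XElement.Parse(xml);
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }

    public static void Main()
    {
        Console.WriteLine(ParsesAsXml("<summary>a > b</summary>"));            // True
        Console.WriteLine(ParsesAsXml("<summary>a < b</summary>"));            // False
        Console.WriteLine(ParsesAsXml("<summary>a & b</summary>"));            // False
        Console.WriteLine(ParsesAsXml("<summary>\"quoted\" 'text'</summary>")); // True
    }
}
```

Only the inputs that parse go through the XML writer, which is why > is the one character that visibly changes.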
If I write back "raw" values, this causes a comment with <xml>&amp;</xml> to be turned into <xml>&</xml> on the first run, and then the next run throws an XML read error.
So what we can do is manually escape only the two special characters causing problems (<, &) before formatting.
- The XML spec says "Element names must start with a letter or underscore", thus we must regex away all occurrences of < followed by invalid chars.
- Escaping & means that all initially escaped characters become double-escaped on read, and then back to original on writing, e.g. turning &amp; into &amp;amp; and back.
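The pre-escaping idea from the two points above could look roughly like the sketch below. PreEscape is a hypothetical helper, and the regex is a simplified ASCII-only approximation of the name-start rule, not the full XML grammar. It does not handle <![CDATA[ ... ]]> sections.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class PreEscapeSketch
{
    // Hypothetical helper: escape every '&', then every '<' that cannot
    // begin a start or end tag (assumption: ASCII letter, '_' or '/').
    public static string PreEscape(string comment)
    {
        string escaped = comment.Replace("&", "&amp;");
        return Regex.Replace(escaped, @"<(?![A-Za-z_/])", "&lt;");
    }

    public static void Main()
    {
        string input = "<summary>a &amp; b < c -> d</summary>";

        // The already-escaped "&amp;" becomes "&amp;amp;" before parsing.
        string prepared = PreEscape(input);
        Console.WriteLine(prepared);
        // prints: <summary>a &amp;amp; b &lt; c -> d</summary>

        // After parsing, the text node holds what the user originally
        // wrote, so writing the raw value back reproduces the input text.
        XElement parsed = XElement.Parse(prepared);
        Console.WriteLine(parsed.Value);
        // prints: a &amp; b < c -> d
    }
}
```

Writing the raw node value back (rather than going through the XML writer) is what keeps >, < and &amp; as the user typed them in this sketch.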
This feels terribly hacky, but I see no other way short of writing our own XML handling instead of relying on the .NET built-in classes.
UPDATE: Further testing revealed that this solution causes problems in case of <![CDATA[ ... ]]> elements, so more thinking is required.
I have noticed that this is still happening: a => in an XML comment was changed to =&gt;.
CodeMaid version 12.0.300