Characters change into codes on clean

Open TylerBurnett opened this issue 7 years ago • 7 comments

Environment

  • Visual Studio version: Visual Studio 2017 Community
  • CodeMaid version: 10.4
  • Code language: C#

Description

Greater-than and less-than symbols are replaced by their character entity codes on cleanup.

Steps to recreate

  1. Use ">" or "<" in comments
  2. Clean code

Current behavior

Currently the comments come out formatted like this "-&gt;" after being cleaned, using the character code instead of the literal ">" or "<".

Expected behavior

It should return the normal character, not the code.

TylerBurnett avatar Jun 08 '18 00:06 TylerBurnett

Thanks for reporting the issue. I wasn't able to immediately reproduce it. Can you please provide a code example? Here's the one I tried:

// This comment has < less than and > greater than comments.

codecadwallader avatar Jun 09 '18 19:06 codecadwallader

Original:

/// <summary>
/// Main Function, Takes error -> Prepares it for the format processor -> String gets
/// processed by class -> Writes string.
/// </summary>

Post Cleanup:

/// <summary>
/// Main Function, Takes error -&gt; Prepares it for the format processor -&gt; String gets
/// processed by class -&gt; Writes string.
/// </summary>
/// <param name="Error">The exception object</param>

TylerBurnett avatar Jun 13 '18 00:06 TylerBurnett

Thanks for the code sample, I can reproduce it now. It looks like it is affecting XML comments but not standard comments. Did you happen to notice if this started happening with a particular release (I see it was reported with 10.4)?

@willemduncan can you please take a look?

codecadwallader avatar Jun 15 '18 10:06 codecadwallader

I am aware of this issue and have encountered it myself a few times. The thing is, it makes sense: the < and > characters have a special meaning in XML and should be escaped. Basically, the input XML is malformed. The XML parser is lenient enough to overlook it, but when writing the reformatted comment it will not write malformed XML.
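For illustration, here is a minimal sketch of that read-then-write round trip. It uses System.Xml.Linq as a stand-in, which is an assumption: the thread does not say which XML classes CodeMaid uses internally.

```csharp
using System;
using System.Xml.Linq;

// Assumption: XElement stands in for whatever XML handling the formatter really uses.
var original = "<summary>Takes error -> Writes string.</summary>";

// The reader accepts a bare '>' inside element text.
var element = XElement.Parse(original);
Console.WriteLine(element.Value);   // the parsed text still holds the literal "->"

// The writer escapes '>' when serializing the text back out.
var rewritten = element.ToString(SaveOptions.DisableFormatting);
Console.WriteLine(rewritten);       // <summary>Takes error -&gt; Writes string.</summary>
```

The escaping happens only on output, which matches the reported symptom: the comment is read without complaint and comes back with -&gt;.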

So much for the cause.

I wouldn't mind deviating from XML standards here, how do you feel about it?

w5l avatar Feb 11 '19 07:02 w5l

Ahh that makes sense.. thanks for the info! Would this apply to all XML special characters? i.e.

"   &quot;
'   &apos;
<   &lt;
>   &gt;
&   &amp;

Since we're writing back out something the user originally wrote, I'm fine with leaving it as the user wrote it vs. incidentally fixing it for them to be XML compliant.

codecadwallader avatar Feb 16 '19 10:02 codecadwallader

First off, it seems " and ' are not affected and are simply parsed and written as-is.

For the other three chars, it only affects >, since the other chars (<, &) cause an XML read error, causing the formatter to use plain text handling instead.
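A small probe along these lines can be used to check each character separately. It is only a sketch: System.Xml.Linq is assumed as the XML stack, and RoundTrip is a hypothetical helper, not a CodeMaid function.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper: re-serializes the XML, or returns null when the reader rejects it.
string? RoundTrip(string xml)
{
    try
    {
        return XElement.Parse(xml).ToString(SaveOptions.DisableFormatting);
    }
    catch (XmlException)
    {
        return null; // the case where the formatter would fall back to plain text handling
    }
}

foreach (var text in new[] { "a > b", "a < b", "a & b", "say \"hi\" and 'bye'" })
{
    var result = RoundTrip("<summary>" + text + "</summary>");
    Console.WriteLine(text + "  =>  " + (result ?? "(XML read error)"));
}
```

With System.Xml.Linq, only the first input should come back changed (with &gt;), the bare < and & inputs should fail to parse, and the quotes should survive untouched.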

If I write back "raw" values, this causes a comment with <xml>&amp;</xml> to be turned into <xml>&</xml> on the first run, and then the next run throws an XML read error.

So what we can do is manually escape only the two special characters causing problems (<, &) before formatting.

  • The XML spec says "Element names must start with a letter or underscore", thus we must regex away all occurrences of < followed by invalid chars.
  • Escaping & means that all initially escaped characters become double-escaped on read, and then return to the original on writing, e.g., turning &amp; into &amp;amp; and back.

This feels terribly hacky, but I see no other way short of writing our own XML handling instead of relying on .NET built-in classes.
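A rough sketch of the escape-then-unescape idea might look like the following. It is not the actual CodeMaid change: PreEscape, WriteRaw and Clean are hypothetical names, System.Xml.Linq is assumed as the XML stack, and the regex additionally lets / and ! through so that closing tags and comments survive. It makes no attempt to handle CDATA sections.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Xml.Linq;

// Step 1 (hypothetical helper): escape the two characters that break the reader.
string PreEscape(string comment)
{
    // Every '&' is escaped, so an existing "&amp;" becomes "&amp;amp;".
    var escaped = comment.Replace("&", "&amp;");
    // Escape '<' unless it could start markup: a letter or underscore (element name),
    // '/' (closing tag) or '!' (comment, CDATA). The last two are assumptions.
    return Regex.Replace(escaped, "<(?![A-Za-z_/!])", "&lt;");
}

// Step 3 (hypothetical helper): undo one level of escaping after serialization.
// "&amp;" is handled last so that "&amp;lt;" becomes "&lt;" rather than "<".
string WriteRaw(XElement element) =>
    element.ToString(SaveOptions.DisableFormatting)
        .Replace("&lt;", "<")
        .Replace("&gt;", ">")
        .Replace("&amp;", "&");

// Step 2 sits in between: parse the pre-escaped text (reformatting would happen here).
string Clean(string comment) => WriteRaw(XElement.Parse(PreEscape(comment)));

var input = "<summary>a < b -> c & d, already escaped: &amp;</summary>";
var once = Clean(input);
var twice = Clean(once);
Console.WriteLine(once);
Console.WriteLine(once == input && twice == input); // stable across repeated runs
```

The blanket string replacement in WriteRaw also touches attribute values, and a < followed by a letter that is not really a tag still fails to parse, so this is only good enough to show the double-escape round trip.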

UPDATE: Further testing revealed that this solution causes problems in case of <![CDATA[ ... ]]> elements, so more thinking is required.

w5l avatar Apr 25 '19 07:04 w5l

I have noticed that this is still happening: a => in an XML comment was changed to =&gt;. CodeMaid version 12.0.300.

twojnarowski avatar Apr 25 '24 16:04 twojnarowski