Too many clicks to obtain HTML char entity names

MDN URL

https://developer.mozilla.org/en-US/docs/Glossary/Entity

What specific section or headline is this issue about?

Must click three times to get &apos;

What information was incorrect, unhelpful, or incomplete?

Suggest that all HTML entities be folded into this page.

What did you expect to see?

A complete reference to HTML entities, at least the first 256.

Do you have any supporting links, references, or citations?

No response

Do you have anything more you want to share?

No response

MDN metadata

Page report details

Folder: en-us/glossary/entity
MDN URL: https://developer.mozilla.org/en-US/docs/Glossary/Entity
GitHub URL: https://github.com/mdn/content/blob/main/files/en-us/glossary/entity/index.md
Last commit: https://github.com/mdn/content/commit/01489233cf524a628763172fed11dcc2e565a4e0
Document last modified: 2024-06-08T11:08:00.000Z

Jun 24 '24 18:06 gggustafson

What do you mean exactly by "three clicks"? https://developer.mozilla.org/en-US/docs/Glossary/Character_reference is the canonical reference for all common character references, and we don't want to maintain a list beyond that size because you can just use the spec instead.

We should probably update all links that point to https://developer.mozilla.org/en-US/docs/Glossary/Entity to point to https://developer.mozilla.org/en-US/docs/Glossary/Character_reference instead. Is that what you are asking for?

Jun 24 '24 18:06 Josh-Cena

> and we don't want to maintain a list beyond that size because you can just use the spec instead.

@gggustafson More precisely, experience on MDN shows that we need either complete lists or lists cut back so much that they are obviously partial (otherwise people assume our lists are complete). I chose not to do the whole list because there are thousands of items, and the important ones to my mind are those that you either use most often, such as < and >, or can't type (such as &nbsp;).

> We should probably update all links that point to https://developer.mozilla.org/en-US/docs/Glossary/Entity to point to https://developer.mozilla.org/en-US/docs/Glossary/Character_reference instead. Is that what you are asking for?

That's a good idea - unless we actually mean "entity", but in most cases the term is imprecise terminology for character reference.

Jun 25 '24 00:06 hamishwillee

Fixed by #34391.

I hope this is satisfactory, because duplicating the list on MDN is not a good idea.

Jun 25 '24 01:06 hamishwillee

@hamishwillee I think we should still add &apos; to this list, considering that &quot; is. In fact, as @gggustafson said, we should probably add everything that's below U+0080, because those are likely to come up a lot.

Jun 25 '24 02:06 Josh-Cena

@Josh-Cena As per my comment above, the list is deliberately reduced to make it very clear that it is not complete. If we start adding items, then the question is "why didn't you add the one I wanted".

There is a complete list, and we direct readers to that.

If you want though, happy to remove &quot;?

Jun 25 '24 03:06 hamishwillee

As I said: I think "ASCII characters" is a good boundary, and above that we can only show those that "seem to be common" (which means, whatever we have there already). I think &apos; is particularly useful, because if you use single quotes around your HTML attribute value, then you have to use &apos; inside the value. Up to you though.

Jun 25 '24 03:06 Josh-Cena

Thanks. If it's up to me then I'm not adding them.

Jun 25 '24 04:06 hamishwillee

I believe that this discussion is being driven from an MDN perspective rather than from a reader's perspective.

I am writing a tool that converts XML comments into HTML pages. During that conversion, I am converting the XML comment tag <code> to the HTML tag <pre>. Having authored many articles on Code Project, I am aware that certain characters that might appear within a <pre>...</pre> pair must be changed to their HTML entity representation. I normally replace &, <, >, ", and ' with their respective HTML entities &amp;, &lt;, &gt;, &quot;, and &apos;. I, personally, do that naturally. To confirm, I read the MDN documentation on <pre>.

Now here is the problem: to a reader unfamiliar with the limitation of the <pre> tag, the discussion is somewhat above the level of beginner understanding.

> If you have to display reserved characters such as <, >, &, and " within the <pre> tag, the characters must be escaped using their respective [HTML entity](https://developer.mozilla.org/en-US/docs/Glossary/Entity).

For the uninitiated, all that the MDN documentation needed was:

> If you have to display reserved characters such as &, <, >, ", and ' within the <pre> tag, the characters must be escaped using their respective HTML entities &amp;, &lt;, &gt;, &quot;, and &apos;.

MDN may wish to caution that, if the replacement is performed programmatically, then the replacement of & must occur before the others.
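The ordering caution above can be illustrated with a minimal sketch. The helper names `escape_for_pre` and `escape_wrong_order` are hypothetical, invented here for illustration; they are not part of MDN or of any tool mentioned in this thread. The sketch assumes the five replacements listed above and shows what goes wrong when & is handled last.

```python
# Illustrative sketch only: helper names are hypothetical.
# Each pair is (character, character reference). "&" is deliberately first.
REPLACEMENTS = [
    ("&", "&amp;"),
    ("<", "&lt;"),
    (">", "&gt;"),
    ('"', "&quot;"),
    ("'", "&apos;"),
]


def escape_for_pre(text: str) -> str:
    """Replace the five characters in order, with & handled first."""
    for char, reference in REPLACEMENTS:
        text = text.replace(char, reference)
    return text


def escape_wrong_order(text: str) -> str:
    """Same replacements, but with & moved to the end (shows the bug)."""
    for char, reference in REPLACEMENTS[1:] + REPLACEMENTS[:1]:
        text = text.replace(char, reference)
    return text


sample = 'target.AppendFormat ( "<{0}>", XML_tag );'
print(escape_for_pre(sample))
# target.AppendFormat ( &quot;&lt;{0}&gt;&quot;, XML_tag );

print(escape_wrong_order("<"))
# &amp;lt;  (the ampersand of the freshly written &lt; was escaped again)
```

Python's standard library also provides `html.escape`, which handles the ampersand first as well, though it emits `&#x27;` rather than `&apos;` for the apostrophe.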

Jun 25 '24 16:06 gggustafson

I also agree, and I think you could have included that in the issue body as that's the "real" issue to fix. I have done something similar in https://developer.mozilla.org/en-US/docs/Web/HTML/Element/iframe#embedding_source_code_in_an_iframe:

> 1. First, write the HTML out, escaping anything you would escape in a normal HTML document (such as <, >, &, etc.).
>
> 2. &lt; and < represent the exact same character in the srcdoc attribute. Therefore, to make it an actual escape sequence in the HTML document, replace any ampersands (&) with &amp;. For example, &lt; becomes &amp;lt;, and &amp; becomes &amp;amp;.
>
> 3. Replace any double quotes (") with &quot; to prevent the srcdoc attribute from being prematurely terminated (if you use ' instead, then you should replace ' with &apos; instead). This step happens after the previous one, so &quot; generated in this step doesn't become &amp;quot;.
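The last two steps quoted above could be sketched roughly as follows. The function name `to_srcdoc_attribute` is hypothetical, and the sketch assumes the input is already valid HTML (step one has been done by hand) and that the attribute is delimited by double quotes.

```python
import html


def to_srcdoc_attribute(inner_html: str) -> str:
    """Wrap already-escaped HTML in a double-quoted srcdoc attribute."""
    # Escape ampersands first, so existing references such as &lt; survive
    # one round of attribute decoding (&lt; becomes &amp;lt;).
    value = inner_html.replace("&", "&amp;")
    # Then escape double quotes, so the attribute is not terminated early.
    # Doing this second means the new &quot; is not turned into &amp;quot;.
    value = value.replace('"', "&quot;")
    return f'<iframe srcdoc="{value}"></iframe>'


inner = '<p title="a">1 &lt; 2 &amp; 3</p>'
print(to_srcdoc_attribute(inner))
# <iframe srcdoc="<p title=&quot;a&quot;>1 &amp;lt; 2 &amp;amp; 3</p>"></iframe>

# Round trip: decoding the attribute value once gives back the inner HTML.
attribute_value = to_srcdoc_attribute(inner)[len('<iframe srcdoc="'):-len('"></iframe>')]
print(html.unescape(attribute_value) == inner)
```

The round-trip check uses Python's `html.unescape` as a stand-in for the browser's attribute decoding, which is an approximation rather than a statement about any particular browser.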

Jun 25 '24 16:06 Josh-Cena

OK, so I fixed the issue as originally stated. I've also added &apos; (') to the list in https://github.com/mdn/content/pull/34473 as this is a reserved character in some contexts. Looking closely at the spec, & does not appear to be reserved in any context - though you can of course escape it where it would otherwise be read as a character reference by accident, or create an ambiguous reference.

With all that in mind, I think the docs in https://developer.mozilla.org/en-US/docs/Web/HTML/Element/pre are probably wrong where it says:

> If you have to display reserved characters such as <, >, &, and " within the <pre> tag, the characters must be escaped using their respective HTML entity.

Firstly, & is not reserved, and " and ' work very nicely. So do < and > unless you create a valid tag as defined in the spec. So it is not so much that you can't include these characters; it is that tags are allowed inside <pre> as part of the content, so if you want to display the tags you need to escape them. You'd want to take care doing this as an automated process, because you can't know if the intent is to have the tag "used" or displayed. Does that make sense? If so, I'd say it's a new issue.

With respect to the topic of having a comment about automatic replacement of reserved characters, I'm not sure where this should go. Ideally it would go in the character reference glossary as a "write once, use from many places" kind of thing. But as the <pre> and srcdoc cases show, what you can safely do in terms of replacement depends on the context. So it is probably safer to make those kinds of notes in the respective contexts.

Jun 28 '24 02:06 hamishwillee

What is included in a <pre>...</pre> pair is not normally tags but rather C# (or whatever language) code like:

    using BM = Utilities.BoyerMoore;

    BM              bm;
    List < int >    positions;
    StringBuilder   target = new StringBuilder ( );

    target.AppendFormat ( "<{0}>", XML_tag );
    bm = new BM ( target.ToString ( ) );
    positions = bm.find ( XML_comment.text, 0, true );

Also, we are dealing with myriad browsers, so " (a quotation mark, not a "double quote", which would look like "") and ' (apostrophe) need to be replaced. Regarding &, if it is not replaced first, then later ampersands, as in &lt;, will be replaced, resulting in the atrocity &amp;lt;.

I mentioned automatic replacement only for completeness. I suggest that authors do their own replacement manually.

Jun 28 '24 04:06 gggustafson

Thanks, I do understand the issue - it is fitting it into the MDN reference that has some challenges. I think that what needs to happen here is an update to <pre>, in particular https://developer.mozilla.org/en-US/docs/Web/HTML/Element/pre#escaping_reserved_characters, explaining the strategies.

I'd also fix up the text there to remove & from the list of reserved characters.

But that is enough of a separate issue that perhaps we could create a new one and close this? The main reason is that I might not review that one, though I chose to review this one.

Closing this. Do you want me to create that new issue, or will you?

Jul 01 '24 04:07 hamishwillee

I am not sure that removing & is a good idea. By the way, IMO we are not addressing "reserved characters". Rather we are talking about HTML recognized characters. When HTML encounters an & it is anticipating some form of HTML entity. I have had complaints from browsers that seem to require an HTML entity following an &. To be safe, I'd suggest leaving & in.
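A rough way to see when a bare & is harmless and when it is not is to run a few strings through Python's `html.unescape`, which is documented as following the HTML5 rules for valid and invalid character references. This is only an approximation of what a browser does (browsers apply extra context rules, for example inside attribute values), and the sample strings are made up for illustration.

```python
import html

# A lone ampersand followed by a space is left alone.
print(html.unescape("a & b"))        # a & b

# An ampersand followed by text that is not a known name is also left alone.
print(html.unescape("AT&T"))         # AT&T

# Well-formed references are decoded as expected.
print(html.unescape("&lt;pre&gt;"))  # <pre>

# The ambiguous case: "&copy" is a legacy name that is recognized even
# without a trailing semicolon, so an unescaped query string gets mangled.
print(html.unescape("?a=1&copy=2"))  # ?a=1©=2

# Escaping the ampersand removes the ambiguity.
print(html.unescape("?a=1&amp;copy=2"))  # ?a=1&copy=2
```

Under these assumptions, escaping every & is never required for correctness in the first three cases, but it is the simplest way to avoid the fourth.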

Jul 01 '24 12:07 gggustafson

> I am not sure that removing & is a good idea. By the way, IMO we are not addressing "reserved characters". Rather we are talking about HTML recognized characters. When HTML encounters an & it is anticipating some form of HTML entity. I have had complaints from browsers that seem to require an HTML entity following an &. To be safe, I'd suggest leaving & in.

MDN documents the spec in human readable form, in particular in reference docs. The spec (as I recall) doesn't even call < and > reserved - it's a specific sequence that makes a tag that is problematic. Ditto for ' and " - they can be fine in some contexts and not in others. I take your point though, if this is going to be a problem for readers we should leave that stuff in, but perhaps use softer wording like "HTML recognized" or "problematic" and then use terminology to explain in a little bit more detail.

I've created #34545 to allow further discussion.

Jul 01 '24 23:07 hamishwillee
