HTML characters with a capital first letter in `namedCodesToUnicode` are not replaced

Open garywilddev opened this issue 1 year ago • 0 comments

Hi,

Thank you for your work on this useful package!

I noticed that named HTML entities beginning with a capital letter (for example `&Eacute;`) are not replaced when using the `namedCodesToUnicode` compiler option.

Here is an example:

import Markdown, { compiler } from 'markdown-to-jsx';

const content =
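  '&AElig; &Aacute; &Acirc; &Agrave; &Aring; &Atilde; &Auml; &Ccedil; &Eacute; &Ecirc; &Egrave; &Euml; &Iacute; &Icirc; &Igrave; &Iuml; &Ntilde; &Oacute; &Ocirc; &Ograve; &Oslash; &Otilde; &Ouml; &Uacute; &Ucirc; &Ugrave; &Uuml; &Yacute; &aacute; &acirc; &aelig; &agrave; &amp; &aring; &atilde; &auml; &ccedil; &coy; &eacute; &ecirc; &egrave; &euml; &ge; &gt; &iacute; &icirc; &igrave; &iuml; &laquo; &le; &lt; &nbsp; &ntilde; &oacute; &ocirc; &ograve; &oslash; &otilde; &ouml; &para; &quot; &raquo; &szlig; &uacute; &ucirc; &ugrave; &uuml; &yacute;';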

const namedCodesToUnicode = {
  AElig: 'Æ',
  Aacute: 'Á',
  Acirc: 'Â',
  Agrave: 'À',
  Aring: 'Å',
  Atilde: 'Ã',
  Auml: 'Ä',
  Ccedil: 'Ç',
  Eacute: 'É',
  Ecirc: 'Ê',
  Egrave: 'È',
  Euml: 'Ë',
  Iacute: 'Í',
  Icirc: 'Î',
  Igrave: 'Ì',
  Iuml: 'Ï',
  Ntilde: 'Ñ',
  Oacute: 'Ó',
  Ocirc: 'Ô',
  Ograve: 'Ò',
  Oslash: 'Ø',
  Otilde: 'Õ',
  Ouml: 'Ö',
  Uacute: 'Ú',
  Ucirc: 'Û',
  Ugrave: 'Ù',
  Uuml: 'Ü',
  Yacute: 'Ý',
  aacute: 'á',
  acirc: 'â',
  aelig: 'æ',
  agrave: 'à',
  amp: '&',
  aring: 'å',
  atilde: 'ã',
  auml: 'ä',
  ccedil: 'ç',
  coy: '©',
  eacute: 'é',
  ecirc: 'ê',
  egrave: 'è',
  euml: 'ë',
  ge: '\u2265',
  gt: '<',
  iacute: 'í',
  icirc: 'î',
  igrave: 'ì',
  iuml: 'ï',
  laquo: '«',
  le: '\u2264',
  lt: '<',
  nbsp: ' ',
  ntilde: 'ñ',
  oacute: 'ó',
  ocirc: 'ô',
  ograve: 'ò',
  oslash: 'ø',
  otilde: 'õ',
  ouml: 'ö',
  para: '§',
  quot: '"',
  raquo: '»',
  szlig: 'ß',
  uacute: 'ú',
  ucirc: 'û',
  ugrave: 'ù',
  uuml: 'ü',
  yacute: 'ý',
};

<Markdown options={{ namedCodesToUnicode }}>{content}</Markdown>;

// or

compiler(content, {
  namedCodesToUnicode,
});

// actual:
// <p>&AElig; &Aacute; &Acirc; &Agrave; &Aring; &Atilde; &Auml; &Ccedil; &Eacute; &Ecirc; &Egrave; &Euml; &Iacute; &Icirc; &Igrave; &Iuml; &Ntilde; &Oacute; &Ocirc; &Ograve; &Oslash; &Otilde; &Ouml; &Uacute; &Ucirc; &Ugrave; &Uuml; &Yacute; á â æ à & å ã ä ç © é ê è ë ≥ < í î ì ï « ≤ < ñ ó ô ò ø õ ö § " » ß ú û ù ü ý</p>;

// expected:
// <p>Æ Á Â À Å Ã Ä Ç É Ê È Ë Í Î Ì Ï Ñ Ó Ô Ò Ø Õ Ö Ú Û Ù Ü Ý á â æ à & å ã ä ç © é ê è ë ≥ < í î ì ï « ≤ < ñ ó ô ò ø õ ö § " » ß ú û ù ü ý</p>;

It seems that the regex used to match named HTML entities is too restrictive: it appears to match only names that start with a lowercase letter, so entities such as `&Eacute;` are never looked up in `namedCodesToUnicode`. See: https://github.com/probablyup/markdown-to-jsx/blob/3cffbc9e618dd1b2d92fc7aab999ba64a169330f/index.tsx#L301
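
To illustrate what I mean, here is a minimal standalone sketch. The two patterns and the `replaceNamedCodes` helper are hypothetical stand-ins of my own, not copied from the library source; they only show how a lowercase-only character class would skip capitalised entity names, and how widening the class might address it.

```typescript
// Hypothetical patterns for illustration only; not the library's actual code.
const LOWERCASE_ONLY_R = /&([a-z]+);/g; // assumed shape of the restrictive pattern
const ANY_CASE_R = /&([a-zA-Z]+);/g; // one possible widening

// Replace each matched entity with its mapped character, or leave it untouched
// when the name is not present in the map.
function replaceNamedCodes(
  text: string,
  pattern: RegExp,
  codes: Record<string, string>
): string {
  return text.replace(pattern, (full: string, name: string) =>
    Object.prototype.hasOwnProperty.call(codes, name) ? codes[name] : full
  );
}

const codes: Record<string, string> = {
  AElig: 'Æ',
  aelig: 'æ',
  Eacute: 'É',
  eacute: 'é',
};
const sample = '&AElig; &aelig; &Eacute; &eacute;';

const before = replaceNamedCodes(sample, LOWERCASE_ONLY_R, codes);
const after = replaceNamedCodes(sample, ANY_CASE_R, codes);

console.log(before); // '&AElig; æ &Eacute; é'
console.log(after); // 'Æ æ É é'
```

With the lowercase-only pattern, `&AElig;` and `&Eacute;` never match, so the map is never consulted for them. A real fix would probably also need to decide how to handle entity names containing digits (such as `&frac12;`), which this sketch does not cover.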

Cheers

garywilddev • Aug 03 '22 10:08