Syntax highlighting doesn't work well on files with `.tsx` extension
Describe the bug
When opening a pull request or when the code is merged into the main branch, the syntax highlighting for .tsx TypeScript React files doesn't work properly, leaving part of the file in plain text.
Expected behaviour
The expected behavior should be to always display the syntax highlighting that corresponds to the .tsx extension.
Additional notes
As additional information, I have managed to correct this by importing React at the beginning of each file.
import * as React from 'react';
But this should not be necessary today, since it is not an obligation to have to import React.
This will be because the .tsx extension is shared with XML and the heuristic to differentiate the two is looking for react:
https://github.com/github-linguist/linguist/blob/f0aebbe90d3b9b9bae5b7258f99370def1ee489f/lib/linguist/heuristics.yml#L781-L786
If the file doesn't match that or the XML heuristic, things fall through to the classifier which uses the samples we have to guess and all but one of the samples have React in them.
There are two solutions:
- Improve the heuristic or
- Add more samples which don't have React in them to better train the classifier.
The former is more reliable but likely to be harder.
If you know of a reliable regex that can be used, please feel free to submit a PR to improve the heuristic.
@lildude Oh, I see! I understand that this requires importing React for proper detection. However, I have two questions:
-
In what cases could a file with the
.tsxextension be treated asXML? I wonder if it might be better to remove this possibility, or is it intended for specific cases like React Native? -
Would it be possible to reverse the matching logic? That is, first attempt to match
XML, and if it doesn’t, assume by default that the file isTSX.
On the other hand, I wonder what makes this file have proper syntax highlighting without importing React. However, if I remove most of the HTML or the class names, it turns into a plain text file. The file extension is .tsx.
\n
The same thing happens with this other .tsx extension file, which has React components, but its syntax highlighting seems fine.
1. In what cases could a file with the
.tsxextension be treated asXML? I wonder if it might be better to remove this possibility, or is it intended for specific cases like React Native?
If the file contains XML 😉 From the comments in https://github.com/github-linguist/linguist/issues/2761 and https://github.com/github-linguist/linguist/pull/3464 these appear to be tileset map files.
2. Would it be possible to reverse the matching logic? That is, first attempt to match
XML, and if it doesn’t, assume by default that the file isTSX.
It won't make a difference as both heuristics are quite precise with neither being a fallback "default" option. The reason we're getting the misclassification is the files aren't matching either regex so it's falling through to the classifier.
On the other hand, I wonder what makes this file have proper syntax highlighting without importing React.
I can't determine the source of your files so I can't test it and be precise, but I suspect the content contains tags in common with those in the samples which are used to train the classifier and the classifier has got it right this time.
It won't make a difference as both heuristics are quite precise with neither being a fallback "default" option. The reason we're getting the misclassification is the files aren't matching either regex so it's falling through to the classifier.
Could we make it assume that it is a .tsx file if it doesn't match any heuristics, perhaps using a negative_pattern?
I can't determine the source of your files so I can't test it and be precise, but I suspect the content contains tags in common with those in the samples which are used to train the classifier and the classifier has got it right this time.
If it is of any use, I leave below the complete code that the classifier does classify correctly without the need to use a React import.
Example
import Image from 'next/image'
import styles from './page.module.css'
export default function Home() {
return (
<main className={styles.main}>
<div className={styles.description}>
<p>
Get started by editing
<code className={styles.code}>src/app/page.tsx</code>
</p>
<div>
<a
href="https://vercel.com?utm_source=create-next-app&utm_medium=appdir-template&utm_campaign=create-next-app"
target="_blank"
rel="noopener noreferrer"
>
By{' '}
<Image
src="/vercel.svg"
alt="Vercel Logo"
className={styles.vercelLogo}
width={100}
height={24}
priority
/>
</a>
</div>
</div>
<div className={styles.center}>
<Image
className={styles.logo}
src="/next.svg"
alt="Next.js Logo"
width={180}
height={37}
priority
/>
</div>
<div className={styles.grid}>
<a
href="https://nextjs.org/docs?utm_source=create-next-app&utm_medium=appdir-template&utm_campaign=create-next-app"
className={styles.card}
target="_blank"
rel="noopener noreferrer"
>
<h2>
Docs <span>-></span>
</h2>
<p>Find in-depth information about Next.js features and API.</p>
</a>
<a
href="https://nextjs.org/learn?utm_source=create-next-app&utm_medium=appdir-template&utm_campaign=create-next-app"
className={styles.card}
target="_blank"
rel="noopener noreferrer"
>
<h2>
Learn <span>-></span>
</h2>
<p>Learn about Next.js in an interactive course with quizzes!</p>
</a>
<a
href="https://vercel.com/templates?framework=next.js&utm_source=create-next-app&utm_medium=appdir-template&utm_campaign=create-next-app"
className={styles.card}
target="_blank"
rel="noopener noreferrer"
>
<h2>
Templates <span>-></span>
</h2>
<p>Explore starter templates for Next.js.</p>
</a>
<a
href="https://vercel.com/new?utm_source=create-next-app&utm_medium=appdir-template&utm_campaign=create-next-app"
className={styles.card}
target="_blank"
rel="noopener noreferrer"
>
<h2>
Deploy <span>-></span>
</h2>
<p>
Instantly deploy your Next.js site to a shareable URL with Vercel.
</p>
</a>
</div>
</main>
)
}
For now what I did to avoid the syntax problem was to make an overrides using the .gitattributes file. This apparently only works when the code reaches the corresponding branch, since when opening a pull request it is still seen as plain text. Is there any configuration for this?
# Reclassifying `.tsx` files as `tsx` to fix GitHub syntax highlighting issues.
# Current Linguist heuristics do not correctly recognize `.tsx` files without `import React`.
# https://github.com/github-linguist/linguist/issues/7039
*.tsx linguist-language=tsx
Could we make it assume that it is a .tsx file if it doesn't match any heuristics, perhaps using a negative_pattern?
It already is a .tsx file 😉 If you mean TSX, yes, switching the order and removing the TSX pattern will do the trick… if XML isn't matched, use TSX. This however becomes a problem in future if a new language starts using the same extension but I think that's a hurdle that can be dealt with then. The XML heuristic is specific enough for now. Feel free to submit a PR with this change.
Is there any configuration for this?
No.
Here's another example - https://github.com/remix-run/remix/blob/main/templates/express/app/root.tsx