getUrlString should check for valid URL
- remark-embedder-core version: 3.0.3
- node version: v20.10.0
- npm version: 10.2.3
Relevant code or config:
const getUrlString = (url: string): string | null => {
  const urlString = url.startsWith('http') ? url : `https://${url}`
  try {
    return new URL(urlString).toString()
  } catch (error: unknown) {
    return null
  }
}
const urlString = getUrlString(value)
What you did: I ran a simple Markdown document like the one below:
#### Output
- Fruit
  - Apple
  - Orange
  - Banana
- Dairy
  - Milk
  - Cheese
I created my own version of the oEmbed transformer with a fallback: if the extracted link is not an oEmbed link, I wrap it in my own custom bookmark instead.
What happened:
Unfortunately, @remark-embedder/core treats even plain strings as URLs:
🚀 ~ remarkEmbedder, node: {
  type: 'text',
  value: 'Banana',
  position: {
    start: { line: 171, column: 9, offset: 3707 },
    end: { line: 171, column: 15, offset: 3713 }
  }
}
🚀 ~ remarkEmbedder, isValidLink: false
🚀 ~ remarkEmbedder, value: Banana
🚀 ~ remarkEmbedder, urlString: Banana
🚀 ~ shouldTransform: ~ url: https://banana/
Reproduction repository:
Problem description:
getUrlString in @remark-embedder/core returns every single-word line as a URL (with https:// prepended to it). It seems to rely on the shouldTransform function later to filter all such links out, but in some cases this is too late. I can't check inside shouldTransform whether the link is a valid URL, because by the time it comes out of getUrlString it always is.
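The behaviour can be reproduced in isolation. The sketch below copies getUrlString as quoted in this report and runs it on two inputs; the variable names and the example.com link are illustrative assumptions, not taken from the library.

```typescript
// Copy of getUrlString as quoted in this report.
const getUrlString = (url: string): string | null => {
  const urlString = url.startsWith('http') ? url : `https://${url}`
  try {
    return new URL(urlString).toString()
  } catch (error: unknown) {
    return null
  }
}

// A bare word parses as a hostname once https:// is prepended,
// so the URL constructor does not throw and a URL string comes back.
const fromWord = getUrlString('Banana')

// A real link passes through unchanged (example.com is a placeholder).
const fromLink = getUrlString('https://example.com/watch?v=1')

console.log(fromWord) // https://banana/
console.log(fromLink) // https://example.com/watch?v=1
```

Because both results are well-formed URL strings, nothing downstream can tell that the first one started life as a list item.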
Suggested solution:
Enhance the getUrlString function to check whether the given text is actually a link, using some robust regex, and only return a true, viable URL.
Something like this works:
const getUrlString = url => {
  const urlRegex = new RegExp(
    /((([A-Za-z]{3,9}:(?:\/\/)?)(?:[-;:&=+$,\w]+@)?[A-Za-z0-9.-]+(:[0-9]+)?|(?:www.|[-;:&=+$,\w]+@)[A-Za-z0-9.-]+)((?:\/[+~%/.\w-_]*)?\??(?:[-+=&;%@.\w_]*)#?(?:[\w]*))?)/
  );
  if (!urlRegex.test(url)) {
    console.log('🚩 Not a valid URL!:', url);
    return null;
  }
  const urlString = url.startsWith('http') ? url : `https://${url}`;
  try {
    return new URL(urlString).toString();
  } catch (error) {
    return null;
  }
};
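A possible alternative to the large regex, offered only as a sketch: let the URL constructor do the parsing, then apply a small heuristic to the parsed hostname. The name getStrictUrlString and the "hostname must contain a dot or be localhost" rule are assumptions made for illustration; they are not part of @remark-embedder/core, and the heuristic can still accept odd inputs such as dotted numbers.

```typescript
// Hypothetical stricter variant; not part of @remark-embedder/core.
const getStrictUrlString = (value: string): string | null => {
  const candidate = value.trim()
  // A link on its own line contains no whitespace.
  if (candidate === '' || /\s/.test(candidate)) return null

  const withScheme = /^https?:\/\//i.test(candidate)
    ? candidate
    : `https://${candidate}`

  try {
    const parsed = new URL(withScheme)
    // Assumption: a real link has a dotted hostname (example.com) or is localhost.
    const looksLikeHost =
      parsed.hostname.includes('.') || parsed.hostname === 'localhost'
    return looksLikeHost ? parsed.toString() : null
  } catch {
    return null
  }
}

console.log(getStrictUrlString('Banana')) // null
console.log(getStrictUrlString('example.com/path')) // https://example.com/path
```

With a check like this, plain list items never reach shouldTransform, while scheme-less links such as example.com/path are still normalised the way the current function does.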