obsidian-stack-overflow Refactor extractAnswerId to make use of the URL parser?

trafficstars

Just had a brief skim through the code and noticed that you're treating the url as a string in extractAnswerId, then doing string manipulation to parse out the various bits.

Not sure if you're aware of the URL API, but it might be helpful to simply/make your extraction more robust:

https://developer.mozilla.org/en-US/docs/Web/API/URL

We could also make the checks a bit more robust using string.match() or similar:

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/match

While I haven't directly tested this, the code would likely end up looking something like:

+  const shortUrlRegex = /\/a\/\d+\/(\d+)/;

	extractAnswerId(url: string) {
		try {
			if (url.length === 0) {
				new Notice("[Stack Overflow Answers] - url is empty");
				return "";
			}
+            const parsedUrl = new URL(url);
+
-			if (url.includes("#"))
			 // in form of https://stackoverflow.com/questions/14122919/how-to-hide-show-objects-using-three-js-release-54/14123978#14123978
-				return url.split("#").pop();
+            if (parsedUrl.hash) return parsedUrl.hash.slice(1);
+         
-			const urlPopped = url.split("/"); // in form of https://stackoverflow.com/a/32232028/3952024
-			urlPopped.pop();
-			return urlPopped.pop(); // second to last
+          // in form of https://stackoverflow.com/a/32232028/3952024
+          const shortUrlMatch = parsedUrl.pathname.match(shortUrlRegex);
+          if (shortUrlMatch) return shortUrlMatch[1];
+
+          // None of our URL extractions matched
+	       new Notice("[Stack Overflow Answers] - Could not extract answer id from url");
+	       return "";
		} catch (err) {
			new Notice("[Stack Overflow Answers] - Could not extract answer id from url");
			return "";
		}
	}

I'm not 100% sure, but this may also end up being a fix for https://github.com/bramses/obsidian-stack-overflow/issues/3, as I believe the current code would return pine-script-optimize-with-lower-timeframe-backtest from the URL due to assuming that if there is no hash, that it will be in the /a/123/456 format.

Jun 23 '22 03:06 0xdevalias

Great ideas! I really like the idea of string.match() for more URL robustness. What types of urls would be captured that aren't currently?

The plugin currently captures answers in long form (https://stackoverflow.com/questions/61907922/pine-script-optimize-with-lower-timeframe-backtest/72722933#72722933) (the answer hash is at the end) and short form (https://stackoverflow.com/a/72722933/3952024)

I think entire questions may be out of scope, but open to suggestions!

Jun 23 '22 17:06 bramses

I think entire questions is the main other URL i've run into myself so far. I definitely understand if processing/extracting the questions is out of scope, but at the very least detecting those URLs and giving a user friendlier 'not going to work' sort of error would improve the UX I feel.

Jun 24 '22 00:06 0xdevalias

obsidian-stack-overflow obsidian-stack-overflow copied to clipboard

Refactor extractAnswerId to make use of the URL parser?

obsidian-stack-overflow
obsidian-stack-overflow copied to clipboard