Strip anchors (URL fragments) from provenance URLs / canonicalize URLs
We store provenance information when users collect prompts. We use it for analytics and will eventually use it to group "same-page" prompts in the interface. If the author provides a canonical URL, we use that; otherwise we fall back to the current page URL, which we'd like to make as canonical as possible. For example, if someone visits http://foo.com/essay#some-header, we should store http://foo.com/essay as the URL, stripping the anchor.
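For the anchor case specifically, here is a minimal sketch in Python using only the standard library. The function name `strip_fragment` is made up for illustration and is not something in our codebase. One caveat: this assumes the fragment never carries meaning, which won't hold for sites that do hash-based routing (e.g. `#/some/page`).

```python
from urllib.parse import urldefrag


def strip_fragment(url: str) -> str:
    """Return the URL with any '#fragment' suffix removed (hypothetical helper)."""
    # urldefrag splits the URL into (url_without_fragment, fragment).
    return urldefrag(url).url


print(strip_fragment("http://foo.com/essay#some-header"))
# prints: http://foo.com/essay
```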
Are there general libraries / algorithms / heuristics for fuzzily canonicalizing URLs?
Also this kind of nonsense is showing up:
https://andymatuschak.org/prompts/?ck_subscriber_id=1121236996&utm_source=convertkit&utm_medium=email&utm_campaign=Creating+Habits+%F0%9F%A7%A4%20-%205117179
Not quite sure how to deal with that. We can't just drop the query string wholesale, because in some cases the query nonsense is meaningful. Bluh.
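One possible heuristic is a denylist: drop only query parameters that look like tracking noise and keep everything else. Below is a rough sketch in Python, standard library only. The names `TRACKING_PREFIXES`, `TRACKING_NAMES`, and `canonicalize_url` are hypothetical, and the denylist is just a guess based on the URL above, not a complete or vetted list. Note that re-encoding the surviving parameters with `urlencode` may change their escaping (e.g. `%20` becomes `+`), so this is not a byte-for-byte passthrough.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed denylist, derived only from the example URL in this issue.
TRACKING_PREFIXES = ("utm_",)
TRACKING_NAMES = {"ck_subscriber_id"}


def is_tracking_param(name: str) -> bool:
    """Guess whether a query parameter is tracking noise."""
    lowered = name.lower()
    return lowered.startswith(TRACKING_PREFIXES) or lowered in TRACKING_NAMES


def canonicalize_url(url: str) -> str:
    """Drop the fragment and any denylisted query parameters; keep the rest."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if not is_tracking_param(name)
    ]
    # Passing "" as the last component discards the fragment.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


print(canonicalize_url(
    "https://andymatuschak.org/prompts/?ck_subscriber_id=1121236996"
    "&utm_source=convertkit&utm_medium=email"
    "&utm_campaign=Creating+Habits+%F0%9F%A7%A4%20-%205117179"
))
# prints: https://andymatuschak.org/prompts/
```

A denylist errs on the side of keeping parameters, so a meaningful one like `?v=abc` survives, at the cost of missing tracking parameters we haven't seen yet.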