translators JSTOR: More reliable way of extracting the page canonical link; type detection fixes


Open zoe-translates opened this issue 1 year ago • 3 comments

The canonical link (permalink) of the page being scraped is now extracted directly from the `<link rel="canonical">` element in the HTML head.
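For readers unfamiliar with the approach, here is a minimal sketch of reading a canonical link from the document head. It is not the code in this PR: the helper name `getCanonicalURL`, the `fallbackURL` parameter, and the selector are assumptions for illustration only.

```javascript
// Hypothetical helper, not the translator's actual code.
// `doc` is any DOM Document (or an object exposing querySelector).
// `fallbackURL` is returned when no usable canonical link is present.
function getCanonicalURL(doc, fallbackURL) {
  // Look for <link rel="canonical" href="..."> inside <head>.
  const link = doc.querySelector('head > link[rel="canonical"]');
  const href = link ? link.getAttribute('href') : null;
  // Treat a missing or whitespace-only href as absent.
  if (typeof href === 'string' && href.trim() !== '') {
    return href.trim();
  }
  return fallbackURL;
}
```

In a translator this might be called as `getCanonicalURL(doc, url)`, so that the page URL is still used when the head carries no canonical link.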

~~Note to @dstillman - this is a hotfix and it doesn't address other failing test cases caused by incorrect type detection. I'll push another fix for those, but I need a bit more time.~~

Edit: I'll add a few more commits that address other failing tests in this branch.

Fixes #3104, from which a test case is added.

Aug 16 '23 15:08 zoe-translates

@AbeJellinek, thanks. This PR is now mostly ready.

Aug 18 '23 02:08 zoe-translates

Great - anything else to do or can we merge?

Aug 22 '23 13:08 AbeJellinek

I'm fine with merging :)

Aug 22 '23 13:08 zoe-translates
