JSTOR: More reliable way of extracting the page's canonical link; type detection fixes
The canonical link (permalink) of the page being scraped is now extracted directly from the `<link rel="canonical">` element in the HTML head.
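For readers unfamiliar with the approach, here is a minimal sketch of the lookup described above. It is not the code from this PR: `getCanonicalURL` and its `fallbackURL` parameter are hypothetical names, and it assumes the element in question is the standard `<link rel="canonical">` and that `doc` exposes the usual DOM `querySelector` API.

```javascript
// Hypothetical helper for illustration only; not the translator's actual code.
// `doc` is any object with a DOM-style querySelector (e.g. a Document).
// `fallbackURL` is an assumed parameter: the URL to use when no canonical link exists.
function getCanonicalURL(doc, fallbackURL) {
    // Look for the canonical link element inside the HTML head.
    const link = doc.querySelector('head > link[rel="canonical"]');
    const href = link ? link.getAttribute('href') : null;

    // Use the canonical href only if it is present and non-blank;
    // otherwise fall back to the URL the page was loaded from.
    if (href && href.trim()) {
        return href.trim();
    }
    return fallbackURL;
}
```

In a translator, `doc` and the fallback would presumably come from the `doc` and `url` arguments of `detectWeb(doc, url)` or `doWeb(doc, url)`. Reading the permalink from the head avoids depending on page-body markup, which tends to change more often.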
Note to @dstillman - this is a hotfix and it doesn't address other failing test cases caused by incorrect type detection. I'll push another fix for those, but I need a bit more time.
Edit: I'll add a few more commits that address other failing tests in this branch.
Fixes #3104, from which a test case is added.
@AbeJellinek, thanks. This PR is now mostly ready.
Great - anything else to do or can we merge?
I'm fine with merging :)