w3lib

[MRG+1] Added: Removing comments before extracting base URLs. Not a solution to #70, but does help in some cases.

Open • starrify opened this pull request on Oct 25 '16 • 6 comments

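Helps resolve the issue in cases like the following, which occur on several websites. Without this change, a `<base>` tag that sits inside an HTML comment is still used as the base URL: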

```python
>>> from w3lib import html
>>> html.get_base_url("""<!-- <base href="http://example.com/" /> -->""")
'http://example.com/'
```
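For illustration, here is a minimal standalone sketch of the approach this PR takes: strip HTML comments first, then search for `<base href>`. It assumes plain `str` input and does not use or reproduce w3lib's code; `strip_comments`, `find_base_url`, and both regexes are hypothetical names written only for this example.

```python
import re
from urllib.parse import urljoin

# Hypothetical helpers (not w3lib's API) illustrating the PR's idea:
# remove HTML comments first, then look for a live <base href="...">.

# Matches "<!-- ... -->"; an unterminated comment runs to the end of the text.
_COMMENT_RE = re.compile(r"<!--.*?(?:-->|$)", re.DOTALL)

# Simplified <base href> matcher: accepts double, single, or no quotes.
# "<base\s" requires whitespace after the tag name, so <basefont> is skipped.
_BASE_HREF_RE = re.compile(
    r"<base\s[^>]*href\s*=\s*[\"']?\s*([^\"'\s>]+)", re.IGNORECASE
)


def strip_comments(text: str) -> str:
    """Remove HTML comments so markup inside them is ignored."""
    return _COMMENT_RE.sub("", text)


def find_base_url(text: str, baseurl: str = "") -> str:
    """Return the document's base URL, ignoring commented-out <base> tags.

    Falls back to ``baseurl`` when no live <base href> is present.
    """
    match = _BASE_HREF_RE.search(strip_comments(text))
    if match:
        # Resolve a relative href against the URL the page was fetched from.
        return urljoin(baseurl, match.group(1))
    return baseurl
```

With this sketch, `find_base_url('<!-- <base href="http://example.com/" /> -->')` returns `''` instead of `'http://example.com/'`. Because the comment regex runs over the raw text, a `<!--` that appears inside a script or an attribute value would also be stripped, so this is a heuristic rather than a full HTML parse.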

Fixes #70 (the original #70 report describes exactly this scenario; other scenarios should get separate issues).

starrify • Oct 25 '16

Current coverage is 94.10% (diff: 100%)

Merging #77 into master will increase coverage by 0.01%

```diff
@@             master        #77   diff @@
==========================================
  Files             7          7          
  Lines           406        407     +1   
  Methods           0          0          
  Messages          0          0          
  Branches         84         84          
==========================================
+ Hits            382        383     +1   
  Misses           16         16          
  Partials          8          8          
```

Powered by Codecov. Last update 03c28d2...11b5d26

codecov-io • Oct 25 '16

Can you add tests for this? Can you provide example websites showing this issue?

redapple • Oct 25 '16

Thanks for the feedback, @redapple. A test has been added.

Here's a sample site that triggers this issue: http://planweb01.rother.gov.uk/OcellaWeb/planningSearch

starrify • Oct 26 '16

@kmike Could you have a look?

Gallaecio • Sep 17 '19

This is related to #70.

Gallaecio • Sep 17 '19

Bumping: this PR is outdated and should be closed.

yozachar • Jul 20 '22