scrapemark
Exception while parsing things like '<a href="">Some text</a>'
Reported by [email protected], Apr 27, 2010
** What steps will reproduce the problem?
At the Python console, type:
import scrapemark
scrapemark.scrape( '{* {{ [links].title }} *}', html = 'Some text' )
** What is the expected output? What do you see instead?
Expected:
{'links': [{'title': u'Some text', 'url': u''}]}
Actual:
Traceback (most recent call last):
File "
** What version of the product are you using? On what operating system?
scrapemark-0.9, from the source distribution
Mac OS X Version 10.6.3
Python 2.6.1 (r261:67515, Feb 11 2010, 00:51:29) [GCC 4.2.1 (Apple Inc. build 5646)] on darwin
** Please provide any additional information below.
Below is a workaround:
diff -ub scrapemark.py.orig scrapemark.py
--- scrapemark.py.orig 2010-04-28 01:00:58.000000000 -0400
+++ scrapemark.py 2010-04-28 00:59:03.000000000 -0400
@@ -541,7 +541,10 @@
 def _parse_attrs(s):
     attrs = {}
     for m in _attr_re.finditer(s):
-        attrs[m.group(1)] = m.group(3) or m.group(4)
+        value = m.group(3)
+        if value is None:
+            value = m.group(4)
+        attrs[m.group(1)] = value
     return attrs
 
 def _next_tag(s, i, tag_open_re, tag_close_re, depth=1): # returns (tag body, substringindex after tag)
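This seems to happen with any empty attribute. The original expression, m.group(3) or m.group(4), treats an empty value ('') as false and falls through to m.group(4), which is presumably None when group 3 matched, so the attribute ends up as None instead of an empty string.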
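
For anyone who wants to see the pitfall in isolation, here is a minimal sketch that does not need scrapemark at all. The regex ATTR_RE is a hypothetical stand-in written for this example (it is NOT scrapemark's real _attr_re, which is not shown in this report); it only mimics the assumed layout where group 3 holds a quoted value and group 4 an unquoted one. The two function names are also made up for illustration.

```python
import re

# Hypothetical stand-in for scrapemark's _attr_re (an assumption, not the real one):
#   group 1 = attribute name
#   group 2 = opening quote character
#   group 3 = quoted value (may be the empty string)
#   group 4 = unquoted value
ATTR_RE = re.compile(r'(\w+)\s*=\s*(?:(["\'])(.*?)\2|(\S+))')


def parse_attrs_buggy(s):
    """Mirrors the original line: an empty quoted value is falsy, so 'or' skips it."""
    attrs = {}
    for m in ATTR_RE.finditer(s):
        attrs[m.group(1)] = m.group(3) or m.group(4)
    return attrs


def parse_attrs_fixed(s):
    """Mirrors the patch: fall back to group 4 only when group 3 did not match."""
    attrs = {}
    for m in ATTR_RE.finditer(s):
        value = m.group(3)
        if value is None:
            value = m.group(4)
        attrs[m.group(1)] = value
    return attrs


sample = 'href="" class=foo id="x"'
buggy = parse_attrs_buggy(sample)
fixed = parse_attrs_fixed(sample)
print(buggy)
print(fixed)
```

With this stand-in regex, the buggy version stores None for href while the fixed version keeps the empty string; non-empty quoted and unquoted values come out the same either way.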