scrapemark
ValueError in _substitute_entity() substituting '#x201C' like strings
Reported by [email protected], Oct 29, 2010
What steps will reproduce the problem?
When m.group(0) == '#x201C' in _substitute_entity(), the call unichr(int(ent)) (where ent == 'x201C') throws ValueError.
What is the expected output? What do you see instead?
unichr() wants the integer 0x201C. Instead, int('x201C') raises ValueError, because the string is not a valid base-10 literal.
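A minimal reproduction of the failure, independent of scrapemark. This is only a sketch: it assumes Python 3, where chr() plays the role of Python 2's unichr(), and the variable names are illustrative rather than taken from the library.

```python
ent = 'x201C'  # entity body after the leading '#', as in the report

# int() defaults to base 10, so the leading 'x' makes the literal invalid.
try:
    int(ent)
    failed = False
except ValueError:
    failed = True

# Dropping the 'x' and parsing as base 16 yields the intended code point.
code_point = int(ent[1:], 16)
char = chr(code_point)  # unichr(code_point) on Python 2
```

Here failed ends up True, and char is the left double quotation mark (U+201C).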
What version of the product are you using? On what operating system?
scrapemark-0.9-py2.5.egg, Python 2.6.4, Ubuntu 9.10 x64
Please provide any additional information below.
Adding this function:
def my_int(s):
    try: return int(s)
    except: pass
    try: return int(s, 16)
    except: pass
    if len(s)>0 and s[0].lower() == 'x':
        try: return int('0'+s, 16)
        except: pass
    return 0
and replacing unichr(int(ent)) with unichr(my_int(ent)) seems to fix the problem.
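A stricter variant of my_int() might look like the sketch below. It is not the library's code: parse_char_ref is a hypothetical name, and the behavior is an assumption about what a fix should do. It treats only an 'x'/'X' prefix as hexadecimal, parses everything else as decimal, and raises ValueError on bad input instead of silently returning 0 (which would turn an invalid reference into a NUL character).

```python
def parse_char_ref(ent):
    """Return the code point for a numeric reference body.

    `ent` is the text after '#', e.g. '8220' or 'x201C'.
    Hypothetical helper; raises ValueError on malformed input.
    """
    if ent[:1] in ('x', 'X'):
        # Hexadecimal form: &#x201C;
        return int(ent[1:], 16)
    # Decimal form: &#8220;
    return int(ent, 10)
```

With this helper, the call would become unichr(parse_char_ref(ent)) on Python 2, and the caller decides what to do when ValueError is raised, for example leaving the entity text unchanged.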
Probably fixed in #9.