scrapemark
Super-convenient web scraping in Python
Some fixes and stuff.
Reported by [email protected], Jan 30, 2010
What steps will reproduce the problem?
Scrape the title of http://www.sony.jp/
```
res = scrapemark.scrape("{{title}}", url="http://www.sony.jp/")
```
Print the result
```
print res['title']
```
...
Is it possible to deal with Unicode strings, or to convert the regular ASCII string returned by scrapemark to Unicode?
This could be a user error, but I have tried every permutation I can think of without success. I'm using the version of scrapemark.py updated on Aug 11, 2011. Here is...
First, thanks for this wonderful tool! I have the following problem: when trying the following snippet: ``` python import scrapemark html = """ Page 1 Page 2 Page 3 >...
scrapemark is trying to convince me "Ø" is unicode... Attempt to print raises: UnicodeEncodeError: 'ascii' codec can't encode character u'\xd8' in position 6: ordinal not in range(128)
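For context on the error above: "Ø" (U+00D8) is a legitimate Unicode character, and the exception comes from encoding it with the ASCII codec at output time. The following is a minimal sketch of that behaviour, assuming Python 3, where text is `str`. `encode_for_output` is a hypothetical helper for illustration and is not part of scrapemark.

```python
def encode_for_output(text, encoding="utf-8", errors="strict"):
    """Hypothetical helper: encode text explicitly instead of relying on
    the default codec of the output stream."""
    return text.encode(encoding, errors)


value = "\u00d8"  # the character "Ø" from the report

# The ASCII codec cannot represent U+00D8, which is what the report shows.
try:
    value.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Encoding to UTF-8 succeeds; "replace" is a lossy fallback for ASCII-only sinks.
utf8_bytes = encode_for_output(value)
ascii_fallback = encode_for_output(value, "ascii", "replace")
```

Whether to encode as UTF-8 or fall back to a lossy replacement depends on what the terminal or file receiving the output can display.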
Reported by [email protected], Oct 29, 2010 What steps will reproduce the problem? 1. when `m.group(0) == '#x201C'` in `_substitute_entity()`. 2. `unichr(int(ent)) (where ent=='x201C')` throws ValueError. What is the expected output?...
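The `ValueError` described above is what `int()` raises when handed a hexadecimal reference such as `x201C` without a base. Below is a minimal sketch of one way to handle both decimal and hexadecimal numeric references. It is not scrapemark's actual fix. `decode_numeric_entity` is a hypothetical helper, and the code assumes Python 3, where `chr` plays the role of Python 2's `unichr`.

```python
def decode_numeric_entity(ent):
    """Hypothetical helper: decode the part of a numeric character
    reference that follows '&#', for example 'x201C' or '8220'."""
    if ent[:1] in ("x", "X"):
        # Hexadecimal form: drop the leading 'x' and parse in base 16.
        return chr(int(ent[1:], 16))
    # Decimal form: parse in base 10.
    return chr(int(ent, 10))


left_quote_hex = decode_numeric_entity("x201C")  # U+201C
left_quote_dec = decode_numeric_entity("8220")   # also U+201C
```

Malformed input, such as a bare `x`, still raises `ValueError`, so a caller may want to catch it and leave the original entity text untouched.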
Here is a simple use case: scrape('', '') -> None scrape('', '') -> None scrape('', ''.lower()) -> www.google.com (Thanks for the great library by the way)
Reported by toshiba13, Jun 23, 2009
A new encoding filter might be useful for international languages.
```
if f == 'utf-8':
    if issubclass(type(s), basestring):
        s = s.encode('utf8')
```
One could use a...
Reported by project member adamrshaw, Oct 22, 2009 maybe something like this... ``` {{ var+ }} {{ var+ }} ``` ... would result in 'var' having the inner text of...