scrapemark issues

Pull Request

3 comments

Some fixes and stuff.

Multibyte non-utf-8 encoded pages are decoded incorrectly

1 comment

Reported by [email protected], Jan 30, 2010. What steps will reproduce the problem? Scrape the `<title>` of http://www.sony.jp/:

```
res = scrapemark.scrape("{{title}}", url="http://www.sony.jp/")
```

Print the result:

```
print res['title']
```

...

arshaw
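Multibyte encodings such as Shift_JIS decode correctly only when the page's declared charset is honored. As a rough illustration, here is a minimal Python 3 sketch. It is not scrapemark's code (scrapemark targets Python 2). `decode_page` is a hypothetical helper, and its charset regex is a simplification of real HTML encoding sniffing. It prefers the HTTP `Content-Type` charset, then a `<meta>` declaration, and falls back to UTF-8.

```python
import re

# Simplified charset detection (an assumption, not full HTML encoding sniffing):
# matches charset=... in a Content-Type header or a <meta> tag.
_CHARSET_RE = re.compile(rb"""charset=["']?([\w-]+)""", re.IGNORECASE)


def decode_page(raw: bytes, content_type: str = "") -> str:
    """Decode raw page bytes with the declared charset, falling back to UTF-8."""
    # 1. Prefer the charset from the HTTP Content-Type header, if one was given.
    match = _CHARSET_RE.search(content_type.encode("ascii", "ignore"))
    # 2. Otherwise look for a declaration in the first 2 KB of the document.
    if match is None:
        match = _CHARSET_RE.search(raw[:2048])
    encoding = match.group(1).decode("ascii") if match else "utf-8"
    try:
        return raw.decode(encoding)
    except (LookupError, UnicodeDecodeError):
        # Unknown or wrong charset: decode as UTF-8, replacing invalid bytes.
        return raw.decode("utf-8", errors="replace")
```

Decoding with a hard-coded UTF-8 would turn a Shift_JIS `<title>` into mojibake or replacement characters. Looking up the declared charset first avoids that.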

unicode support?

Is it possible to work with Unicode strings, or to convert the plain ASCII strings returned by scrapemark to Unicode?

bookie988
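Converting byte strings to Unicode means decoding them with the page's encoding. Here is a minimal Python 3 sketch of doing that after scraping. It assumes results arrive as nested dicts and lists of byte strings, and `to_text` is a hypothetical helper, not part of scrapemark.

```python
from typing import Any


def to_text(value: Any, encoding: str = "utf-8") -> Any:
    """Recursively decode byte strings inside a scrape result."""
    if isinstance(value, bytes):
        # Replace undecodable bytes instead of raising, so one bad field
        # does not lose the whole result.
        return value.decode(encoding, errors="replace")
    if isinstance(value, dict):
        return {to_text(k, encoding): to_text(v, encoding) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        items = [to_text(v, encoding) for v in value]
        return tuple(items) if isinstance(value, tuple) else items
    # Numbers, None and already-decoded str values pass through unchanged.
    return value
```

For example, `to_text({b"title": b"caf\xc3\xa9"})` gives `{"title": "café"}`. Pass the page's real encoding (for instance `"latin-1"`) when it is not UTF-8.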

Problem with nested loop

This could be user error, but I have tried every permutation I can think of without success. I'm using the version of scrapemark.py updated on Aug 11, 2011. Here is...

phoebebright

Attribute value ignored when capturing another attribute value in the same tag

1 comment

First, thanks for this wonderful tool! I have run into the following problem with this snippet: ``` python import scrapemark html = """ Page 1 Page 2 Page 3 >...

ackalker

Incorrect reading for "Ø"

scrapemark is trying to convince me that "&Oslash;" is Unicode... Attempting to print the result raises: `UnicodeEncodeError: 'ascii' codec can't encode character u'\xd8' in position 6: ordinal not in range(128)`

zalun
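Decoding "&Oslash;" to `u'\xd8'` is actually correct, because that entity names U+00D8, "Ø". The `UnicodeEncodeError` is raised when the text is *encoded* to ASCII for output, which is what Python 2's `print` does on an ASCII-configured terminal. Here is a small Python 3 illustration using the standard library's `html.unescape` rather than scrapemark's own entity handling:

```python
import html

# "&Oslash;" is the named reference for U+00D8 (LATIN CAPITAL LETTER O WITH STROKE),
# so decoding it to u'\xd8' / "Ø" is correct behavior.
text = html.unescape("Bj&Oslash;rn")
assert text == "Bj\u00d8rn"

# The error appears only when that text is encoded to ASCII, e.g. when
# printing to an ASCII-configured stdout.
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    print("ascii fails:", exc.reason)  # ascii fails: ordinal not in range(128)

# Encoding explicitly (or running with a UTF-8 stdout) avoids the error.
print(text.encode("utf-8"))  # b'Bj\xc3\x98rn'
```

In Python 2 the equivalent fix is to call `.encode('utf-8')` on the value before printing it, or to configure the output stream's encoding.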

ValueError in _substitute_entity() substituting '#x201C' like strings

1 comment

Reported by [email protected], Oct 29, 2010. What steps will reproduce the problem?

1. When `m.group(0) == '#x201C'` in `_substitute_entity()`,
2. `unichr(int(ent))` (where `ent == 'x201C'`) throws ValueError.

What is the expected output?...

arshaw
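The `ValueError` occurs because `int()` is given a hexadecimal reference without a base. Here is a minimal Python 3 sketch of entity substitution that handles hex, decimal and named references. It is a hypothetical reimplementation, not scrapemark's actual `_substitute_entity()`. The regex and the choice to leave unknown or out-of-range entities unchanged are assumptions.

```python
import re
from html.entities import name2codepoint

# Named (&amp;), decimal (&#8220;) or hexadecimal (&#x201C;) character references.
_ENTITY_RE = re.compile(r"&(#[xX][0-9A-Fa-f]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);")


def substitute_entity(m):
    """re.sub callback: return the character for one entity, or the entity unchanged."""
    ent = m.group(1)
    try:
        if ent[:2] in ("#x", "#X"):
            # Hex reference: parse base 16 instead of passing "x201C" to int().
            return chr(int(ent[2:], 16))
        if ent.startswith("#"):
            return chr(int(ent[1:]))
    except (ValueError, OverflowError):
        # Code point outside the Unicode range: keep the original text.
        return m.group(0)
    cp = name2codepoint.get(ent)
    return chr(cp) if cp is not None else m.group(0)


def unescape(s):
    return _ENTITY_RE.sub(substitute_entity, s)
```

With this, `&#x201C;` and `&#8220;` both become "\u201c" (a left double quotation mark). In Python 2 the same idea applies with `unichr(int(ent[1:], 16))`.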

Problem with <a HREF>

2 comments

Here is a simple use case:

scrape('', '') -> None
scrape('', '') -> None
scrape('', ''.lower()) -> www.google.com

(Thanks for the great library, by the way.)

phzbox
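HTML tag and attribute names are case-insensitive, so `<A HREF=...>` should match a template written as `<a href=...>`. Lowercasing the whole document, as in the workaround above, also lowercases attribute values such as URL paths, which can be case-sensitive. Here is a small sketch using the standard library's `html.parser`, which normalizes names but not values. It illustrates the principle and is not how scrapemark matches templates. `HrefCollector` is a hypothetical class.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect href values from <a> tags, whatever the case of the markup."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lowercase, so
        # <A HREF=...> and <a href=...> arrive identically; values keep their case.
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.hrefs.append(value)


collector = HrefCollector()
collector.feed('<A HREF="http://www.Google.com/Path">Google</A>')
print(collector.hrefs)  # ['http://www.Google.com/Path']
```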

custom filters

2 comments

Reported by toshiba13, Jun 23, 2009. New filter: encoding? It might be useful for international languages.

```
if f == 'utf-8':
    if issubclass(type(s), basestring):
        s = s.encode('utf8')
```

One could use a...

arshaw
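The proposed filter can be generalized from `'utf-8'` to any codec name. Here is a minimal Python 3 sketch. `encoding_filter` is hypothetical, and how it would be registered as a scrapemark filter is not shown and may differ from the library's real filter mechanism.

```python
import codecs
from typing import Any


def encoding_filter(name: str, value: Any) -> Any:
    """Hypothetical filter: encode captured text with the codec called `name`."""
    try:
        codecs.lookup(name)  # validate the filter name even for non-text values
        # Only text is encoded; other captured values pass through unchanged.
        return value.encode(name) if isinstance(value, str) else value
    except LookupError:
        raise ValueError(f"unknown encoding filter: {name!r}") from None
```

Treating the filter name as a codec name covers `'utf-8'` and also encodings such as `'shift_jis'` without adding a separate branch for each. Text that the target codec cannot represent still raises `UnicodeEncodeError`.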

syntax for concatenating captures

1 comment

Reported by project member adamrshaw, Oct 22, 2009. Maybe something like this...

```
{{ var+ }}
{{ var+ }}
```

... would result in 'var' having the inner text of...

arshaw
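One possible reading of the proposed `{{ var+ }}` syntax is that each capture whose name ends in `+` appends to the variable instead of replacing it. Here is a minimal sketch of that merge step, assuming captures arrive as `(name, text)` pairs. The proposal above is truncated, so the separator (empty by default here) and the overwrite behavior for plain captures are assumptions, and `merge_captures` is a hypothetical helper.

```python
from typing import Dict, Iterable, Tuple


def merge_captures(captures: Iterable[Tuple[str, str]], sep: str = "") -> Dict[str, str]:
    """Merge (name, text) pairs; a name ending in '+' appends instead of overwriting."""
    result: Dict[str, str] = {}
    for raw_name, text in captures:
        name = raw_name.strip()
        if name.endswith("+"):
            key = name[:-1].strip()
            # Concatenate onto any earlier capture of the same variable.
            result[key] = result[key] + sep + text if key in result else text
        else:
            # Plain capture: assumed to overwrite, as a simple dict assignment.
            result[name] = text
    return result
```

For example, `merge_captures([(" var+ ", "Hello, "), (" var+ ", "world")])` yields `{"var": "Hello, world"}`.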
