parsel
Create root node memory 210
Aims to fix #210.
The features of https://github.com/scrapy/parsel/pull/213 are implemented as an extension to the default parsel.Selector class, following @Gallaecio's suggestion.
If the `text` input parameter of `Selector` is a string, the selector is expected to work without changes, as it does now:
from parsel import Selector
s1 = Selector(text='<ul><li id="1">1</li><li id="2">2</li></ul>')
print(s1.css('li::text').getall())
# output -> ['1', '2']
If `text` is bytes (the current version raises `TypeError`), the parser is expected to interpret the bytes input according to the `encoding` parameter added in this PR:
s2 = Selector(text=b'<ul><li id="1">1</li><li id="2">2</li></ul>', encoding='ascii')
print(s2.css('li::text').getall())
# output -> ['1', '2']
s3 = Selector(text=b'<ul><li id="1">1\xD0\xA4</li><li id="2">2</li></ul>', encoding='utf8')  # Cyrillic Ф symbol added
print(s3.css('li::text').getall())
# output -> ['1Ф', '2']
If `text` is bytes and `encoding` is not specified, the input is interpreted as UTF-8:
s4 = Selector(text=b'<ul><li id="1">1\xD0\xA4</li><li id="2">2</li></ul>')
print(s4.css('li::text').getall())
# output -> ['1Ф', '2']
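The three input cases above can be summarised as a small dispatch on the type of `text`. The following is only a sketch of that behaviour, not the code from this PR: `prepare_body` is a hypothetical helper name, and the assumption is that the parser is ultimately given a bytes body plus the encoding to read it with.

```python
from typing import Tuple, Union


def prepare_body(text: Union[str, bytes], encoding: str = "utf8") -> Tuple[bytes, str]:
    """Hypothetical helper: return (body bytes, encoding to hand to the parser)."""
    if isinstance(text, str):
        # str input: encode to UTF-8; the encoding argument is ignored here.
        return text.encode("utf8"), "utf8"
    if isinstance(text, bytes):
        # bytes input: pass the same object through, no intermediate str is built.
        return text, encoding
    raise TypeError(f"text must be str or bytes, got {type(text).__name__}")


html_str = '<ul><li id="1">1</li><li id="2">2</li></ul>'
html_bytes = b'<ul><li id="1">1\xD0\xA4</li><li id="2">2</li></ul>'

print(prepare_body(html_str)[1])                      # encoding used for str input
print(prepare_body(html_bytes)[1])                    # default encoding for bytes input
print(prepare_body(html_bytes, encoding="ascii")[1])  # explicit encoding for bytes input
```

In this sketch the bytes branch returns the original object unchanged, which is the point of the change: the body is never decoded to `str` just to be encoded again.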
Code sample (Scrapy) using the updated `Selector` class:
import scrapy
from scrapy.crawler import CrawlerProcess
from parsel.selector import Selector


class QuotesToScrapeSpider(scrapy.Spider):
    name = "quotes"
    custom_settings = {
        "DOWNLOAD_DELAY": 1,
        "DOWNLOADER_MIDDLEWARES": {
            'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware': None,
        },
    }

    def start_requests(self):
        yield scrapy.Request(url='https://quotes.toscrape.com', callback=self.parse)

    def parse(self, response):
        print(f"memory allocation (body) as str made: {str(bool(response._cached_ubody))}")  # < expected False
        sel = Selector(response.body, encoding=response.encoding)  # expected encoding: UTF-8
        links = sel.css("a::attr(href)").getall()
        print(links)
        print(f"memory allocation (body) as str made: {str(bool(response._cached_ubody))}")


process = CrawlerProcess()
process.crawl(QuotesToScrapeSpider)
process.start()
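To illustrate why skipping the `str` copy of the body matters, here is a rough standalone sketch that compares the size of a bytes body with the size of its decoded `str`. It assumes CPython, where `sys.getsizeof` reports the object size and a string containing non-Latin-1 characters is stored with at least two bytes per character. The sample body is made up for the illustration and is not taken from the spider above.

```python
import sys

# Made-up body: 1000 list items, each containing the Cyrillic letter Ф (UTF-8: D0 A4).
body = b'<li id="1">1\xD0\xA4</li>' * 1000

# Decoding allocates a second, separate object alongside the original bytes.
text = body.decode("utf8")

print("bytes:", len(body), "items,", sys.getsizeof(body), "bytes in memory")
print("str:  ", len(text), "chars,", sys.getsizeof(text), "bytes in memory")
```

On CPython the decoded string is expected to be the larger of the two objects here, and it exists in addition to the bytes body, so parsing directly from `response.body` avoids that extra allocation.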
Trying to trigger tests…
@Gallaecio I created new test cases for checking selectors with bytes input.
Codecov Report
Merging #217 (c5597a7) into master (f5f73d3) will not change coverage. The diff coverage is 100.00%.
@@            Coverage Diff            @@
##            master     #217    +/-   ##
=========================================
  Coverage   100.00%  100.00%
=========================================
  Files            5        5
  Lines          290      293     +3
  Branches        59       60     +1
=========================================
+ Hits           290      293     +3
Impacted Files | Coverage Δ | |
---|---|---|
parsel/selector.py | 100.00% <100.00%> (ø) |
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Maybe this needs conflict resolution before the tests can restart?