playwright-web-scraping
Skip content that doesn't exist
In your section about scraping with Python, is it possible to continue on in the loop if something doesn't exist? For example, if one of the books didn't list a price (price_el), how could you handle that?
from playwright.async_api import async_playwright
import asyncio

async def main():
    async with async_playwright() as pw:
        browser = await pw.chromium.launch()
        page = await browser.new_page()
        await page.goto('https://books.toscrape.com')

        all_items = await page.query_selector_all('.product_pod')
        books = []
        for item in all_items:
            book = {}
            name_el = await item.query_selector('h3')
            book['name'] = await name_el.inner_text()
            price_el = await item.query_selector('.price_color')
            book['price'] = await price_el.inner_text()
            stock_el = await item.query_selector('.availability')
            book['stock'] = await stock_el.inner_text()
            books.append(book)
        print(books)
        await browser.close()

if __name__ == '__main__':
    asyncio.run(main())
Hi! query_selector() returns None when nothing matches the selector, so you can check whether price_el is truthy and call inner_text() only if it is. That way the price key is simply left out of the book when no price element is found.
if price_el:
    book['price'] = await price_el.inner_text()
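If you want the same guard on every field, one option is to wrap the lookup in a small helper. This is only a sketch: text_or_default and scrape_item are hypothetical names that are not part of Playwright or the original tutorial, and the helper assumes item is an element handle like the ones returned by query_selector_all above.

```python
import asyncio


async def text_or_default(item, selector, default=None):
    """Return the inner text of the first element under `item` that matches
    `selector`, or `default` if nothing matches.

    Hypothetical helper (not a Playwright API). It relies on query_selector()
    returning None when there is no match.
    """
    el = await item.query_selector(selector)
    if el is None:
        return default
    return await el.inner_text()


async def scrape_item(item):
    """Build one book dict; a missing field becomes None instead of raising."""
    return {
        'name': await text_or_default(item, 'h3'),
        'price': await text_or_default(item, '.price_color'),
        'stock': await text_or_default(item, '.availability'),
    }
```

Inside the loop you would then write books.append(await scrape_item(item)). If you would rather skip a book with no price entirely, check price_el right after the query_selector call and use continue when it is None.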