playwright-web-scraping Skip content that doesn't exist

Feb 25 '23 18:02 jb-cloud

In your section about scraping with Python, is it possible to continue in the loop if something doesn't exist? For example, if one of the books didn't list a price (price_el), how could you handle that?

from playwright.async_api import async_playwright
import asyncio


async def main():
    async with async_playwright() as pw:
        browser = await pw.chromium.launch()
        page = await browser.new_page()
        await page.goto('https://books.toscrape.com')
        all_items = await page.query_selector_all('.product_pod')
        books = []
        for item in all_items:
            book = {}
            name_el = await item.query_selector('h3')
            book['name'] = await name_el.inner_text()
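            price_el = await item.query_selector('.price_color')
            # query_selector returns None when nothing matches, so this line
            # raises AttributeError for a book without a price element.
            book['price'] = await price_el.inner_text()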
            stock_el = await item.query_selector('.availability')
            book['stock'] = await stock_el.inner_text()
            books.append(book)
        print(books)
        await browser.close()


if __name__ == '__main__':
    asyncio.run(main())


Hi! You could check whether price_el is truthy (query_selector returns None when no element matches) and only call inner_text() if it is. That way the price key is simply not added when no price element is found:

            if price_el:
                book['price'] = await price_el.inner_text()
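A variation, if you'd rather every book dict have the same keys (handy when writing the results to CSV later): store None for missing fields instead of leaving the key out. Below is a minimal sketch; `text_or_none` and `parse_book` are hypothetical helper names, and the code assumes only that `query_selector` returns None when nothing matches and that a found element has an async `inner_text()` method, so it doesn't import Playwright itself.

```python
from typing import Any, Optional


async def text_or_none(parent: Any, selector: str) -> Optional[str]:
    """Return the inner text of the first element matching `selector`
    under `parent`, or None if there is no match.

    Hypothetical helper: `parent` is assumed to behave like a Playwright
    ElementHandle (its query_selector() returns an element or None).
    """
    el = await parent.query_selector(selector)
    if el is None:
        # No matching element: return None instead of raising AttributeError.
        return None
    return await el.inner_text()


async def parse_book(item: Any) -> dict:
    """Build one book dict; fields missing from the page are stored as None."""
    return {
        "name": await text_or_none(item, "h3"),
        "price": await text_or_none(item, ".price_color"),
        "stock": await text_or_none(item, ".availability"),
    }
```

Inside the original loop, the body then shrinks to `books.append(await parse_book(item))`. Whether to omit the key or store None is a matter of taste: omitting keeps the dicts minimal, while None keeps every row the same shape.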

Oct 06 '23 12:10 oxyagne