rightmove_webscraper.py
Additional data columns: I can only see 8
Hi, first of all, many thanks for this; I'm loving the tool. It would be very useful to have columns for the sold price and the year sold for properties listed for sale.
Is there any way I can do this? Also, the address doesn't include the house number or the full postcode. Is there a way around this?
Many thanks
@alandinbedia @toby-p
I believe that is easily achievable by changing some of the regex and data parameters in the init file: https://github.com/toby-p/rightmove_webscraper.py/blob/master/rightmove_webscraper/init.py
Regex (line 244):
```python
# Extract postcodes to a separate column:
pat = r"\b([A-Za-z][A-Za-z]?[0-9][0-9]?[A-Za-z]?)\b"
results["postcode"] = results["address"].astype(str).str.extract(pat, expand=True)
```
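A minimal sketch of how the regex could be widened to also capture the second half of the postcode when a listing includes it. The pattern below is a simplified UK postcode shape (outward code such as `SW1A`, optional inward code such as `1AA`), not the full official validation rules, and the `extract_postcode` helper is hypothetical, not part of the scraper:

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: outward code (e.g. "SW1A", "LS1") optionally followed
# by the inward code (e.g. "1AA"). Simplified shape, not full UK validation.
FULL_POSTCODE = re.compile(
    r"\b([A-Z]{1,2}[0-9][A-Z0-9]?)(?:\s*([0-9][A-Z]{2}))?\b",
    re.IGNORECASE,
)


def extract_postcode(address: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (outward, inward); inward is None when the listing omits it."""
    match = FULL_POSTCODE.search(address)
    if match is None:
        return None, None
    outward, inward = match.groups()
    return outward.upper(), (inward.upper() if inward else None)


print(extract_postcode("Smith Road, London SW1A 1AA"))
print(extract_postcode("High Street, Leeds LS1"))
```

Because the pattern has two capture groups, passing `FULL_POSTCODE.pattern` to `results["address"].astype(str).str.extract(..., flags=re.IGNORECASE)` would return two columns instead of one. Flat or unit numbers shaped like `B2` can still produce false positives, so the results are worth spot-checking.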
Additional data from XPath (line 175):
```python
# Create data lists from xpaths:
price_pcm = tree.xpath(xp_prices)
titles = tree.xpath(xp_titles)
addresses = tree.xpath(xp_addresses)
```
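A minimal sketch of one way to add an extra column, assuming each listing sits in its own container element. The markup, class names, and the `sold_price` field below are all hypothetical (they do not reproduce Rightmove's real page), and the sketch uses the standard library's `xml.etree.ElementTree` with its limited XPath support rather than the lxml `tree.xpath` calls used in the scraper:

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

# Hypothetical, simplified markup. The second card lacks the extra field.
PAGE = """
<div id="results">
  <div class="card">
    <span class="price">£250,000</span>
    <address>1 High Street, Leeds LS1 1AA</address>
    <span class="sold-price">£240,000</span>
  </div>
  <div class="card">
    <span class="price">£300,000</span>
    <address>2 Low Street, York YO1 7HH</address>
  </div>
</div>
"""

# Column name -> path relative to one listing card (hypothetical selectors).
FIELDS: Dict[str, str] = {
    "price": "span[@class='price']",
    "address": "address",
    "sold_price": "span[@class='sold-price']",  # hypothetical extra column
}


def text_or_none(card: ET.Element, path: str) -> Optional[str]:
    """Stripped text of the first match under this card, or None if missing."""
    el = card.find(path)
    if el is None or el.text is None:
        return None
    return el.text.strip()


def parse_listing_cards(html: str) -> List[Dict[str, Optional[str]]]:
    """One dict per card, so a missing field stays in its own row."""
    root = ET.fromstring(html.strip())
    return [
        {name: text_or_none(card, path) for name, path in FIELDS.items()}
        for card in root.iterfind(".//div[@class='card']")
    ]


for row in parse_listing_cards(PAGE):
    print(row)
```

The design choice here is to extract fields per listing rather than as independent page-wide lists. If separate lists such as `price_pcm`, `titles`, and `addresses` are combined by position and a field is missing from some listings, the columns shift out of alignment; looking each field up relative to its own card yields `None` instead. With lxml, the same idea would loop over the card elements and call `card.xpath(...)` on each one.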
Great, thanks! I will give it a try.
I originally set the regex to capture only the first part of the postcode, since that was all I could find in the listings. If the full postcode is available on some listings, it would definitely be better to collect it as well.
@alandinbedia @toby-p Update: PR for the full postcode.