rightmove_webscraper.py

Additional columns for data. I can see only 8

Open alandinbedia opened this issue 4 years ago • 4 comments

Hi, first of all, many thanks for this, I'm loving the tool. I would find it very useful to have access to sold price and year sold columns for properties for sale.

Is there any way I can do this? Also, the address doesn't include the house number or the full postcode. Is there a way around this?

Many thanks

alandinbedia Apr 23 '20 11:04

@alandinbedia @toby-p

I believe that is easily achievable by changing some of the regex and data parameters in the __init__.py file: https://github.com/toby-p/rightmove_webscraper.py/blob/master/rightmove_webscraper/__init__.py

Regex (from line 244)

    # Extract postcodes to a separate column:
    pat = r"\b([A-Za-z][A-Za-z]?[0-9][0-9]?[A-Za-z]?)\b"
    results["postcode"] = results["address"].astype(str).str.extract(pat, expand=True)
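To illustrate what a change to that regex might look like, here is a minimal sketch using only the standard `re` module. `FULL_PAT` and `extract_postcode` are hypothetical names that are not part of the library, the addresses are made up, and the full-postcode pattern is a simplified assumption rather than a complete UK postcode validator. It tries the full postcode first and falls back to the outward code that the quoted pattern already captures.

```python
import re

# Pattern quoted above: matches only the outward code (e.g. "SW1A").
OUTWARD_PAT = r"\b([A-Za-z][A-Za-z]?[0-9][0-9]?[A-Za-z]?)\b"

# Hypothetical extension: outward code, optional whitespace, then an inward
# code of one digit followed by two letters (e.g. "SW1A 1AA").
FULL_PAT = r"\b([A-Za-z][A-Za-z]?[0-9][0-9]?[A-Za-z]?\s?[0-9][A-Za-z]{2})\b"


def extract_postcode(address):
    """Return the full postcode if present, else the outward code, else None."""
    for pat in (FULL_PAT, OUTWARD_PAT):
        match = re.search(pat, address)
        if match:
            return match.group(1).upper()
    return None


# Made-up addresses for illustration only.
addresses = [
    "Example Street, London, SW1A 1AA",
    "Example Road, Manchester, M1",
    "No postcode here",
]
postcodes = [extract_postcode(a) for a in addresses]
print(postcodes)
```

If the full postcode is never present in the scraped address text, this falls back to the same value the existing pattern produces. The same `FULL_PAT` string could in principle be passed to `str.extract` in place of `pat`, but that is untested against real listings.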

Additional data from xpath

(from line 175)

    # Create data lists from xpaths:
    price_pcm = tree.xpath(xp_prices)
    titles = tree.xpath(xp_titles)
    addresses = tree.xpath(xp_addresses)
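The pattern in that excerpt (one xpath per field, one list per xpath, lists combined into columns) can be sketched as follows. This is an illustration under loud assumptions: the markup, class names and the extra `added` field are all hypothetical and are not Rightmove's real page structure, and the standard library's `xml.etree.ElementTree` (which supports only a subset of XPath) is swapped in for the `tree.xpath(...)` calls so the example runs without third-party packages. Whether fields such as sold price are present on the results page at all is not established in this thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup. NOT the real page structure.
PAGE = """
<div>
  <div class="card">
    <span class="price">1200 pcm</span>
    <h2 class="title">2 bedroom flat</h2>
    <address class="address">Example Street, London, SW1A</address>
    <span class="added">Added on 01/01/2020</span>
  </div>
  <div class="card">
    <span class="price">950 pcm</span>
    <h2 class="title">1 bedroom flat</h2>
    <address class="address">Example Road, Manchester, M1</address>
    <span class="added">Added on 02/01/2020</span>
  </div>
</div>
"""

tree = ET.fromstring(PAGE)

# One xpath per output column; adding a column means adding an entry here.
xpaths = {
    "price": ".//span[@class='price']",
    "type": ".//h2[@class='title']",
    "address": ".//address[@class='address']",
    "added": ".//span[@class='added']",  # the extra (hypothetical) column
}

# Create one data list per xpath.
columns = {name: [el.text for el in tree.findall(xp)] for name, xp in xpaths.items()}

# The lists are combined by position, so they must all be the same length.
if len({len(values) for values in columns.values()}) != 1:
    raise ValueError("xpaths returned different numbers of results")

# One dict per listing, keyed by column name.
rows = [dict(zip(columns, values)) for values in zip(*columns.values())]
print(rows[0])
```

The length check matters because the lists are matched up by position: if a field is missing from some listings, a new xpath would return fewer items and the values would silently shift onto the wrong rows.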

osmya Apr 24 '20 08:04

Great, thanks. I will give it a try.

alandinbedia Apr 24 '20 13:04

I originally set the regex to only get the first part of the postcode since this was all I could find available in the listings - if the full postcode is available on some listings then it would definitely be better to collect this as well if possible.

toby-p Apr 30 '20 15:04

@alandinbedia @toby-p I have opened a PR with an update for the full postcode.

p2327 May 14 '20 22:05