Problem of scraping number data from Dianping.com

Open zacharyzeng opened this issue 7 years ago • 2 comments

Troubleshooting

Describe your environment

  • Operating system: macOS 10.13.6
  • Python version: 3.6
  • Hardware: MacBook Pro 13-inch, 2017
  • Jupyter notebook or not? [Y/N]: Y

Describe your question

I cannot extract the numbers (for example, the review count) from the tags on the Dianping website.

The minimum code (snippet) to reproduce the issue

from bs4 import BeautifulSoup
from selenium import webdriver

url = 'http://www.dianping.com/chengdu/ch10/g34060r1577'
browser = webdriver.Chrome()
browser.get(url)
# Take the rendered HTML from the browser and hand it to BeautifulSoup
h = browser.find_element_by_css_selector('html')
t = h.get_attribute('innerHTML')
mypage = BeautifulSoup(t, 'html.parser')
dianping_list = []
h = mypage.find('div', attrs={'class': 'content'})
i = h.find_all('div', attrs={'class': 'txt'})
# Review count link of the first shop in the list
remark = i[0].find('div', attrs={'class': 'comment'}).find('a', attrs={'class': 'review-num'})

remark.b



link: https://github.com/zacharyzeng/Bug_Centre/blob/master/dianping.ipynb

zacharyzeng, Nov 20 '18 13:11

The solution is here:

https://github.com/hupili/python-for-data-and-media-communication/blob/master/scraper-selenium/dianping%20comment%20number.ipynb

The sample data is here:

https://github.com/hupili/python-for-data-and-media-communication/blob/master/scraper-selenium/dianping.csv

This case goes well beyond our curriculum, but it is good that you brought it up. The demo code may not work directly on your side. You need to study my logic and revise the decoder table and decode function accordingly. The method of analysis is more important than the result.
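
As a rough illustration of what a decoder table and a decode function could look like, here is a minimal sketch. It is not the code from the linked notebook. It assumes the site replaces some digits with empty tags whose CSS class decides which digit is displayed, while other digits stay as plain text. The class names (`xx-a1`, `xx-b2`, `xx-c3`), the sample HTML, and the names `DECODER_TABLE` and `decode_number` are all hypothetical; the real mapping has to be worked out from the live page.

```python
from html.parser import HTMLParser

# Hypothetical decoder table: obfuscated CSS class name -> digit it stands for.
# These class names are made up for illustration; the real ones must be read
# off the live page and may change over time.
DECODER_TABLE = {
    "xx-a1": "0",
    "xx-b2": "3",
    "xx-c3": "7",
}


class _DigitCollector(HTMLParser):
    """Collect plain-text digits and decoded placeholder tags in document order."""

    def __init__(self, table):
        super().__init__()
        self.table = table
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # A placeholder tag carries its digit in the class attribute.
        classes = (dict(attrs).get("class") or "").split()
        for name in classes:
            if name in self.table:
                self.parts.append(self.table[name])

    def handle_data(self, data):
        # Digits that were left as ordinary text.
        self.parts.append(data.strip())


def decode_number(fragment, table=DECODER_TABLE):
    """Return the number encoded in an HTML fragment as a string of digits."""
    collector = _DigitCollector(table)
    collector.feed(fragment)
    collector.close()
    return "".join(collector.parts)


# Made-up sample fragment in the assumed format.
sample = '<b>1<span class="xx-b2"></span><span class="xx-a1"></span>7</b>'
print(decode_number(sample))
```

With BeautifulSoup, the HTML of a tag such as `remark.b` can be passed in as `str(remark.b)`. Classes missing from the table are silently skipped, so an incomplete table shows up as a number with too few digits.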


P.S. This issue is an excellent example of how to ask questions efficiently.

hupili, Nov 20 '18 15:11

Thanks Pili, I will try to learn and decode it.

zacharyzeng, Nov 20 '18 15:11