python-for-data-and-media-communication-gitbook

Problem of data visualization (Dianping.com)

Open CelineTsang opened this issue 7 years ago • 2 comments

Describe your environment

  • Operating system:
  • Python version:
  • Hardware:
  • Internet access:
  • Jupyter notebook or not? [Y/N]: y
  • Which chapter of book?:

I scraped restaurant information for Hong Kong from dianping.com (name, address, phone number, tags, features, average price). The problem is that almost every value is unique: the prices are specific numbers rather than ranges, and the tags are free text, e.g. 标签:粤菜 中餐 茶餐厅 米其林三星 (Tags: Cantonese cuisine, Chinese food, cha chaan teng, Michelin three-star). This makes the data difficult to visualize, because there is too much text to display the complete information. Do you have any ideas or solutions to offer?

[Screenshots: 2018-11-19 4 48 08, 2018-11-19 4 48 33, 2018-11-19 4 48 42]

The original CSV looks like this:

[Screenshot: 2018-11-21 6 54 26]

CelineTsang, Nov 21 '18 10:11

Please share a link to your notebook, preferably an NBViewer link, since there are dynamic charts.

hupili, Nov 21 '18 13:11

For your dataset, I suggest the following:

  • Transform the price column from str to int. Once that is done, a 1D analysis of this column alone can already yield multiple story points, e.g. the distribution, min, max, ...
  • For the tags column, you may want to convert it to a numeric encoding. One way is to test each cell for a keyword and put 1 if the keyword is present, or 0 otherwise. The process is very similar to the first image in your question body. Another reference is the "top name in tweets" example.
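
The two suggestions above could be sketched roughly as follows. This is only a minimal illustration using pandas: the sample rows, the column names (`name`, `price`, `tags`), the keyword list, and the assumption that tags are separated by spaces are all made up for the example and would need to be adapted to the real CSV.

```python
import pandas as pd

# Hypothetical sample rows standing in for the scraped CSV;
# the column names and values are assumptions, not the real data.
df = pd.DataFrame({
    "name": ["A", "B", "C"],
    "price": ["¥120", "85", None],
    "tags": ["粤菜 中餐 茶餐厅", "中餐 米其林三星", "茶餐厅"],
})

# 1. Price: pull the first run of digits out of each string and convert it
#    to a number. Missing or unparseable prices become NaN.
df["price_num"] = pd.to_numeric(
    df["price"].str.extract(r"(\d+)", expand=False), errors="coerce"
)

# 1D analysis of the numeric column: count, mean, min, max, quartiles.
print(df["price_num"].describe())

# 2. Tags: one 0/1 column per keyword, 1 if the keyword appears in the cell.
#    The keyword list is an assumption taken from the example tags.
keywords = ["粤菜", "中餐", "茶餐厅", "米其林三星"]
for kw in keywords:
    df["tag_" + kw] = df["tags"].str.contains(kw, regex=False, na=False).astype(int)

# Number of restaurants per tag, ready for a bar chart.
tag_counts = df[["tag_" + kw for kw in keywords]].sum().sort_values(ascending=False)
print(tag_counts)
```

Note that `str.contains` does substring matching, so a short keyword would also match inside a longer tag. If the tags are cleanly separated by a single character, `df["tags"].str.get_dummies(sep=" ")` is an alternative that builds the 0/1 columns from the whole tags instead.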

hupili, Nov 21 '18 13:11