
Extract tables from webpage

Open vinayak-mehta opened this issue 6 years ago • 1 comment

Tables can be extracted from a webpage using pandas.read_html. We can create an interface that is either 1) simple: the user submits the link to the webpage and downloads the extracted tables, or 2) fancy: the user submits the link, sees the detected tables (on an image of the webpage?), deselects the tables they don't want, and then downloads the remaining extracted tables.
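A rough sketch of the extraction step behind both options, not a proposed implementation. It assumes pandas plus an HTML parser such as lxml is installed. The sample HTML, the helper names (extract_tables, tables_to_csv) and the skip parameter are made up for illustration; a real interface would fetch the page from the submitted link (pandas.read_html can also be given a URL).

```python
from io import StringIO

import pandas as pd

# Made-up sample page standing in for the webpage a user would submit.
SAMPLE_HTML = """
<html><body>
<table>
  <tr><th>item</th><th>count</th></tr>
  <tr><td>apple</td><td>3</td></tr>
  <tr><td>pear</td><td>5</td></tr>
</table>
<table>
  <tr><th>code</th><th>label</th></tr>
  <tr><td>A</td><td>alpha</td></tr>
</table>
</body></html>
"""


def extract_tables(html):
    """Return every <table> found in the HTML as a list of DataFrames."""
    # Wrapping the markup in StringIO avoids passing a literal HTML string,
    # which newer pandas versions deprecate.
    return pd.read_html(StringIO(html))


def tables_to_csv(tables, skip=()):
    """Serialize tables to CSV text, leaving out the deselected indices.

    Hypothetical helper: `skip` models the "un-select" step of the fancy option.
    """
    return {
        i: df.to_csv(index=False)
        for i, df in enumerate(tables)
        if i not in skip
    }


tables = extract_tables(SAMPLE_HTML)             # simple option: take everything
csv_by_index = tables_to_csv(tables, skip={1})   # fancy option: drop table 1
```

For the simple option, tables_to_csv(tables) with no skip would return one CSV per detected table; the fancy option only differs in which indices the user deselects before download.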

vinayak-mehta avatar Nov 07 '18 13:11 vinayak-mehta

Just an idea: one can also do this with the Google Sheets IMPORTHTML function, which extracts a table from a webpage. https://support.google.com/docs/answer/3093339?hl=en

Sample usage:

IMPORTHTML("http://en.wikipedia.org/wiki/Demographics_of_India","table",4)

IMPORTHTML(A2,B2,C2)
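For comparison, a minimal sketch of how the same "pick table number N" behaviour might look with pandas.read_html. The function name import_html_table and the sample HTML are hypothetical, and pandas with an HTML parser such as lxml is assumed. The one detail carried over from IMPORTHTML is that its index starts at 1, while the list returned by pandas is 0-based.

```python
from io import StringIO

import pandas as pd

# Made-up sample page with two tables.
SAMPLE_HTML = """
<table>
  <tr><th>item</th><th>count</th></tr>
  <tr><td>apple</td><td>3</td></tr>
</table>
<table>
  <tr><th>code</th><th>label</th></tr>
  <tr><td>A</td><td>alpha</td></tr>
</table>
"""


def import_html_table(html, index):
    """Return the index-th table (1-based, like IMPORTHTML) as a DataFrame."""
    tables = pd.read_html(StringIO(html))
    if not 1 <= index <= len(tables):
        raise IndexError(f"index must be between 1 and {len(tables)}")
    # Convert the 1-based IMPORTHTML-style index to a 0-based list position.
    return tables[index - 1]


second = import_html_table(SAMPLE_HTML, 2)
```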

majestique avatar Jan 12 '19 03:01 majestique