excalibur
Extract tables from webpage
Tables can be extracted from a webpage using pandas.read_html. We can create an interface that is either 1) simple: the user submits the link to a webpage and downloads the extracted tables, or 2) fancy: the user submits the link, sees the detected tables (on an image of the webpage?), un-selects the tables they don't want, and then downloads the remaining tables.
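A rough sketch of the extraction step behind both variants, assuming pandas is installed together with an HTML parser such as lxml. The helper names extract_tables and tables_to_csv, the keep parameter, and the sample HTML are made up for illustration and are not part of excalibur. The sample is parsed from a string so it runs offline; pandas.read_html also accepts a URL directly.

```python
import io

import pandas as pd


def extract_tables(html: str) -> list[pd.DataFrame]:
    """Return every <table> found in the HTML as a DataFrame (hypothetical helper)."""
    try:
        # Wrap the markup in StringIO so pandas treats it as a document, not a path.
        return pd.read_html(io.StringIO(html))
    except ValueError:
        # pandas raises ValueError when the page contains no tables.
        return []


def tables_to_csv(tables: list[pd.DataFrame], keep: list[int]) -> dict[str, str]:
    """Serialize only the tables the user left selected (0-based positions)."""
    return {f"table_{i}.csv": tables[i].to_csv(index=False) for i in keep}


# Made-up page with two tables, standing in for a fetched webpage.
SAMPLE_HTML = """
<html><body>
<table>
  <thead><tr><th>city</th><th>population</th></tr></thead>
  <tbody>
    <tr><td>Pune</td><td>3124458</td></tr>
    <tr><td>Delhi</td><td>11034555</td></tr>
  </tbody>
</table>
<table>
  <thead><tr><th>year</th><th>rate</th></tr></thead>
  <tbody><tr><td>2011</td><td>1.6</td></tr></tbody>
</table>
</body></html>
"""

tables = extract_tables(SAMPLE_HTML)
# "Fancy" variant: the user un-selected the second table, so keep only the first.
files = tables_to_csv(tables, keep=[0])
```

The simple variant would pass every index in keep; the fancy variant would fill keep from whatever the user leaves ticked in the UI.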
Just an idea: one can also do this with the Google Sheets IMPORTHTML function, which extracts tables from a webpage. https://support.google.com/docs/answer/3093339?hl=en
Sample Usage
IMPORTHTML("http://en.wikipedia.org/wiki/Demographics_of_India","table",4)
IMPORTHTML(A2,B2,C2)