inparse
inparse copied to clipboard
Open Collaborative AI Driven Parser builder for Web Scraping, Data Extraction and Crawling,Knowledge Graph
In.parse
Open Collaborative AI Driven Parser builder for Web Scraping, Data Extraction and Crawling,Knowledge Graph
1.Build Parser with:
http://inparse.com
2.Call Parser in your favaritor way :
There are three ways to execute parser in detail page . etc: http://inparse.com/parser/628128d0
a).Call Parser Directly:
http://inparse.com/api/parser/?parser_no=628128d0&access_token=037ce4079d21b47ac8bbc730a6da0ba8&url=https%3A%2F%2Fmedium.com%2Faccel-india-insights%2Fmaintaining-productivity-as-engineering-teams-scale-1a821f5add28
b).Get Parser rule:
http://inparse.com/api/parser/?parser_no=628128d0&access_token=037ce4079d21b47ac8bbc730a6da0ba8
c). Call Parser locally with SDK:
pip install -U inparse
from inparse import Inparse
p=Inparse('b45beddc', #parser no is generator by inparse.com parser builder.
'd50cb533f69b6a78892afbd093f95fc1') #access token can be found in your user page .
d=p.parse_url('https://qz.com/india/1413291/trulymadly-ceo-on-how-dating-apps-like-bumble-india-must-localise/')
Inparse.pretty_print(d)
Or parse in raw html
from inparse import Inparse
import requests
p=Inparse('b45beddc', #parser no is generator by inparse.com parser builder.
'd50cb533f69b6a78892afbd093f95fc1') #access token can be found in your user page .
html=requests.get('https://qz.com/india/1413291/trulymadly-ceo-on-how-dating-apps-like-bumble-india-must-localise/').text
d=p.parse(html)
Inparse.pretty_print(d)
Below is output of Article data extraction
{ 'article_body': '<div><p>Last week, American dating app <a '
'href="https://qz.com/india/1413051/priyanka-chopra-invests-in-dating-app-bumble-to-rival-tinder/">Bumble '
'It’s from smaller cities. And varied people are coming '
'from different backgrounds. So that’s really '
'encouraging.</p></div>',
'author': 'Kuwar Singh',
'publish_date': None,
'title': 'Young Indians are using dating apps for so much more than just '
'dating',
'top_image': [ 'https://cms.qz.com/wp-content/uploads/2018/10/AP_900509923043-e1538971405267.jpg?quality=75&strip=all&w=410&h=231']
}