ScrapeGen
Use Jinja2
The code presented mixes argument parsing, code generation, and writing to a file. Most of this logic could be replaced by the Python templating library Jinja2.
Isn't Jinja2 for the web? I'm not sure why you would recommend it.
Jinja2 is for any text generation, including code generation.
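A minimal sketch of Jinja2 producing Python source instead of HTML. The `get_answer` function and its value are made up for illustration; the point is only that the rendered text is ordinary code that Python can execute.

```python
from jinja2 import Template

# A template whose output is Python source, not HTML.
source = Template(
    "def get_{{ name }}():\n"
    "    return {{ value }}\n"
).render(name="answer", value=42)

namespace = {}
exec(source, namespace)  # the rendered text runs like any other Python code
print(namespace["get_answer"]())  # prints 42
```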
Your README.md is more or less this:
```python
from collections import namedtuple

from jinja2 import Template

# The template body is the Python source of the generated scraper.
template = Template('''
from bs4 import BeautifulSoup
import requests
{% for fn in fns %}

def get_{{ fn.name }}(soup_obj):
    {{ fn.name }}_selection = soup_obj.select("{{ fn.css_sel }}")
    _{{ fn.name }} = next(iter({{ fn.name }}_selection), None)
    return _{{ fn.name }} and _{{ fn.name }}.text.strip()
{% endfor %}

def parse(url):
    r = requests.get(url)
    if r.status_code == 200:
        html = r.text.strip()
        soup = BeautifulSoup(html, "html.parser")
{% for fn in fns %}
        {{ fn.name }} = get_{{ fn.name }}(soup)
{% endfor %}

if __name__ == '__main__':
    parse("{{ url }}")
''')

# Each generated getter needs a name and a CSS selector.
Func = namedtuple("Func", ["name", "css_sel"])

result = template.render({
    "url": "https://www.olx.com.pk/item/1-kanal-brand-bew-banglow-available-for-sale-in-wapda-town-iid-1009971253",
    "fns": [
        Func("price", "#container > main > div > div > div.rui-2SwH7.rui-m4D6f.rui-1nZcN.rui-3CPXI.rui-3E1c2.rui-1JF_2 > div.rui-2ns2W._2r-Wm > div > section > span._2xKfz"),
        Func("seller", "#container > main > div > div > div.rui-2SwH7.rui-m4D6f.rui-1nZcN.rui-3CPXI.rui-3E1c2.rui-1JF_2 > div.rui-2ns2W.YpyR- > div > div > div._1oSdP > div > a > div"),
    ],
})
print(result)
```
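Because generated Python is indentation-sensitive, it can help to render with `trim_blocks=True` (which drops the newline after each `{% ... %}` tag) and to check the output with the standard library's `ast.parse` before writing it to disk. A minimal sketch; the `names` variable and the `return` line are illustrative assumptions, not part of ScrapeGen.

```python
import ast

from jinja2 import Template

# trim_blocks removes the newline that follows each block tag,
# so the loop does not leave blank lines inside the function body.
template = Template(
    "def parse(soup):\n"
    "{% for name in names %}\n"
    "    {{ name }} = get_{{ name }}(soup)\n"
    "{% endfor %}\n"
    "    return {{ names | join(', ') }}\n",
    trim_blocks=True,
)

source = template.render(names=["price", "seller"])
ast.parse(source)  # raises SyntaxError if the generated code is malformed
print(source)
```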
If you moved the template text out of the script and into an external file, the entire program would be:
- read the YAML file
- read the template file
- create a jinja2 Template object from the text in the template file
- render the Template object using the arguments in the YAML file
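The four steps above can be sketched as follows. The YAML layout (a top-level `url` key and an `fns` list of mappings with `name` and `css_sel` keys) and the `generate` function are assumptions for illustration, not ScrapeGen's actual format; `yaml.safe_load` comes from the PyYAML package.

```python
from pathlib import Path

import yaml  # PyYAML
from jinja2 import Template


def generate(config_path: str, template_path: str) -> str:
    # 1. Read the YAML file (assumed layout: "url" plus an "fns" list).
    config = yaml.safe_load(Path(config_path).read_text())
    # 2. Read the template file.
    template_text = Path(template_path).read_text()
    # 3. Create a jinja2 Template object from that text.
    template = Template(template_text)
    # 4. Render it with the values loaded from the YAML file.
    return template.render(config)
```

Usage would be something like `print(generate("scraper.yaml", "scraper.py.j2"))`, with both file names hypothetical. Because Jinja2 resolves `fn.name` by trying an attribute first and then a dictionary key, the plain dicts PyYAML produces work with the same template as the `Func` namedtuples.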
Interesting. I have no plans to change this at the moment, but I may use your approach in future work. You are also welcome to fork it.