ScrapeGen

Use jinja2

Open hughdbrown opened this issue 6 years ago • 3 comments

The code presented mixes argument parsing, code generation, and writing to file. Most of this logic could be replaced by the Python template library jinja2.

hughdbrown avatar Dec 08 '19 22:12 hughdbrown

Isn't Jinja2 for the web? Not sure why you would recommend it.

kadnan avatar Dec 09 '19 05:12 kadnan

Jinja2 is for any text generation, including code generation.

Your README.md is more or less this:

from jinja2 import Template
from collections import namedtuple

template = Template('''
from bs4 import BeautifulSoup
import requests

{% for fn in fns %}
def get_{{ fn.name }}(soup_obj):
    # select() returns a list of matches; take the first one, or None if empty
    {{ fn.name }}_selection = soup_obj.select("{{ fn.css_sel }}")
    _{{ fn.name }} = next(iter({{ fn.name }}_selection), None)
    return _{{ fn.name }} and _{{ fn.name }}.text.strip()

{% endfor %}

def parse(url):
    r = requests.get(url)
    if r.status_code == 200:
        html = r.text.strip()
        soup = BeautifulSoup(html, "html.parser")
{% for fn in fns %}
        {{ fn.name }} = get_{{ fn.name}}(soup)
{% endfor %}
        
if __name__ == '__main__':
    parse("{{ url }}")
''')

Func = namedtuple("Func", ["name", "css_sel"])
result = template.render({
     "url": "https://www.olx.com.pk/item/1-kanal-brand-bew-banglow-available-for-sale-in-wapda-town-iid-1009971253",
     "fns": [
         Func("price", "#container > main > div > div > div.rui-2SwH7.rui-m4D6f.rui-1nZcN.rui-3CPXI.rui-3E1c2.rui-1JF_2 > div.rui-2ns2W._2r-Wm > div > section > span._2xKfz"),
         Func("seller", "#container > main > div > div > div.rui-2SwH7.rui-m4D6f.rui-1nZcN.rui-3CPXI.rui-3E1c2.rui-1JF_2 > div.rui-2ns2W.YpyR- > div > div > div._1oSdP > div > a > div"),
     ]
})
print(result)

If you changed the template text from being inline to being in an external file, the entire program would be:

  • read the YAML file
  • read the template file
  • create a jinja2 Template object from the text in the template file
  • render the Template object using the arguments in the YAML file
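Those four steps could be sketched roughly as follows. This is only an illustration under stated assumptions: it uses PyYAML's `yaml.safe_load`, the function names `render_scraper` and `main` and the file names `scraper.yaml` and `scraper.py.j2` are hypothetical, and the YAML file is assumed to hold the same top-level keys (`url`, `fns`) that the template refers to.

```python
from pathlib import Path

import yaml  # PyYAML (assumed to be installed)
from jinja2 import Template


def render_scraper(config_text: str, template_text: str) -> str:
    """Render scraper source code from YAML config text and Jinja2 template text."""
    # safe_load parses the YAML into plain dicts, lists, and strings.
    config = yaml.safe_load(config_text)
    # The top-level YAML keys become template variables. Jinja2 falls back to
    # item lookup, so {{ fn.name }} also works when fn is a plain dict.
    return Template(template_text).render(**config)


def main(config_path: str = "scraper.yaml",
         template_path: str = "scraper.py.j2") -> None:
    # Both default file names are hypothetical placeholders.
    config_text = Path(config_path).read_text(encoding="utf-8")
    template_text = Path(template_path).read_text(encoding="utf-8")
    print(render_scraper(config_text, template_text))
```

Calling `main()` with the two paths would print the generated scraper. Keeping the rendering in its own function means it can be exercised with inline strings, without touching the filesystem.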

hughdbrown avatar Dec 09 '19 06:12 hughdbrown

Interesting. I have no plan to change this atm. Maybe I'll use your approach in future work. You are also welcome to fork it.

kadnan avatar Dec 09 '19 14:12 kadnan