html2text: how to add a new line after each element
Hi,
Hope you can help me with this. Please forgive me, I am not an expert on html2text. I think this should be easy to do, but I cannot find out how.
I have this url: http://adriansantos.me/test.html and this job:
name: "AS"
url: "http://adriansantos.me/test.html"
max_tries: 2
ssl_no_verify: true
filter:
  - xpath: //*[@class= 'single-opportunity' and span[contains(text(), 'United Kingdom') or contains(text(), 'UNITED KINGDOM')]]
  - html2text:
      method: pyhtml2text
      unicode_snob: true
      body_width: 0
      inline_links: true
      ignore_links: false
      ignore_images: true
      single_line_break: true
  - sort:
---
Which produces the following output:
[ IT Technical Product Owner United Kingdom ](https://career.camlingroup.com/careers/opportunities/tpo-111901-it-technical-product-owner) [ IT Technical 2 United Kingdom ](https://career.camlingroup.com/careers/opportunities/tpo-111901-it-technical-product-owner) [ IT Technical 3 UNITED KINGDOM ](https://career.camlingroup.com/careers/opportunities/tpo-111901-it-technical-product-owner)
How can I make html2text add a new line after each "element"? In other words, how can I get this output:
[ IT Technical Product Owner United Kingdom ](https://career.camlingroup.com/careers/opportunities/tpo-111901-it-technical-product-owner)
[ IT Technical 2 United Kingdom ](https://career.camlingroup.com/careers/opportunities/tpo-111901-it-technical-product-owner)
[ IT Technical 3 UNITED KINGDOM ](https://career.camlingroup.com/careers/opportunities/tpo-111901-it-technical-product-owner)
Thanks for your help
Did you try removing single_line_break?
Yes, that doesn't work either
Nothing wrong with html2text: your XPath is passing a series of <a> elements that don't have any separation between them:
<a class="single-opportunity" href="https://career.camlingroup.com/careers/opportunities/tpo-111901-it-technical-product-owner">
IT Technical Product Owner <span class="">United Kingdom</span>
</a>
<a class="single-opportunity" href="https://career.camlingroup.com/careers/opportunities/tpo-111901-it-technical-product-owner">
IT Technical 2 <span class="">United Kingdom</span>
</a>
<a class="single-opportunity" href="https://career.camlingroup.com/careers/opportunities/tpo-111901-it-technical-product-owner">
IT Technical 3 <span class="">UNITED KINGDOM</span>
</a>
If you want line breaks for this specific HTML, your XPath needs to capture the outer container as well, in this case a <li>:
filter:
  - xpath: //*[*[@class= 'single-opportunity' and span[contains(text(), 'United Kingdom') or contains(text(), 'UNITED KINGDOM')]]]
This has the desired effect (which, unlike your example above, is sorted correctly):
* [ IT Technical 2 United Kingdom ](https://career.camlingroup.com/careers/opportunities/tpo-111901-it-technical-product-owner)
* [ IT Technical 3 UNITED KINGDOM ](https://career.camlingroup.com/careers/opportunities/tpo-111901-it-technical-product-owner)
* [ IT Technical Product Owner United Kingdom ](https://career.camlingroup.com/careers/opportunities/tpo-111901-it-technical-product-owner)
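To see why selecting the container helps, here is a minimal Python sketch using only the standard library. It is not how the xpath filter itself evaluates expressions: ElementTree supports only a small XPath subset, so the country test is done in plain Python. The <ul>/<li> markup, the example.com links and the helper names `uk_anchors` and `uk_containers` are assumed stand-ins for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed stand-in for the page: each matching <a> sits inside its own <li>.
PAGE = (
    "<ul>"
    '<li><a class="single-opportunity" href="https://example.com/1">'
    "IT Technical Product Owner <span>United Kingdom</span></a></li>"
    '<li><a class="single-opportunity" href="https://example.com/2">'
    "IT Technical 2 <span>Germany</span></a></li>"
    '<li><a class="single-opportunity" href="https://example.com/3">'
    "IT Technical 3 <span>UNITED KINGDOM</span></a></li>"
    "</ul>"
)

COUNTRIES = ("United Kingdom", "UNITED KINGDOM")


def uk_anchors(root):
    """Return the <a class="single-opportunity"> elements whose <span> names the UK."""
    return [
        a
        for a in root.iterfind(".//a[@class='single-opportunity']")
        if any(c in (s.text or "") for s in a.findall("span") for c in COUNTRIES)
    ]


def uk_containers(root):
    """Return the parent element of each matching anchor (here, the <li>)."""
    parent_of = {child: parent for parent in root.iter() for child in parent}
    return [parent_of[a] for a in uk_anchors(root)]


root = ET.fromstring(PAGE)
anchors_html = "".join(ET.tostring(a, encoding="unicode") for a in uk_anchors(root))
containers_html = "".join(ET.tostring(li, encoding="unicode") for li in uk_containers(root))

print(anchors_html)     # anchors run together with nothing between </a> and <a
print(containers_html)  # each anchor is wrapped in its own <li>...</li>
```

The first string is a run of inline <a> elements, which a text converter has no reason to split across lines; the second keeps the <li> wrappers, which are rendered as separate list items.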
Alternatively, you can insert re.sub filters that modify the HTML to add a <br> after each closing </a> tag (and after each self-closing <a /> in XHTML):
filter:
  - xpath: //*[@class= 'single-opportunity' and span[contains(text(), 'United Kingdom') or contains(text(), 'UNITED KINGDOM')]]
  - re.sub:
      pattern: </a>
      repl: </a><br>
  - re.sub:
      pattern: <a />
      repl: <a /><br />
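As a rough illustration of what the first substitution does to the HTML before it reaches html2text, here is a small Python sketch. It assumes the filter behaves like Python's own `re.sub`; the shortened example.com links and the helper name `add_breaks` are made up for the example.

```python
import re

# Two back-to-back anchors, as the original XPath returns them
# (hrefs shortened to placeholder URLs).
html = (
    '<a class="single-opportunity" href="https://example.com/1">'
    'IT Technical Product Owner <span class="">United Kingdom</span></a>'
    '<a class="single-opportunity" href="https://example.com/2">'
    'IT Technical 2 <span class="">United Kingdom</span></a>'
)


def add_breaks(fragment: str) -> str:
    """Append <br> after every closing </a> tag in the fragment."""
    return re.sub(r"</a>", "</a><br>", fragment)


result = add_breaks(html)
print(result)
```

Each anchor is now followed by an explicit line break, so the converted text should place every link on its own line.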