selectolax
Using LexborHTMLParser seems to remove some HTML tags
Hello there Mr.Selectolax :)
I have been using selectolax for a long time. I really like it and will keep using it. I have found a small issue: the HTML I get back from the parser is missing some tags:
from selectolax.lexbor import LexborHTMLParser
html_test = """
<tr class="clickable" data-price="1800">
<td>
<img width="80" src="https://media.restocks.net/products/DD1869-103/nike-dunk-high-black-white-w-1-80.png"/>
</td>
<td>
<input class="productid" type="hidden" value="1882303"/>
<input class="baseproductid" type="hidden" value="12107"/>
<input class="sizeid" type="hidden" value="1"/>
<input class="price" type="hidden" value="1800"/>
<span>Nike Dunk High Black White Panda (W)</span>
<br/>
EU: 36
<br/>
ID: 1882303
<br/>
Ship before:
13/05/22
</td>
<td>
<span class="storeprice ">
<span class="storeprice__value">1.800 kr</span>
</span>
</td>
<td>
<div onclick="window.open('https://restocks.net/en/account/sales/send-label/1882303')" class="download__send__label c-badge c-badge--small pull-right" style="background-color: #df9033">download shipping label</div>
</td>
<td>
<i class="fas fa-pencil-alt listing__edit__icon"></i></span>
</td>
</tr>
<tr class="clickable" data-price="4000">
<td>
<img width="80" src="https://media.restocks.net/products/DC0774-114/air-jordan-1-low-marina-blue-w-1-80.png"/>
</td>
<td>
<input class="productid" type="hidden" value="1882293"/>
<input class="baseproductid" type="hidden" value="13815"/>
<input class="sizeid" type="hidden" value="48"/>
<input class="price" type="hidden" value="4000"/>
<span>Air Jordan 1 Low Marina Blue (W)</span>
<br/>
EU: 38 ½
<br/>
ID: 1882293
<br/>
Ship before:
13/05/22
</td>
<td>
<span class="storeprice ">
<span class="storeprice__value">4.000 kr</span>
</span>
</td>
<td>
<div onclick="window.open('https://restocks.net/en/account/sales/send-label/1882293')" class="download__send__label c-badge c-badge--small pull-right" style="background-color: #df9033">download shipping label</div>
</td>
<td>
<i class="fas fa-pencil-alt listing__edit__icon"></i></span>
</td>
</tr>
<tr class="clickable" data-price="4000">
<td>
<img width="80" src="https://media.restocks.net/products/DC0774-114/air-jordan-1-low-marina-blue-w-1-80.png"/>
</td>
<td>
<input class="productid" type="hidden" value="1882294"/>
<input class="baseproductid" type="hidden" value="13815"/>
<input class="sizeid" type="hidden" value="48"/>
<input class="price" type="hidden" value="4000"/>
<span>Air Jordan 1 Low Marina Blue (W)</span>
<br/>
EU: 38 ½
<br/>
ID: 1882294
<br/>
Ship before:
13/05/22
</td>
<td>
<span class="storeprice ">
<span class="storeprice__value">4.000 kr</span>
</span>
</td>
<td>
<div onclick="window.open('https://restocks.net/en/account/sales/send-label/1882294')" class="download__send__label c-badge c-badge--small pull-right" style="background-color: #df9033">download shipping label</div>
</td>
<td>
<i class="fas fa-pencil-alt listing__edit__icon"></i></span>
</td>
</tr>
<tr class="clickable" data-price="4000">
<td>
<img width="80" src="https://media.restocks.net/products/DC0774-114/air-jordan-1-low-marina-blue-w-1-80.png"/>
</td>
<td>
<input class="productid" type="hidden" value="1882295"/>
<input class="baseproductid" type="hidden" value="13815"/>
<input class="sizeid" type="hidden" value="4"/>
<input class="price" type="hidden" value="4000"/>
<span>Air Jordan 1 Low Marina Blue (W)</span>
<br/>
EU: 39
<br/>
ID: 1882295
<br/>
Ship before:
13/05/22
</td>
<td>
<span class="storeprice ">
<span class="storeprice__value">4.000 kr</span>
</span>
</td>
<td>
<div onclick="window.open('https://restocks.net/en/account/sales/send-label/1882295')" class="download__send__label c-badge c-badge--small pull-right" style="background-color: #df9033">download shipping label</div>
</td>
<td>
<i class="fas fa-pencil-alt listing__edit__icon"></i></span>
</td>
</tr>
<tr class="clickable" data-price="4000">
<td>
<img width="80" src="https://media.restocks.net/products/DC0774-114/air-jordan-1-low-marina-blue-w-1-80.png"/>
</td>
<td>
<input class="productid" type="hidden" value="1882296"/>
<input class="baseproductid" type="hidden" value="13815"/>
<input class="sizeid" type="hidden" value="4"/>
<input class="price" type="hidden" value="4000"/>
<span>Air Jordan 1 Low Marina Blue (W)</span>
<br/>
EU: 39
<br/>
ID: 1882296
<br/>
Ship before:
13/05/22
</td>
<td>
<span class="storeprice ">
<span class="storeprice__value">4.000 kr</span>
</span>
</td>
<td>
<div onclick="window.open('https://restocks.net/en/account/sales/send-label/1882296')" class="download__send__label c-badge c-badge--small pull-right" style="background-color: #df9033">download shipping label</div>
</td>
<td>
<i class="fas fa-pencil-alt listing__edit__icon"></i></span>
</td>
</tr>
<tr class="clickable" data-price="1630">
<td>
<img width="80" src="https://media.restocks.net/products/DC0774-114/air-jordan-1-low-marina-blue-w-1-80.png"/>
</td>
<td>
<input class="productid" type="hidden" value="1882297"/>
<input class="baseproductid" type="hidden" value="13815"/>
<input class="sizeid" type="hidden" value="5"/>
<input class="price" type="hidden" value="1630"/>
<span>Air Jordan 1 Low Marina Blue (W)</span>
<br/>
EU: 40
<br/>
ID: 1882297
<br/>
Ship before:
13/05/22
</td>
<td>
<span class="storeprice ">
<span class="storeprice__value">1.630 kr</span>
</span>
</td>
<td>
<div onclick="window.open('https://restocks.net/en/account/sales/send-label/1882297')" class="download__send__label c-badge c-badge--small pull-right" style="background-color: #df9033">download shipping label</div>
</td>
<td>
<i class="fas fa-pencil-alt listing__edit__icon"></i></span>
</td>
</tr>
<tr class="clickable" data-price="4000">
<td>
<img width="80" src="https://media.restocks.net/products/DC0774-114/air-jordan-1-low-marina-blue-w-1-80.png"/>
</td>
<td>
<input class="productid" type="hidden" value="1882288"/>
<input class="baseproductid" type="hidden" value="13815"/>
<input class="sizeid" type="hidden" value="1"/>
<input class="price" type="hidden" value="4000"/>
<span>Air Jordan 1 Low Marina Blue (W)</span>
<br/>
EU: 36
<br/>
ID: 1882288
<br/>
Ship before:
13/05/22
</td>
<td>
<span class="storeprice ">
<span class="storeprice__value">4.000 kr</span>
</span>
</td>
<td>
<div onclick="window.open('https://restocks.net/en/account/sales/send-label/1882288')" class="download__send__label c-badge c-badge--small pull-right" style="background-color: #df9033">download shipping label</div>
</td>
<td>
<i class="fas fa-pencil-alt listing__edit__icon"></i></span>
</td>
</tr>
<tr class="clickable" data-price="4000">
<td>
<img width="80" src="https://media.restocks.net/products/DC0774-114/air-jordan-1-low-marina-blue-w-1-80.png"/>
</td>
<td>
<input class="productid" type="hidden" value="1882289"/>
<input class="baseproductid" type="hidden" value="13815"/>
<input class="sizeid" type="hidden" value="13"/>
<input class="price" type="hidden" value="4000"/>
<span>Air Jordan 1 Low Marina Blue (W)</span>
<br/>
EU: 36 ½
<br/>
ID: 1882289
<br/>
Ship before:
13/05/22
</td>
<td>
<span class="storeprice ">
<span class="storeprice__value">4.000 kr</span>
</span>
</td>
<td>
<div onclick="window.open('https://restocks.net/en/account/sales/send-label/1882289')" class="download__send__label c-badge c-badge--small pull-right" style="background-color: #df9033">download shipping label</div>
</td>
<td>
<i class="fas fa-pencil-alt listing__edit__icon"></i></span>
</td>
</tr>
<tr class="clickable" data-price="4000">
<td>
<img width="80" src="https://media.restocks.net/products/DC0774-114/air-jordan-1-low-marina-blue-w-1-80.png"/>
</td>
<td>
<input class="productid" type="hidden" value="1882290"/>
<input class="baseproductid" type="hidden" value="13815"/>
<input class="sizeid" type="hidden" value="44"/>
<input class="price" type="hidden" value="4000"/>
<span>Air Jordan 1 Low Marina Blue (W)</span>
<br/>
EU: 37 ½
<br/>
ID: 1882290
<br/>
Ship before:
13/05/22
</td>
<td>
<span class="storeprice ">
<span class="storeprice__value">4.000 kr</span>
</span>
</td>
<td>
<div onclick="window.open('https://restocks.net/en/account/sales/send-label/1882290')" class="download__send__label c-badge c-badge--small pull-right" style="background-color: #df9033">download shipping label</div>
</td>
<td>
<i class="fas fa-pencil-alt listing__edit__icon"></i></span>
</td>
</tr>
<tr class="clickable" data-price="4000">
<td>
<img width="80" src="https://media.restocks.net/products/DC0774-114/air-jordan-1-low-marina-blue-w-1-80.png"/>
</td>
<td>
<input class="productid" type="hidden" value="1882291"/>
<input class="baseproductid" type="hidden" value="13815"/>
<input class="sizeid" type="hidden" value="44"/>
<input class="price" type="hidden" value="4000"/>
<span>Air Jordan 1 Low Marina Blue (W)</span>
<br/>
EU: 37 ½
<br/>
ID: 1882291
<br/>
Ship before:
13/05/22
</td>
<td>
<span class="storeprice ">
<span class="storeprice__value">4.000 kr</span>
</span>
</td>
<td>
<div onclick="window.open('https://restocks.net/en/account/sales/send-label/1882291')" class="download__send__label c-badge c-badge--small pull-right" style="background-color: #df9033">download shipping label</div>
</td>
<td>
<i class="fas fa-pencil-alt listing__edit__icon"></i></span>
</td>
</tr>
<tr class="clickable" data-price="4000">
<td>
<img width="80" src="https://media.restocks.net/products/DC0774-114/air-jordan-1-low-marina-blue-w-1-80.png"/>
</td>
<td>
<input class="productid" type="hidden" value="1882292"/>
<input class="baseproductid" type="hidden" value="13815"/>
<input class="sizeid" type="hidden" value="3"/>
<input class="price" type="hidden" value="4000"/>
<span>Air Jordan 1 Low Marina Blue (W)</span>
<br/>
EU: 38
<br/>
ID: 1882292
<br/>
Ship before:
13/05/22
</td>
<td>
<span class="storeprice ">
<span class="storeprice__value">4.000 kr</span>
</span>
</td>
<td>
<div onclick="window.open('https://restocks.net/en/account/sales/send-label/1882292')" class="download__send__label c-badge c-badge--small pull-right" style="background-color: #df9033">download shipping label</div>
</td>
<td>
<i class="fas fa-pencil-alt listing__edit__icon"></i></span>
</td>
</tr>
"""
doc = LexborHTMLParser(html_test)
print(doc.html)
When I run the example, the <tr class="clickable"> elements no longer appear in the output. They have been removed, which shouldn't happen. Could you look into why this happens?
I think that's because there is no <table>.
https://github.com/rushter/selectolax/issues/2#issuecomment-355850317
Oh I see! Any suggestions for how I can scrape the <tr> elements in that case? When I do a GET on the webpage I am working with, the response body is exactly the HTML I showed above and nothing else. Is there any way to make this work, or should I try another parser?
If you know in advance that you will receive HTML like this, just wrap it in a table: <table> content </table>