artoo

How to ignore <div> inside <td>

Status: Open. jonathanyee opened this issue 8 years ago • 3 comments

I'm using the scrapeTable function and it's working fine; however, the site I'm scraping has an extra div inside a cell. How can I ignore this div: `<div class="nonTablet nonDesktop"><b>12/27/2015</b></div>`?

```html
<table class="ledger accountDetail">
    <thead class="nonMobile">
        <tr>
            <th style="width: 14%;">Date</th>
            <th style="width: 44%;">Description</th>
            <th style="width: 17%;" class="right">Debits&nbsp;$ / Credits&nbsp;$</th>
            <th style="width: 25%;" class="right">Current Balance&nbsp;$</th>
        </tr>
    </thead>
    <tbody>
        <tr class="bkgd2">
            <td class="ledgerAccountDetailDesc" style="white-space: nowrap;">12/27/2015</td>
            <td class="ledgerAccountDetailDesc">
                <div class="nonTablet nonDesktop"><b>12/27/2015</b></div>
                Interest Payment
            </td>
```

jonathanyee, Apr 25 '17 07:04

I'm not sure I understand what you mean, @jonathanyee. If you just want to retrieve the contained text, why not use the `.text` method?

Yomguithereal, Apr 25 '17 07:04

When I use the scrapeTable function, it parses the date twice: once from the first `<td>` and once from the `<div>` inside the next `<td>`. So I'm wondering how I could ignore the `<div>`, since I don't need the date twice.
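One likely reason for the double date, offered as an assumption about how the cell text is extracted: text-based extraction such as jQuery's `.text()` (or the DOM's `textContent`) returns the text of an element *and all of its descendants*, so the nested `<div>` date ends up in the Description value. The plain-JavaScript sketch below contrasts that with reading only the cell's direct text nodes. The helper names `allText` and `ownText` and the hand-built `descCell` stand-in are hypothetical; in a browser you would pass a real `<td>` element to `ownText` instead.

```javascript
// Node.TEXT_NODE is 3 in the DOM standard; hard-coded so this also runs outside a browser.
const TEXT_NODE = 3;

// Rough equivalent of textContent / jQuery's .text(): concatenates the text of
// the node and all its descendants, so the nested <div> date is included.
function allText(node) {
  if (node.nodeType === TEXT_NODE) return node.textContent;
  return Array.from(node.childNodes).map(allText).join('');
}

// Keeps only the element's direct text-node children, skipping child elements
// such as <div class="nonTablet nonDesktop">.
function ownText(el) {
  return Array.from(el.childNodes)
    .filter((n) => n.nodeType === TEXT_NODE)
    .map((n) => n.textContent)
    .join('')
    .trim();
}

// Hypothetical stand-in for the second <td> from the issue (elements use nodeType 1):
// whitespace, then <div><b>12/27/2015</b></div>, then the description text.
const text = (s) => ({ nodeType: TEXT_NODE, textContent: s });
const descCell = {
  nodeType: 1,
  childNodes: [
    text('\n    '),
    { nodeType: 1, childNodes: [{ nodeType: 1, childNodes: [text('12/27/2015')] }] },
    text('\n    Interest Payment\n'),
  ],
};

console.log(allText(descCell).trim().replace(/\s+/g, ' ')); // 12/27/2015 Interest Payment
console.log(ownText(descCell)); // Interest Payment
```

On a page where jQuery is available (as it is with artoo), the same idea is commonly written as `$(td).contents().filter(function () { return this.nodeType === 3; }).text()`; treat that as an alternative to verify on your page rather than a tested recipe.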

jonathanyee, Apr 25 '17 18:04

I guess at that point you can either use scrapeTable and post-process the resulting list to drop the unnecessary fields, or use scrape instead, which does the job with less sugar but gives you more control over the output.
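A minimal sketch of the first option (post-processing), assuming the scrapeTable output is an array of objects keyed by the header text, with the Description value holding both the nested date and the description. The `rows` sample and the `dropDuplicateDate` helper are hypothetical; adjust the keys (or use array indices) to match what scrapeTable actually returns for the options you pass it.

```javascript
// Hypothetical scrapeTable output for the table in the issue (assumed shape:
// objects keyed by header text; real keys and whitespace may differ).
const rows = [
  { Date: '12/27/2015', Description: '12/27/2015\n    Interest Payment' },
];

// Drops the date that the nested <div> adds to the front of the Description
// value, then collapses the leftover whitespace to single spaces.
function dropDuplicateDate(row) {
  const desc = row.Description.replace(/\s+/g, ' ').trim();
  const cleaned = desc.startsWith(row.Date)
    ? desc.slice(row.Date.length).trim()
    : desc;
  // Return a new object so the original scraped rows stay untouched.
  return { ...row, Description: cleaned };
}

const cleanedRows = rows.map(dropDuplicateDate);
console.log(cleanedRows[0].Description); // Interest Payment
```

The `startsWith` check leaves a row's description as-is (apart from whitespace cleanup) when it does not begin with that row's date, so rows without the nested `<div>` pass through unchanged.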

Yomguithereal, Apr 26 '17 09:04