
Handle multiple entries in next_link

Open kpedro88 opened this issue 9 months ago • 4 comments

I wanted to use next_selector on a site with the following setup:

story_part_2.html:

<a href="story_part_1.html">Previous</a>
<a href="story_part_3.html">Next</a>

i.e. no obvious attributes to distinguish the "previous" link from the "next" link.

The existing code that processes the next link already includes loop detection, so it can disregard the "previous" link if it shows up as a match. However, only next_link[0] was checked. I reimplemented this as a recursive function that loops over all entries in next_link and stops once it finds one that has not been seen yet. (It could keep going, but that flow did not make sense: multiple new entries in next_link would be ambiguous, since they could be processed depth-first or breadth-first, and either way it seemed contrary to the logic of next_selector.)
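As a rough illustration of the idea (not the actual code from this change), picking the first not-yet-seen entry could look like the sketch below. The function name `first_new_link`, the `visited` set, and the example.com URLs are all assumptions made for the example.

```python
from urllib.parse import urljoin


def first_new_link(base_url, hrefs, visited):
    """Return the first candidate URL that has not been visited, or None.

    Hypothetical helper: `hrefs` stands in for the matches produced by
    next_selector, and `visited` for whatever the loop detection tracks.
    """
    for href in hrefs:
        url = urljoin(base_url, href)  # resolve relative links against the page
        if url not in visited:
            return url
    return None


# Pages 1 and 2 are already fetched; page 2 links to both 1 and 3.
visited = {
    "https://example.com/story_part_1.html",
    "https://example.com/story_part_2.html",
}
candidates = ["story_part_1.html", "story_part_3.html"]
next_url = first_new_link(
    "https://example.com/story_part_2.html", candidates, visited
)
print(next_url)  # https://example.com/story_part_3.html
```

Stopping at the first unseen entry keeps the "one next page" behavior of next_selector and sidesteps the depth-first versus breadth-first question.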

kpedro88 avatar Mar 06 '25 03:03 kpedro88

I sort of worry that the problem domain we're in could legitimately hit Python's recursion limit here. The limit should default to 1,000 deep, and I checked: the full download of A Practical Guide To Evil fetches 699 chapters. The Wandering Inn has 757 currently, so it's almost certainly going to reach that limit someday. So web serials with somewhere in the range of a thousand chapters are definitely a possibility... (let's not even talk about translated Chinese xianxia).
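To make the concern concrete, here is a small sketch with hypothetical stand-in functions (no network access, chapters are just integers): one recursive call per chapter fails once the chain is longer than the interpreter's limit, while the equivalent loop does not.

```python
import sys


def follow_recursive(chapter, last):
    """Count chapters by recursing once per 'next' link (toy stand-in)."""
    if chapter >= last:
        return 1
    return 1 + follow_recursive(chapter + 1, last)


def follow_iterative(chapter, last):
    """Same count, but with a plain loop, so stack depth stays constant."""
    count = 1
    while chapter < last:
        chapter += 1
        count += 1
    return count


limit = sys.getrecursionlimit()  # 1000 unless something changed it
long_serial = limit + 100        # a serial longer than the limit

try:
    follow_recursive(1, long_serial)
    overflowed = False
except RecursionError:
    overflowed = True

print(limit, overflowed, follow_iterative(1, long_serial))
```

Raising the limit with `sys.setrecursionlimit` would also work, but a loop avoids the question entirely.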

For your specific case, could you have made the existing next_selector behavior work with some combination of :last-child or :last-of-type? I know that the markup isn't always conducive to actually getting those to cooperate...

kemayo avatar Mar 06 '25 04:03 kemayo

There are a bunch of other links (social media sharing, etc.) on the page. They can be excluded by class, but I don't think last selectors would work to find the "next" link.

I had rewritten this to use a recursive function simply because it was easier to express the desired logic, but I can convert it back to a loop if that's the primary concern.

kpedro88 avatar Mar 06 '25 23:03 kpedro88

Yeah, if this remains a loop then I think I'd be happy to merge it.

kemayo avatar Mar 07 '25 19:03 kemayo

Done (just using a FIFO approach)
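For readers following along, a FIFO version of the loop might look roughly like this. It is only a sketch: the `crawl` function, the `links_for` callback, and the toy `SITE` mapping are assumptions for illustration, not the code that was pushed.

```python
from collections import deque

# Toy site: each page maps to the links next_selector would match on it.
SITE = {
    "part1": ["part2"],
    "part2": ["part1", "part3"],  # "Previous" comes before "Next"
    "part3": ["part2"],
}


def crawl(start, links_for):
    """Visit pages in order, queueing only the first unseen link per page."""
    visited = set()
    order = []
    queue = deque([start])
    while queue:
        url = queue.popleft()  # FIFO: oldest pending page first
        if url in visited:
            continue
        visited.add(url)
        order.append(url)
        for candidate in links_for(url):
            if candidate not in visited:
                queue.append(candidate)
                break  # stop at the first new entry
    return order


pages = crawl("part1", SITE.__getitem__)
print(pages)  # ['part1', 'part2', 'part3']
```

Because at most one link is queued per page, the queue never holds more than one pending entry here, and the depth of the story no longer matters for the call stack.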

kpedro88 avatar Mar 08 '25 15:03 kpedro88

Sorry, I was busy with other things for a bit.

kemayo avatar Mar 19 '25 01:03 kemayo