
Fix Recursive fetch after new reformat option

Open monofox opened this issue 9 months ago • 0 comments

Since feediron/ttrss_plugin-feediron#199 was integrated, which added the option to reformat the subsequent article links that are found, recursive multipage handling has been failing.

This commit fixes the recursive loop.

Fixes feediron/ttrss_plugin-feediron#201

Bugfix/Enhancement

  • [x] Have you added an explanation of what your changes do and why you'd like us to include them?
  • [x] Have you successfully run tests with your changes locally?

Tests executed

Test 1

Configuration:

{
    "type": "xpath",
    "xpath": "div[contains(@class, 'article-content')]",
    "multipage": {
        "xpath": "nav[contains(@class, 'page-numbers')]\/span\/a[last()]",
        "append": true,
        "recursive": true
    },
    "modify": [
        {
            "type": "regex",
            "pattern": "\/<li.*? data-src=\"(.*?)\".*?>\\s*<figure.*?>.*?(?:<figcaption.*?<div class=\"caption\">(.*?)<\\\/div>.*?<\\\/figcaption>)?\\s*<\\\/figure>\\s*<\\\/li>\/s",
            "replace": "<figure><img src=\"$1\"\/><figcaption>$2<\/figcaption><\/figure>"
        }
    ],
    "cleanup": [
        "aside",
        "div[contains(@class, 'sidebar')]"
    ]
}

Testurl: https://arstechnica.com/gadgets/2024/05/all-the-ways-streaming-services-are-aggravating-their-subscribers-this-week/

Purpose: Ensure that recursive multipage handling works when reformat is disabled.

Test 2

Configuration:

{
    "type": "xpath",
    "xpath": "article",
    "tags": {
        "type": "xpath",
        "xpath": "meta[@name='keywords']",
        "split": ",",
        "modify": [
            {
                "type": "replace",
                "search": "\"\/>",
                "replace": ""
            }
        ]
    },
    "cleanup": [
        "amp-analytics",
        "amp-consent",
        "amp-pixel",
        "amp-ad",
        "header",
        "amp-font",
        "a[@class='link-to-top']",
        "div[contains(@class ,'amp-ad-container')]",
        "div[contains(@class ,'social-sticky')]",
        "footer",
        "aside[@id='job-market']",
        "aside[@class='aside__meta']",
        "ul[contains(@class, 'social-tools')]",
        "ol[@class='list-pages']",
        "div[@amp-access='NOT subscriber' and text() = 'Anzeige']"
    ],
    "multipage": {
        "xpath": "ol[@class='list-pages' and not(@id='atoc_line')]\/li\/a[text() != '\u203a']",
        "append": true,
        "reformat": true
    },
    "reformat": [
        {
            "type": "regex",
            "pattern": "\/\\.html$\/",
            "replace": ".amp.html"
        }
    ]
}

Testurl: https://www.golem.de/news/sony-ult-wear-im-vergleichstest-ein-erschwinglicher-kopfhoerer-der-begeistert-2405-184690.html

Purpose: Ensure that reformat works in non-recursive mode (all links are found and reformatted).
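
To illustrate what the `reformat` rule in this configuration does to each discovered link, here is a minimal Python sketch. It assumes the rule behaves like a plain regex substitution applied to the link URL; `reformat_link` is a hypothetical helper written for this illustration and is not the plugin's actual (PHP) code.

```python
import re

# Hypothetical helper mirroring the "reformat" rule from the config above:
# pattern "/\.html$/", replace ".amp.html". This is an illustration only,
# not FeedIron's implementation.
def reformat_link(url: str) -> str:
    # Only a trailing ".html" is rewritten; other URLs pass through unchanged.
    return re.sub(r"\.html$", ".amp.html", url)

url = (
    "https://www.golem.de/news/sony-ult-wear-im-vergleichstest-"
    "ein-erschwinglicher-kopfhoerer-der-begeistert-2405-184690.html"
)
print(reformat_link(url))
```

Under this assumption, a link that does not end in `.html` (for example one carrying a query string) is left untouched.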

Test 3

Configuration:

{
    "type": "xpath",
    "xpath": "article",
    "tags": {
        "type": "xpath",
        "xpath": "meta[@name='keywords']",
        "split": ",",
        "modify": [
            {
                "type": "replace",
                "search": "\"\/>",
                "replace": ""
            }
        ]
    },
    "cleanup": [
        "amp-analytics",
        "amp-consent",
        "amp-pixel",
        "amp-ad",
        "header",
        "amp-font",
        "a[@class='link-to-top']",
        "div[contains(@class ,'amp-ad-container')]",
        "div[contains(@class ,'social-sticky')]",
        "footer",
        "aside[@id='job-market']",
        "aside[@class='aside__meta']",
        "ul[contains(@class, 'social-tools')]",
        "ol[@class='list-pages']",
        "div[@amp-access='NOT subscriber' and text() = 'Anzeige']"
    ],
    "multipage": {
        "xpath": "ol[@class='list-pages' and not(@id='atoc_line')]\/li\/a[text() != '\u203a']",
        "append": true,
        "recursive": true,
        "reformat": true
    },
    "reformat": [
        {
            "type": "regex",
            "pattern": "\/\\.html$\/",
            "replace": ".amp.html"
        }
    ]
}

Testurl: https://www.golem.de/news/sony-ult-wear-im-vergleichstest-ein-erschwinglicher-kopfhoerer-der-begeistert-2405-184690.html

Purpose: Ensure that reformat works in recursive mode (even though all links are found again on the same page).
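
The situation this test covers can be sketched in Python as follows. This is a simplified model under stated assumptions, not the plugin's code: `PAGES` is a made-up in-memory site in which every page lists the same pagination links, and `reformat_link` and `collect_pages` are hypothetical helpers. The sketch shows one common way to keep a recursive traversal from looping when links repeat, namely remembering the URLs already visited after reformatting.

```python
import re

# Made-up site: each (reformatted) page URL maps to the pagination links
# found on it. Every page lists all pages again, as in the test above.
PAGES = {
    "/a.amp.html": ["/a.html", "/a-2.html", "/a-3.html"],
    "/a-2.amp.html": ["/a.html", "/a-2.html", "/a-3.html"],
    "/a-3.amp.html": ["/a.html", "/a-2.html", "/a-3.html"],
}

def reformat_link(url):
    # Assumed behaviour of the "reformat" rule: ".html" -> ".amp.html".
    return re.sub(r"\.html$", ".amp.html", url)

def collect_pages(start, reformat=True):
    """Follow pagination links recursively, visiting each URL only once."""
    seen = []          # visited URLs, in visiting order
    queue = [start]
    while queue:
        url = queue.pop(0)
        if url in seen:
            continue   # already fetched: this is what prevents the loop
        seen.append(url)
        for link in PAGES.get(url, []):
            target = reformat_link(link) if reformat else link
            if target not in seen:
                queue.append(target)
    return seen

print(collect_pages("/a.amp.html"))
```

Comparing the reformatted URL (rather than the raw link) against the visited list is the design choice that matters here: the raw links never match the pages actually fetched, so checking them would not stop the recursion.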

Fixes #201

monofox, May 03 '24 19:05