ttrss_plugin-feediron

Add xpath all items switch to tags filter

Open benfishbus opened this issue 2 years ago • 6 comments

I hope this is not a really stupid question, but I just encountered a site that provides some tags that ttrss is not pulling in. I want to use the tags filter to import them, but it's not clear to me where it belongs in the config. I've only used the xpath and readability filters before.

Is the tags filter even applied per domain/URL, or is it global?

I searched for example recipes and could not find any...

benfishbus, May 06 '22 17:05

Is the tags filter even applied per domain/URL, or is it global?

Tag fetching is a per-URL filter. See the documentation here: https://github.com/feediron/ttrss_plugin-feediron#tags-filter

A full config example looks like this:

{
        "type": "xpath",
        "tidy-source": true,
        "xpath": [
            "div[contains(@class,'entry-content')]"
        ],
        "tags": {
            "type": "xpath",
            "replace-tags":true,
            "xpath": [
                "p[@class='topics']\/\/text()"
            ],
            "split":","
        }
}

Tag filtering behaves very similarly to xpath filtering, but it attempts to strip out any remaining HTML to get plain text. For best results, point the xpath directly at the text nodes if possible, as in the example above.
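As a rough illustration of the idea (not FeedIron's actual code), the sketch below uses Python's standard library to select a node, flatten it to plain text, and split on a delimiter. The sample markup and the `extract_tags` helper are hypothetical; only the `p[@class='topics']` selector and the `","` split value come from the config above.

```python
import xml.etree.ElementTree as ET

# Hypothetical article markup; the class name mirrors the config example.
HTML = """
<div class="entry-content">
  <p class="topics">Censorship, <b>BBC</b>, Nuclear war</p>
</div>
""".strip()

def extract_tags(markup, split=","):
    """Select the topics node, drop inner markup, split into tags."""
    root = ET.fromstring(markup)
    node = root.find(".//p[@class='topics']")
    if node is None:
        return []
    # itertext() yields all descendant text, so the <b> wrapper disappears.
    text = "".join(node.itertext())
    # Trim whitespace around each piece and skip empty pieces.
    return [part.strip() for part in text.split(split) if part.strip()]

tags = extract_tags(HTML)
print(tags)
```

The closer the selector gets to the raw text, the less leftover markup there is to clean up before the split.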

dugite-code, May 09 '22 09:05

I'm still learning xpath. How would I select all of the tags in this html:

<div class="grid-ten grid-prepend-two large-grid-nine grid-last content-topics topic-list">
      <i class="icon-tag"></i>
      <ul>
          <li class="topic-list-item">
            <a href="/topics/censorship-208">Censorship</a>
          </li>
          <li class="topic-list-item">
            <a href="/topics/bbc-3648">BBC</a>
          </li>
          <li class="topic-list-item">
            <a href="/topics/broadcasting-4655">Broadcasting</a>
          </li>
          <li class="topic-list-item">
            <a href="/topics/nuclear-10888">Nuclear</a>
          </li>
          <li class="topic-list-item">
            <a href="/topics/harold-wilson-16241">Harold Wilson</a>
          </li>
          <li class="topic-list-item">
            <a href="/topics/nuclear-war-17464">Nuclear war</a>
          </li>
      </ul>
    </div>

No matter what I try, all I get is the first one.

benfishbus, May 27 '22 02:05

"div[contains(@class, 'topic-list')]\/\/ul" gives me one giant tag that I can't split.
"li[@class='topic-list-item']" gives me only the first tag.
"\/\/a[@href[contains(.,'\/topics\/')]]" also gives me only the first tag.

I don't want to resort to regex unless I have to.

benfishbus, May 27 '22 02:05

@benfishbus The tags filter hooks into the basic xpath filter, so it's stuck with the same limitations.

Editing this line https://github.com/feediron/ttrss_plugin-feediron/blob/master/filters/fi_mod_tags_xpath/init.php#L19 to be:

$newtag = ( new fi_mod_all_xpath() )->perform_filter( $html, $config, $settings );

This switches it over to the XPath Filter - All-items (https://github.com/feediron/ttrss_plugin-feediron/tree/master/filters/fi_mod_all_xpath), which I've classed as experimental for quite a while now but which should be OK to use. You should then be able to define the join_element (https://github.com/feediron/ttrss_plugin-feediron/tree/master/filters/fi_mod_xpath#join_element---join_elementstr) and split the tags on that same join_element.
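To make the difference concrete, here is a small sketch in Python's standard library. It is not the plugin's PHP; it only mimics the behaviour described in this thread. The markup is a trimmed copy of the HTML posted above, `JOIN_ELEMENT` stands in for the `join_element` option, and the assumption that the basic filter keeps only the first matched node is inferred from the symptoms reported here.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the topic list from the question.
HTML = """<div class="topic-list"><ul>
<li class="topic-list-item"><a href="/topics/censorship-208">Censorship</a></li>
<li class="topic-list-item"><a href="/topics/bbc-3648">BBC</a></li>
<li class="topic-list-item"><a href="/topics/nuclear-war-17464">Nuclear war</a></li>
</ul></div>"""

def node_text(node):
    """Flatten a node to its trimmed plain text."""
    return "".join(node.itertext()).strip()

root = ET.fromstring(HTML)
nodes = root.findall(".//li[@class='topic-list-item']")

# Keeping only the first match reproduces the "only the first tag" symptom.
first_only = node_text(nodes[0])

# Keeping every match and joining on a delimiter yields one string that a
# later split on the same delimiter turns back into separate tags.
JOIN_ELEMENT = ","  # assumed to play the role of "join_element"
joined = JOIN_ELEMENT.join(node_text(n) for n in nodes)
tags = joined.split(JOIN_ELEMENT)

print(first_only)
print(joined)
print(tags)
```

The expression itself matches every list item; what changes between the two modes is how many of the matched nodes are kept.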

dugite-code, May 30 '22 02:05

I might make this a switch for the tags filter. It would be a good enhancement.

dugite-code, May 30 '22 02:05

Interesting. I had stumbled across this experimental filter in my testing, and it didn't work for me. I assumed it could be invoked within the "tags" filter as "type": "all_xpath".

After making that edit to Line 19, this now works:

    "tags": {
        "type": "xpath",
        "replace-tags": true,
        "xpath": [
            {
                "xpath": "*[@class='topic-list-item']",
                "index": "all"
            }
        ],
        "join_element": ",",
        "split": ","
    }
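One thing to watch when joining and splitting on the same string: a tag that itself contains that string gets cut in two. The Python sketch below shows the general effect with a hypothetical topic list and a hypothetical `join_then_split` helper. It assumes that `join_element` and `split` both accept an arbitrary string, which this thread does not confirm beyond the `","` case.

```python
def join_then_split(items, delimiter):
    """Join items into one string, then split it again on the same delimiter."""
    return delimiter.join(items).split(delimiter)

# Hypothetical topics; the last one contains a comma.
topics = ["Censorship", "BBC", "Wilson, Harold"]

comma_result = join_then_split(topics, ",")
pipe_result = join_then_split(topics, "|")

print(comma_result)  # the last topic is broken into two pieces
print(pipe_result)   # the topics survive the round trip unchanged
```

For the site in this thread none of the topics contain a comma, so `","` is fine. A less common delimiter is only worth considering if the tag text can include commas.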

Thank you! 😎

benfishbus, May 30 '22 03:05