
Add SitemapReader originally developed in OERSI

Open fsteeg opened this issue 3 years ago • 2 comments

Reads a sitemap from a URL and sends each loc URL to the receiver.

For example, use "https://hoou.de/sitemap.xml" | read-sitemap | open-http ... in a Flux workflow to process every document linked in the sitemap.

Supports paging via the from= query string parameter in the sitemap URL.
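
To make the idea concrete, here is a minimal Java sketch of the two steps described above: collecting the loc URLs from a sitemap document, and deriving the URL of the next page. This is not the code from this pull request. The class and method names (SitemapSketch, extractLocs, nextPage) are hypothetical, and the paging rule (advance from= by the number of entries per page) is an assumption made for illustration only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration, not the SitemapReader from this pull request.
final class SitemapSketch {

    private static final Pattern FROM_PARAM = Pattern.compile("([?&]from=)(\\d+)");

    private SitemapSketch() {
    }

    // Returns the trimmed text of every <loc> element, in document order.
    public static List<String> extractLocs(final String sitemapXml) {
        try {
            final DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // Sitemaps need no DOCTYPE, so reject it to avoid external entity lookups.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            final Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(sitemapXml)));
            // "*" matches <loc> regardless of the namespace it is declared in.
            final NodeList locs = doc.getElementsByTagNameNS("*", "loc");
            final List<String> urls = new ArrayList<>();
            for (int i = 0; i < locs.getLength(); i++) {
                urls.add(locs.item(i).getTextContent().trim());
            }
            return urls;
        } catch (final Exception e) {
            throw new IllegalArgumentException("Could not parse sitemap", e);
        }
    }

    // ASSUMPTION: the next page is reached by adding pageSize to the from= value.
    // Returns null if the URL has no from= parameter, i.e. no paging is requested.
    public static String nextPage(final String sitemapUrl, final int pageSize) {
        final Matcher matcher = FROM_PARAM.matcher(sitemapUrl);
        if (!matcher.find()) {
            return null;
        }
        final int next = Integer.parseInt(matcher.group(2)) + pageSize;
        return sitemapUrl.substring(0, matcher.start())
                + matcher.group(1) + next
                + sitemapUrl.substring(matcher.end());
    }
}
```

In a real module, each URL returned by extractLocs would be passed on to the receiver, and nextPage would presumably be called until a page yields no further entries; how the actual implementation decides when to stop is not stated here.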

Assigning @dr0i for code review due to the (albeit loose) paging relation to #464.

We don't have a dedicated issue for this, maybe @TobiasNx could do functional review here?

fsteeg, Sep 22 '22 15:09

Discussed in our planning meeting: we're putting this on hold to investigate if we actually need this kind of specific module for reading sitemaps, or if we can build something based on existing modules and the upcoming paging support (https://github.com/metafacture/metafacture-core/issues/464).

fsteeg, Sep 23 '22 12:09

Kudos, SonarCloud Quality Gate passed!

Bugs: 0 (rating A)
Vulnerabilities: 0 (rating A)
Security Hotspots: 0 (rating A)
Code Smells: 1 (rating A)

Coverage: no information
Duplication: 0.0%

sonarqubecloud[bot], Mar 03 '23 16:03