python-scraperlib

Add utility function to compute ZIM Tags

Open benoit74 opened this issue 1 year ago • 0 comments

All scrapers set ZIM tags from a user-provided string of semicolon-separated values (or at least they should).

Some scrapers also set a few tags automatically, in addition to the user-provided tags.

The combined list of tags should be de-duplicated, and user-provided tags should be stripped of any leading / trailing whitespace.

Having a utility function at the zimscraperlib level to share this logic would avoid reimplementing it in every scraper. This function would take two parameters, default_tags (list of str) and user_tags (str), and return a list of tags ready to be passed to the creator. Returning a set might be better if the creator supports it; this needs to be checked at the validate_tags and libzim levels.
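To make the proposal concrete, here is a minimal sketch of what such a helper could look like. The name `compute_tags` and the `separator` parameter are hypothetical and not part of zimscraperlib, and this is not the warc2zim implementation. It returns a list rather than a set so that tag order stays deterministic, and it also strips the default tags, which goes slightly beyond what is described above.

```python
from collections.abc import Iterable


def compute_tags(
    default_tags: Iterable[str], user_tags: str, separator: str = ";"
) -> list[str]:
    """Merge scraper-provided and user-provided tags (hypothetical helper).

    Tags are stripped of surrounding whitespace, empty values are dropped,
    and duplicates are removed while keeping first-seen order.
    """
    # Default tags come first, then the user-provided ones split on the separator.
    candidates = list(default_tags) + user_tags.split(separator)

    seen: set[str] = set()
    tags: list[str] = []
    for raw_tag in candidates:
        tag = raw_tag.strip()
        # Skip empty entries (e.g. from ";;" or a trailing ";") and duplicates.
        if tag and tag not in seen:
            seen.add(tag)
            tags.append(tag)
    return tags
```

For example, `compute_tags(["_ftindex:yes", "wikipedia"], " wikipedia ; science;; history ")` would give `["_ftindex:yes", "wikipedia", "science", "history"]`. If the creator turns out to accept a set, the result could simply be wrapped in `set(...)`.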

warc2zim is going to have what looks like a promising implementation (after https://github.com/openzim/warc2zim/pull/267 is merged).

benoit74, May 24 '24 06:05