Playlist scraping does not "slug" properly in the ZIM filenames

Open kelson42 opened this issue 2 years ago • 4 comments

Look at what the ZIM filenames of https://farm.openzim.org/recipes/zenius_id_playlists look like: https://farm.openzim.org/pipeline/46001c89460732f50a144a36

They don't respect the norm, and actually I suspect that if the playlist name has a character which is not supported by the filesystem then it will crash.

kelson42 avatar Dec 22 '22 13:12 kelson42

The recipe is setting the Name via playlists-name using a dynamic variable. This is likely not a very good Name and it definitely does not follow the convention, but this is a known and tricky problem with that specific YT feature.

I suppose we could slug whatever value we have when setting the filename (even if a filename was passed), in all scrapers (not just this one). That would be safer and mostly harmless.

And yes, we'll probably not encounter many filename issues when creating the files as we're on ext4 and xfs, but users on Windows, for instance, might have difficulties downloading (I suppose the browser fixes it but I don't really know).
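The slugging idea above could look roughly like the sketch below. It is only an illustration using the Python standard library: `slugify_filename` and its `fallback` parameter are hypothetical names, not functions from this scraper or from python-scraperlib, and the exact set of allowed characters is an assumption.

```python
import re
import unicodedata

# Characters that Windows rejects in filenames, plus control characters.
_FORBIDDEN = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def slugify_filename(name: str, fallback: str = "untitled") -> str:
    """Return a conservative, filesystem-friendly version of `name`."""
    # Decompose accented characters and drop whatever is not plain ASCII.
    ascii_name = (
        unicodedata.normalize("NFKD", name)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Replace forbidden characters with spaces, then lowercase.
    cleaned = _FORBIDDEN.sub(" ", ascii_name).lower()
    # Collapse every run of characters outside [a-z0-9._-] into one dash,
    # and trim leading/trailing dashes and dots.
    slug = re.sub(r"[^a-z0-9._-]+", "-", cleaned).strip("-.")
    # Never return an empty filename.
    return slug or fallback
```

Applying this to a filename that was passed explicitly, as suggested, would leave already-clean names such as `my-playlist_2022-12.zim` unchanged apart from lowercasing, which is why it should be mostly harmless.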

rgaudin avatar Dec 22 '22 13:12 rgaudin

This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.

stale[bot] avatar May 26 '23 16:05 stale[bot]

@kelson42 now that we've agreed we will stop encouraging the use of dynamic metadata and will even stop using it in our Zimfarm, is there really something left to be done here?

benoit74 avatar Jun 15 '24 12:06 benoit74

@benoit74 Not sure myself, but I would recommend making sure that the file(name) can be written to the filesystem in all playlist scenarios.

kelson42 avatar Jun 15 '24 12:06 kelson42

Look at how the ZIM filenames of https://farm.openzim.org/recipes/zenius_id_playlists look like: https://farm.openzim.org/pipeline/46001c89460732f50a144a36

@kelson42 Could you please clarify what needs to be done here? The links provided above seem to be broken, so it's a bit unclear what needs to be fixed.

If the issue is with the ZIM filename, we are already using validate_zimfile_creatable from python-scraperlib to ensure the ZIM file is creatable with the given name.
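For readers unfamiliar with this kind of check, here is a minimal sketch of what "the file is creatable with the given name" can mean in practice. It is not the python-scraperlib implementation of validate_zimfile_creatable; `can_create_file` is a hypothetical helper written with the standard library only, and treating an already existing file as "not creatable" is an assumption of this sketch.

```python
from pathlib import Path


def can_create_file(folder: Path, filename: str) -> bool:
    """Try to create, then remove, `filename` inside `folder`."""
    # Reject names containing path separators, which would escape `folder`.
    if Path(filename).name != filename:
        return False
    target = folder / filename
    try:
        # Mode "x" fails if the file already exists, so nothing is overwritten.
        with open(target, "x"):
            pass
    except (OSError, ValueError):
        # OSError: name rejected by the filesystem, missing folder, no permission.
        # ValueError: e.g. an embedded NUL byte in the name.
        return False
    # Clean up the probe file so the check leaves no trace.
    target.unlink()
    return True
```

A check like this only proves the name works on the filesystem where the scraper runs (ext4 or xfs here); it says nothing about whether the same name is valid on Windows.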

dan-niles avatar Aug 20 '24 14:08 dan-niles

@dan-niles This was a playlist-based recipe and the bug is very old. To me the situation is not that clear, but we don't need this feature anyway and this would be hard to reproduce, if possible at all. I will close it.

kelson42 avatar Aug 20 '24 15:08 kelson42