youtube
Playlist scraping does not "slug" properly in the ZIM filenames
Look at what the ZIM filenames of https://farm.openzim.org/recipes/zenius_id_playlists look like: https://farm.openzim.org/pipeline/46001c89460732f50a144a36
They don't respect the norm, and actually I suspect that if the playlist name has a character which is not supported by the filesystem then it will crash.
The recipe sets the Name via playlists-name using a dynamic variable. This is likely not a very good Name and definitely does not follow the convention, but this is a known and tricky problem with that specific YT feature.
I suppose we could slug whatever value we have when setting the filename (even if a filename was passed), in all scrapers (not just this one). That would be safer and mostly harmless.
And yes, we'll probably not encounter many filename issues when creating the file, as we're on ext4 and xfs, but users on Windows, for instance, might have difficulties downloading (I suppose the browser fixes it, but I don't really know).
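The slugging idea above could look roughly like the following. This is only a minimal sketch, not what any scraper currently does: `slugify_filename` and its `default` parameter are hypothetical names, and the choice to keep only ASCII letters, digits, `_` and `-` is an assumption rather than the openZIM naming convention.

```python
import re
import unicodedata


def slugify_filename(name: str, default: str = "untitled") -> str:
    """Hypothetical helper: reduce an arbitrary name to a filesystem-safe slug."""
    # Split off a trailing ".zim" so the extension survives slugging.
    if name.lower().endswith(".zim"):
        stem, ext = name[:-4], ".zim"
    else:
        stem, ext = name, ""
    # Decompose accented characters, then drop anything outside ASCII.
    ascii_stem = (
        unicodedata.normalize("NFKD", stem).encode("ascii", "ignore").decode("ascii")
    )
    # Replace each run of characters other than letters, digits, "_" and "-"
    # with a single underscore, then trim underscores at both ends.
    slug = re.sub(r"[^A-Za-z0-9_-]+", "_", ascii_stem).strip("_").lower()
    # Fall back to a default (assumption) if nothing usable is left.
    return (slug or default) + ext
```

A name such as `Matematika: Kelas 10/11?.zim` would come out as `matematika_kelas_10_11.zim`. Note that this sketch would also replace the braces of a placeholder such as `{period}` in a filename template, so it would have to run after any substitution.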
This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.
@kelson42 now that we've agreed we will stop encouraging the use of dynamic metadata and will even stop using it in our Zimfarm, is there really something left to be done here?
@benoit74 Not sure myself, but I would recommend ensuring that the file(name) can be written to the filesystem in all playlist scenarios.
@kelson42 Could you please clarify what needs to be done here? The links provided above seem to be broken, so it's a bit unclear what needs to be fixed.
If the issue is with the ZIM filename, we are already using validate_zimfile_creatable from python-scraperlib to ensure the ZIM file is creatable with the given name.
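For readers unfamiliar with this kind of check, the general idea of verifying that a file is creatable can be sketched with the standard library alone. This is an illustration under assumptions, not the implementation of validate_zimfile_creatable: `can_create_file` is a hypothetical name, and the real function may behave differently (for example by raising instead of returning a boolean).

```python
import pathlib


def can_create_file(folder, filename: str) -> bool:
    """Hypothetical check: True if `filename` can be created inside `folder`."""
    target = pathlib.Path(folder) / filename
    try:
        # Mode "x" fails if the file already exists, so nothing is overwritten.
        with open(target, "x"):
            pass
    except (OSError, ValueError):
        # OSError covers missing folders, permissions and names the filesystem
        # rejects; ValueError covers names containing a null byte.
        return False
    # Remove the empty probe file again.
    target.unlink()
    return True
```

Running such a probe before scraping starts means an unusable filename is reported immediately rather than after the whole playlist has been downloaded.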
@dan-niles This was a playlist-based recipe and the bug is very old. To me the situation is not that clear, but we don't need this feature anyway and it would be hard to reproduce, if possible at all. I will close it.