Fix: Submissions with control character in title cannot be downloaded

Open thomas694 opened this issue 2 years ago • 0 comments

Running bdfr archive --subreddit Unicode --sort new Z:/Reddit fails on a Windows system.

If a submission contains an invalid character in a field used for the filename, the file cannot be saved and an exception occurs:

[2023-02-12 17:10:13,830 - root - ERROR] - Archiver exited unexpectedly
Traceback (most recent call last):
  File "Z:\bulk-downloader-for-reddit\bdfr\__main__.py", line 143, in cli_archive
    reddit_archiver.download()
  File "Z:\bulk-downloader-for-reddit\bdfr\archiver.py", line 60, in download
    self.write_entry(submission)
  File "Z:\bulk-downloader-for-reddit\bdfr\archiver.py", line 120, in write_entry
    self._write_entry_json(entry, content, hash)
  File "Z:\bulk-downloader-for-reddit\bdfr\archiver.py", line 132, in _write_entry_json
    self._write_content_to_disk(resource, content, hash)
  File "Z:\bulk-downloader-for-reddit\bdfr\archiver.py", line 167, in _write_content_to_disk
    with Path(file_path).open(mode="w", encoding="utf-8") as file:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\user\AppData\Local\Programs\Python\Python\Lib\pathlib.py", line 1044, in open
    return io.open(self, mode, buffering, encoding, errors, newline)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: [Errno 22] Invalid argument: "Z:\\Reddit\\Unicode\\Expert-Fun-2444_Unicode that doesn't exist \x03_vm3p1h.json"

The submission's title contains a control character (\x03 in the path above). It needs to be either replaced or, following the style already used in the code, removed.

The provided fix does the latter.
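
As a rough illustration of the "remove" approach, here is a minimal sketch, not the actual BDFR change. The helper name strip_control_characters and the exact character range are assumptions chosen for this example; Windows rejects characters 0x01 to 0x1F in filenames, and the sketch also drops NUL and DEL.

```python
import re

# Hypothetical helper for illustration only; not taken from the BDFR code base.
# Matches ASCII control characters: 0x00-0x1F plus DEL (0x7F).
_CONTROL_CHARS = re.compile(r"[\x00-\x1f\x7f]")


def strip_control_characters(name: str) -> str:
    """Return `name` with ASCII control characters removed."""
    return _CONTROL_CHARS.sub("", name)


# Title from the failing submission in the traceback above.
title = "Unicode that doesn't exist \x03"
print(repr(strip_control_characters(title)))
```

Applying such a filter to the title before the filename is assembled would keep \x03 out of the path passed to Path.open(). Other characters that are invalid on Windows (such as < > : " | ? *) are outside the scope of this sketch.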

thomas694, Feb 12 '23 16:02