Files with colons in the name do not save to disk after downloading

Open dsdude123 opened this issue 1 year ago • 2 comments

While trying to download this collection for The Computer Chronicles, I found that a number of files were not being saved to disk. I tried redownloading and saw in the console that the download proceeded normally, but once it completed the resulting file was 0 bytes long and had no extension.

I then downloaded these files manually in my browser and noticed that the affected files all had a colon in their names. This is probably a Windows-specific issue, as that character isn't allowed in file names, and my browser replaced it with an underscore when saving the file.

Collection: https://archive.org/details/Computer_Chronicles

OS: Windows Server 2019 Datacenter

Python: 3.7.16

dsdude123, May 29 '23 04:05

> This is probably a Windows-specific issue, as that character isn't allowed in file names, and my browser replaced it with an underscore when saving the file.

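That is correct. The colon is not allowed in file or directory names on Windows, because NTFS uses it to separate a file name from the name of a so-called "Alternate Data Stream". Everything after the colon is therefore treated as a stream name, which is why the visible file ends up with 0 bytes and no extension.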
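The underscore substitution the browser performed can be sketched in a few lines of Python. This is only an illustration of the idea, not something the `internetarchive` library is known to do: the helper name `sanitize_filename` and the example file name are made up, and the character set is the usual list of characters Windows rejects in file names.

```python
import re

# Characters Windows rejects in file names, plus ASCII control characters.
# Note that "/" and "\" are included, so this must be applied to a single
# file name component, not to a full path.
_WINDOWS_INVALID = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def sanitize_filename(name: str, replacement: str = "_") -> str:
    """Hypothetical helper: replace characters Windows does not allow."""
    return _WINDOWS_INVALID.sub(replacement, name)


# Made-up file name containing a colon.
print(sanitize_filename("Episode 1: Intro.mp4"))  # Episode 1_ Intro.mp4
```

A downloader would have to apply such a mapping to the local file name before opening the file for writing. Two different remote names can map to the same local name, so a real implementation would also need to handle collisions.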

maxz, May 30 '23 13:05

Is there a solution or workaround for this? There are some pretty big collections I want to download, but most of the filenames contain colons and end up as 0-byte files, with the data showing up only under 'Size on disk' instead.

pinkderg, Jun 29 '24 16:06