internetarchive
internetarchive copied to clipboard
upload: Expand a period to the contents of the current directory.
The previous behavior when specifying a period as a wildcard was faulty. Now a period is properly expanded to the contents of the current directory.
ia_upload.py seemed to be the most sensible location for this fix, since it is the closest to receiving the path and so that the behavior of the library does not change.
Closes #220, fixes #535.
This breaks when multiple files/directories are provided. If the dot is the first one, all others would get ignored. And if it isn't the first one, it would not get expanded.
Yes, I know. It seemed to be the best compromise here.
I deemed it to be unlikely that someone would want to use a dot and explicit paths in the same command.
But I could change it to merge the expansion of a dot in any position with the rest of the list if we would rather want that behaviour. I thought a bit about it when implementing it and found that solution to be a bit strange.
The period is a shortcut to upload the content of the current directory (and thereby its sub-directories). If someone wanted to upload the content of multiple directories to the same item it would be strange if they ran the command in one of multiple equivalent directories and used the dot for that one directory when they had to supply the pathes of all the others anyways. And specifying an additional file in the directory which is already covered by the period expansion obviously does not make any sense either.
I changed the behaviour to fix those points. The described scenarios do not make much sense to me, but I guess it would cause more requests and trouble to deny people from using it that way than to just allow it.
I suppose I wouldn't mind much if it was only supported when the dot is the only file parameter. But ignoring the others etc. is weird.
The new code would still change the order in which files are processed if the dot isn't the first file, which might lead to confusion but might be acceptable. Otherwise:
while '.' in local_file:
i = local_file.index('.')
local_file[i : i+1] = os.listdir('.')
(Which would also handle the weird edge case of someone specifying the current directory twice.)
Well, specifying a dot together with files is already an edge case in my opinion.
I don't think that allowing for the same files to be twice in the list is sensible, especially if --checksum
is not specified.
We should really protect the users from themselves in that case.
I could see the merit of preserving the relative order of the first period and dropping any other periods (basically changing the while in your example to an if and deleting additional periods). But since the period is already expanded to not explicitly mentioned components, I don't see real additional potential for confusion when not preserving the order.
I don't see how that is any different from someone accidentally doing ia upload $identifier dir dir
. I would expect it to upload things twice in that case (cf. tar -c file file
producing output containing the file twice, wget URL URL
downloading it twice, etc., none of which make much sense but might be reasonable in weird circumstances). A generic duplicate detection might be worthwhile, but it should definitely emit a warning in my opinion.
Yes, it would probably be best to skip duplicate paths.
I don't see any merit to uploading the same file twice as part of the same operation. I can't think of any sensible case for allowing this.
To properly add the duplicate detection (in item.py
) is too much of an undertaking for me right now, so that's for someone else or a later time.
@maxz looking good to me, is this ready to merge?
Yes, @jjjake.