internetarchive icon indicating copy to clipboard operation
internetarchive copied to clipboard

upload: Expand a period to the contents of the current directory.

Open maxz opened this issue 2 years ago • 7 comments

The previous behavior when specifying a period as a wildcard was faulty. Now a period is properly expanded to the contents of the current directory.

ia_upload.py seemed to be the most sensible location for this fix, since it is the closest to receiving the path and so that the behavior of the library does not change.

Closes #220, fixes #535.

maxz avatar Jun 19 '22 23:06 maxz

This breaks when multiple files/directories are provided. If the dot is the first one, all others would get ignored. And if it isn't the first one, it would not get expanded.

JustAnotherArchivist avatar Jun 20 '22 00:06 JustAnotherArchivist

Yes, I know. It seemed to be the best compromise here.

I deemed it to be unlikely that someone would want to use a dot and explicit paths in the same command.

But I could change it to merge the expansion of a dot in any position with the rest of the list if we would rather want that behaviour. I thought a bit about it when implementing it and found that solution to be a bit strange.

The period is a shortcut to upload the content of the current directory (and thereby its sub-directories). If someone wanted to upload the content of multiple directories to the same item it would be strange if they ran the command in one of multiple equivalent directories and used the dot for that one directory when they had to supply the pathes of all the others anyways. And specifying an additional file in the directory which is already covered by the period expansion obviously does not make any sense either.

maxz avatar Jun 20 '22 08:06 maxz

I changed the behaviour to fix those points. The described scenarios do not make much sense to me, but I guess it would cause more requests and trouble to deny people from using it that way than to just allow it.

maxz avatar Jun 20 '22 08:06 maxz

I suppose I wouldn't mind much if it was only supported when the dot is the only file parameter. But ignoring the others etc. is weird.

The new code would still change the order in which files are processed if the dot isn't the first file, which might lead to confusion but might be acceptable. Otherwise:

while '.' in local_file:
	i = local_file.index('.')
	local_file[i : i+1] = os.listdir('.')

(Which would also handle the weird edge case of someone specifying the current directory twice.)

JustAnotherArchivist avatar Jun 20 '22 11:06 JustAnotherArchivist

Well, specifying a dot together with files is already an edge case in my opinion. I don't think that allowing for the same files to be twice in the list is sensible, especially if --checksum is not specified. We should really protect the users from themselves in that case.

I could see the merit of preserving the relative order of the first period and dropping any other periods (basically changing the while in your example to an if and deleting additional periods). But since the period is already expanded to not explicitly mentioned components, I don't see real additional potential for confusion when not preserving the order.

maxz avatar Jun 20 '22 11:06 maxz

I don't see how that is any different from someone accidentally doing ia upload $identifier dir dir. I would expect it to upload things twice in that case (cf. tar -c file file producing output containing the file twice, wget URL URL downloading it twice, etc., none of which make much sense but might be reasonable in weird circumstances). A generic duplicate detection might be worthwhile, but it should definitely emit a warning in my opinion.

JustAnotherArchivist avatar Jun 20 '22 12:06 JustAnotherArchivist

Yes, it would probably be best to skip duplicate paths.

I don't see any merit to uploading the same file twice as part of the same operation. I can't think of any sensible case for allowing this.

To properly add the duplicate detection (in item.py) is too much of an undertaking for me right now, so that's for someone else or a later time.

maxz avatar Jun 20 '22 12:06 maxz

@maxz looking good to me, is this ready to merge?

jjjake avatar Aug 15 '22 17:08 jjjake

Yes, @jjjake.

maxz avatar Aug 15 '22 17:08 maxz