Added an API for working with wheel files
Closes #697.
Note to self: remember to take into account older versions of specs (https://github.com/pypa/wheel/issues/440#issuecomment-1010953153) when normalizing/validating wheel names.
How would one support prepare_metadata_for_build_wheel using this? I had functionality to only write out the metadata dir. I guess you could write an empty wheel then copy out the .dist-info directory; is that the only way?
How would one support
prepare_metadata_for_build_wheelusing this? I had functionality to only write out the metadata dir. I guess you could write an empty wheel then copy out the .dist-info directory; is that the only way?
Separating the metadata preparation step should cover that use case. No need to jump through unnecessary hoops then.
How would one support
prepare_metadata_for_build_wheelusing this? I had functionality to only write out the metadata dir. I guess you could write an empty wheel then copy out the .dist-info directory; is that the only way?Separating the metadata preparation step should cover that use case. No need to jump through unnecessary hoops then.
What changes would that actually entail? From what I can see, only write_metadata() would need to be separated into a free function that would take the package base name and the metadata version as its arguments, yes? Anything else?
Hello there 👋 I'm the person who had the bright idea to build this jankyness back in the day, and now has to maintain it for the sake of all internal APIs we've built with it in the company I work for :).
It would be great to have one method for writing both files and directories. This way, if there's a list of paths to include (e.g. from a configuration file), the code doesn't have to perform the if os.path.isdir(path): dance. It also adds nice symmetry for people who would use this in conjunction with tarfile.TarFile.add to create a PEP-517 builder.
Out of curiosity: what's the reason behind the split between WheelReader and WheelWriter? I.e. why not just have a single class with different modes of operation? I realize that doing it this way makes the code far simpler, but is there any other reason, perhaps related to IO considerations?
It would be great to have one method for writing both files and directories. This way, if there's a list of paths to include (e.g. from a configuration file), the code doesn't have to perform the
if os.path.isdir(path):dance.
Can you elaborate on this? Only write_files_from_directory() uses Path.is_dir(), but I don't see how that's a problem.
It also adds nice symmetry for people who would use this in conjunction with
tarfile.TarFile.addto create a PEP-517 builder.
I frankly don't understand what tar files have to do with wheels.
Can you elaborate on this? Only write_files_from_directory() uses Path.is_dir(), but I don't see how that's a problem.
It is the API user that will have to Path.is_dir() each path, and that's my point.
I would like to do a WheelWriter.write(path) without having to worry whether its a directory or a file. Right now I have to perform a check first, only then I'll know whether I can use write_files_from_directory or just write_file. Or am I missing something here?
Since you are already performing that check in write_files_from_directory, why not make a write method that can take in both files and directories?
I frankly don't understand what tar files have to do with wheels.
PEP-517 defines two hooks: one for bdists, and one for sdists. I can TarFile.add a path to an sdist, without having to check whether it's a dir or not, yet I cannot WheelWriter.write it to a wheel. It's a minor thing, but if someone wants a builder for both sdists and bdists, it makes the code cleaner when whatever paths are chosen to be added can be treated similarly by both hooks.
I would like to do a WheelWriter.write(path) without having to worry whether its a directory or a file
Can you elaborate on your use case? Why can you not use write_files_from_directory()?
Why can you not use write_files_from_directory()?
Because it will throw a WheelError if the user input results in a path to a file being given to it, e.g. when adding files from a list of paths. I need to check each path and switch to write_file if that's the case, which is a pattern that the API should cover for me. I'm questioning the ergonomics of having to choose between those two methods at runtime. The API should be liberal in what it accepts without me having to add the is_dir() boilerplate.
I'm unsure what there is to elaborate further. I feel like I'm wasting everyone's time with a relatively minor point.
The code is already making the necessary check in write_files_from_directory, why not just write_file the path there instead of raising the WheelError(f"{basedir} is not a directory")? All that is left afterwards is renaming it to something more generic, e.g. write.
I'm unsure what there is to elaborate further. I feel like I'm wasting everyone's time with a relatively minor point.
I've been asking about your use case and you haven't been answering my question, so I will keep asking until you do.
What reason do you have to use the API to write individual files/directories to the wheel? That's what I've been wanting to find out. Are the contents coming from outside the file system?
:) I thought this was good enough: writing to a wheel based on a list of paths. We are talking about generic mechanic, so I gave a generic use case.
This should come down to API design principles, not to whether I'm capable of coming up with examples on the level of specificity you expect, but let me try:
A dead-simple build backend that creates wheels by packaging only the paths given by the user, as they are specified inside a list in one of pyproject.toml tables: ["file.py", "some_src_dir", "some_other_dir/data.bin"].
Speaking personally, I don't want to have to care whether each path is a directory/symlink/file. That's all there is to this. Really. I don't need 2 (or, god forbid, 3) methods for this, and I don't want to have to choose between them based on my own path checks - I'd like the API to take care of the checking and choosing what's right for each path.
What reason do you have to use the API to write individual files/directories to the wheel? That's what I've been wanting to find out. Are the contents coming from outside the file system?
It's akin to asking "what reason do you have to use the TarFile to write individual files/directories to a tarball". Lots! :wink:. Maybe we are talking past each other - I still want the directories to be walked through and recursively added to the wheel. I just don't want to have to check and choose different method for each path. It's just one if path.is_dir(): ... else ... clause less for the users of this API. :smile:
There's no esoteric use case here. It's just a trivial point about the ergonomics of this API. I don't want to bikeshed this any further.
My point is that if you're unable to come up with any practical use case for making this change, then why should I do it? It's hardly economical for write() to check if the argument is a file or directory, given the presence of write_files_from_directory(), as ithe former involves an extra system call. For more esoteric use cases, it can also accept a bytestring or unicode string, or even an open file object.
My point is that if you're unable to come up with any practical use case for making this change, then why should I do it?
I gave you a practical use case. I already use this to help QA folks package their Robot Framework sources into wheels. They do not necessarily have a good understanding of how Python build systems work, so having an explicit list of paths helps a lot.
It's hardly economical for write() to check if the argument is a file or directory, given the presence of write_files_from_directory(), as ithe former involves an extra system call.
If the cost of adding one syscall to write_files_from_directory() is too high, it would be far more productive for me and you if you just led with that. I was honestly second guessing whether we are talking about the same thing or whether I'm understanding the objectives of this API.
I don't think it's a good argument. write_files_from_directory() already does that syscall on L520, it just doesn't do anything useful with that check apart from bouncing the API user if they forget to check if the path is actually a directory.
Further, current semantics mean that the code using this API has to perform this check twice: once to figure out which method to use, then, again inside write_files_from_directory(). So it's actually less economical this way.
Why not just make that method do both and have one less sharp edge?
@agronholm what's draft about this PR? Are you looking for feedback on the API first?
@agronholm what's draft about this PR? Are you looking for feedback on the API first?
Yes, I'd very much like feedback from potential downstream users to ensure that the API is comfortable enough for them to adopt into their own projects.
Is there any documentation of the new API? I don't really want to read the implementation to review the API, and as far as I can see what docs will exist in the final build are autogenerated, but there are no docstrings. So if I look at https://packaging--805.org.readthedocs.build/en/805/wheelfile.html, it's empty.
Is there any documentation of the new API? I don't really want to read the implementation to review the API, and as far as I can see what docs will exist in the final build are autogenerated, but there are no docstrings. So if I look at https://packaging--805.org.readthedocs.build/en/805/wheelfile.html, it's empty.
Fair enough, I suppose. Then it would be sufficient at this juncture to add some docstrings and then have downstream project reps look at it?