packaging icon indicating copy to clipboard operation
packaging copied to clipboard

Added an API for working with wheel files

Open agronholm opened this issue 1 year ago • 13 comments

Closes #697.

agronholm avatar May 21 '24 18:05 agronholm

Note to self: remember to take into account older versions of specs (https://github.com/pypa/wheel/issues/440#issuecomment-1010953153) when normalizing/validating wheel names.

agronholm avatar May 22 '24 16:05 agronholm

How would one support prepare_metadata_for_build_wheel using this? I had functionality to only write out the metadata dir. I guess you could write an empty wheel then copy out the .dist-info directory; is that the only way?

henryiii avatar Jul 15 '24 20:07 henryiii

How would one support prepare_metadata_for_build_wheel using this? I had functionality to only write out the metadata dir. I guess you could write an empty wheel then copy out the .dist-info directory; is that the only way?

Separating the metadata preparation step should cover that use case. No need to jump through unnecessary hoops then.

agronholm avatar Jul 21 '24 10:07 agronholm

How would one support prepare_metadata_for_build_wheel using this? I had functionality to only write out the metadata dir. I guess you could write an empty wheel then copy out the .dist-info directory; is that the only way?

Separating the metadata preparation step should cover that use case. No need to jump through unnecessary hoops then.

What changes would that actually entail? From what I can see, only write_metadata() would need to be separated into a free function that would take the package base name and the metadata version as its arguments, yes? Anything else?

agronholm avatar Jul 21 '24 10:07 agronholm

Hello there 👋 I'm the person who had the bright idea to build this jankyness back in the day, and now has to maintain it for the sake of all internal APIs we've built with it in the company I work for :).

It would be great to have one method for writing both files and directories. This way, if there's a list of paths to include (e.g. from a configuration file), the code doesn't have to perform the if os.path.isdir(path): dance. It also adds nice symmetry for people who would use this in conjunction with tarfile.TarFile.add to create a PEP-517 builder.

Out of curiosity: what's the reason behind the split between WheelReader and WheelWriter? I.e. why not just have a single class with different modes of operation? I realize that doing it this way makes the code far simpler, but is there any other reason, perhaps related to IO considerations?

MrMino avatar Aug 05 '24 20:08 MrMino

It would be great to have one method for writing both files and directories. This way, if there's a list of paths to include (e.g. from a configuration file), the code doesn't have to perform the if os.path.isdir(path): dance.

Can you elaborate on this? Only write_files_from_directory() uses Path.is_dir(), but I don't see how that's a problem.

It also adds nice symmetry for people who would use this in conjunction with tarfile.TarFile.add to create a PEP-517 builder.

I frankly don't understand what tar files have to do with wheels.

agronholm avatar Aug 07 '24 14:08 agronholm

Can you elaborate on this? Only write_files_from_directory() uses Path.is_dir(), but I don't see how that's a problem.

It is the API user that will have to Path.is_dir() each path, and that's my point.

I would like to do a WheelWriter.write(path) without having to worry whether its a directory or a file. Right now I have to perform a check first, only then I'll know whether I can use write_files_from_directory or just write_file. Or am I missing something here?

Since you are already performing that check in write_files_from_directory, why not make a write method that can take in both files and directories?

I frankly don't understand what tar files have to do with wheels.

PEP-517 defines two hooks: one for bdists, and one for sdists. I can TarFile.add a path to an sdist, without having to check whether it's a dir or not, yet I cannot WheelWriter.write it to a wheel. It's a minor thing, but if someone wants a builder for both sdists and bdists, it makes the code cleaner when whatever paths are chosen to be added can be treated similarly by both hooks.

MrMino avatar Aug 07 '24 18:08 MrMino

I would like to do a WheelWriter.write(path) without having to worry whether its a directory or a file

Can you elaborate on your use case? Why can you not use write_files_from_directory()?

agronholm avatar Aug 12 '24 09:08 agronholm

Why can you not use write_files_from_directory()?

Because it will throw a WheelError if the user input results in a path to a file being given to it, e.g. when adding files from a list of paths. I need to check each path and switch to write_file if that's the case, which is a pattern that the API should cover for me. I'm questioning the ergonomics of having to choose between those two methods at runtime. The API should be liberal in what it accepts without me having to add the is_dir() boilerplate.

I'm unsure what there is to elaborate further. I feel like I'm wasting everyone's time with a relatively minor point.

The code is already making the necessary check in write_files_from_directory, why not just write_file the path there instead of raising the WheelError(f"{basedir} is not a directory")? All that is left afterwards is renaming it to something more generic, e.g. write.

MrMino avatar Aug 12 '24 10:08 MrMino

I'm unsure what there is to elaborate further. I feel like I'm wasting everyone's time with a relatively minor point.

I've been asking about your use case and you haven't been answering my question, so I will keep asking until you do.

What reason do you have to use the API to write individual files/directories to the wheel? That's what I've been wanting to find out. Are the contents coming from outside the file system?

agronholm avatar Aug 12 '24 19:08 agronholm

:) I thought this was good enough: writing to a wheel based on a list of paths. We are talking about generic mechanic, so I gave a generic use case.

This should come down to API design principles, not to whether I'm capable of coming up with examples on the level of specificity you expect, but let me try:

A dead-simple build backend that creates wheels by packaging only the paths given by the user, as they are specified inside a list in one of pyproject.toml tables: ["file.py", "some_src_dir", "some_other_dir/data.bin"].

Speaking personally, I don't want to have to care whether each path is a directory/symlink/file. That's all there is to this. Really. I don't need 2 (or, god forbid, 3) methods for this, and I don't want to have to choose between them based on my own path checks - I'd like the API to take care of the checking and choosing what's right for each path.

What reason do you have to use the API to write individual files/directories to the wheel? That's what I've been wanting to find out. Are the contents coming from outside the file system?

It's akin to asking "what reason do you have to use the TarFile to write individual files/directories to a tarball". Lots! :wink:. Maybe we are talking past each other - I still want the directories to be walked through and recursively added to the wheel. I just don't want to have to check and choose different method for each path. It's just one if path.is_dir(): ... else ... clause less for the users of this API. :smile:

There's no esoteric use case here. It's just a trivial point about the ergonomics of this API. I don't want to bikeshed this any further.

MrMino avatar Aug 12 '24 21:08 MrMino

My point is that if you're unable to come up with any practical use case for making this change, then why should I do it? It's hardly economical for write() to check if the argument is a file or directory, given the presence of write_files_from_directory(), as ithe former involves an extra system call. For more esoteric use cases, it can also accept a bytestring or unicode string, or even an open file object.

agronholm avatar Aug 12 '24 21:08 agronholm

My point is that if you're unable to come up with any practical use case for making this change, then why should I do it?

I gave you a practical use case. I already use this to help QA folks package their Robot Framework sources into wheels. They do not necessarily have a good understanding of how Python build systems work, so having an explicit list of paths helps a lot.

It's hardly economical for write() to check if the argument is a file or directory, given the presence of write_files_from_directory(), as ithe former involves an extra system call.

If the cost of adding one syscall to write_files_from_directory() is too high, it would be far more productive for me and you if you just led with that. I was honestly second guessing whether we are talking about the same thing or whether I'm understanding the objectives of this API.

I don't think it's a good argument. write_files_from_directory() already does that syscall on L520, it just doesn't do anything useful with that check apart from bouncing the API user if they forget to check if the path is actually a directory.

Further, current semantics mean that the code using this API has to perform this check twice: once to figure out which method to use, then, again inside write_files_from_directory(). So it's actually less economical this way.

Why not just make that method do both and have one less sharp edge?

MrMino avatar Aug 12 '24 22:08 MrMino

@agronholm what's draft about this PR? Are you looking for feedback on the API first?

brettcannon avatar Jul 25 '25 22:07 brettcannon

@agronholm what's draft about this PR? Are you looking for feedback on the API first?

Yes, I'd very much like feedback from potential downstream users to ensure that the API is comfortable enough for them to adopt into their own projects.

agronholm avatar Jul 28 '25 20:07 agronholm

Is there any documentation of the new API? I don't really want to read the implementation to review the API, and as far as I can see what docs will exist in the final build are autogenerated, but there are no docstrings. So if I look at https://packaging--805.org.readthedocs.build/en/805/wheelfile.html, it's empty.

pfmoore avatar Jul 28 '25 21:07 pfmoore

Is there any documentation of the new API? I don't really want to read the implementation to review the API, and as far as I can see what docs will exist in the final build are autogenerated, but there are no docstrings. So if I look at https://packaging--805.org.readthedocs.build/en/805/wheelfile.html, it's empty.

Fair enough, I suppose. Then it would be sufficient at this juncture to add some docstrings and then have downstream project reps look at it?

agronholm avatar Jul 28 '25 21:07 agronholm