setuptools icon indicating copy to clipboard operation
setuptools copied to clipboard

Implemented right file name in the stack frame of `setup.py` being `exec`uted.

Open KOLANICH opened this issue 3 years ago • 8 comments

Summary of changes

This PR provides a better traceback when examining stack while building wheels, i.e. from setuptools plugins. Instead of <string> it now prints the actual filename of setup.py, and if the setup.py is virtual, it prints where its source can be found in the form of an inline string.

Now some opinion: I think it may make sense to eliminate exec of the virtual setup.py file and call the setup function directly.

Pull Request Checklist

  • [ ] Changes have tests
  • [x] News fragment added in [changelog.d/]. (See [documentation][PR docs] for details)

KOLANICH avatar Sep 02 '22 09:09 KOLANICH

Hi @KOLANICH, in https://github.com/pypa/setuptools/pull/3395 I was addressing the performance issue that is usually attributed to setuptools. I did some experiments that seems to indicate doing a compile was extremely expensive (of course there might be some problems in the design of the experiment).

Is there any chance you can do a similar performance analysis?

I also can see in Python docs a warning about compile:

Warning: It is possible to crash the Python interpreter with a sufficiently large/complex string when compiling to an AST object due to stack depth limitations in Python’s AST compiler.

Which is a bit off-putting... The same warning is not present in the exec function (although this might only be because someone forgot to include it).

abravalheri avatar Sep 02 '22 10:09 abravalheri

thanks for the info. I guess the rest of discussion about that should be done in that PR.

KOLANICH avatar Sep 02 '22 10:09 KOLANICH

I guess the rest of discussion about that should be done in that PR.

Since that other PR was already discussed and this one wants to revert some aspects, I believe there is merit in proceeding with the discussion here.

Specifically, it would be nice to have some performance analysis to see if reverting to compile would still represent a hit on the execution time.

I think this is important to analyse, since the community seems to have a strong criticism about setuptools speed. For example, the total time was one of the main arguments used to disqualify setuptools as a candidate to be the default backend showcased in the PyPA tutorial.

Now some opinion: I think it may make sense to eliminate exec of the virtual setup.py file and call the setup function directly.

I completely agree with that. This has been on my radar for a while and I even gave it a try in the past. There are some weird bootstrapping issues, but that can be worked around. Eventually I plan to get back to it, but please feel free to try it out if you have the time.

abravalheri avatar Sep 02 '22 10:09 abravalheri

since the community seems to have a strong criticism about setuptools speed.

For me when there is is any slowdown in packages startup (and setuptools is really damn slow, and so is everything using entry_points, including my own packages) it is usually in pkg_resources. I guess it is because it walks the FS. I guess the proper solution is eliminating walking of the FS and caching the metadata into an index file, let it be a SQLite database. Of course we'll have to store the metadata separately too, so the index file can be re-recreated if anything goes wrong.

KOLANICH avatar Sep 02 '22 11:09 KOLANICH

And here I duplicate some insight into CPython source code:

compile calls Py_CompileStringObject

https://github.com/python/cpython/blob/bbcf42449e13c0b62f145cd49d12674ef3d5bf64/Python/bltinmodule.c#L738-L840

exec calls PyRun_StringFlags for the case of strings and PyEval_EvalCodeEx for the case of code.

https://github.com/python/cpython/blob/bbcf42449e13c0b62f145cd49d12674ef3d5bf64/Python/bltinmodule.c#L996-L1111

PyRunStringFlags calls _PyParser_ASTFromString to parse AST and then run_mod, which through the chain of funcs calls PyEval_EvalCode, which looks much simpler than PyEval_EvalCodeEx.

https://github.com/python/cpython/blob/8a0d9a6bb77a72cd8b9ece01b7c1163fff28029a/Python/pythonrun.c#L1587-L1607

run_mod calls _PyAST_Compile (with the optimization level -1, which is the default for compile too, which means retrieving it from _Py_GetConfig()->optimization_level) and then run_eval_code_obj

So the optimization flags should be the same.

https://github.com/python/cpython/blob/8a0d9a6bb77a72cd8b9ece01b7c1163fff28029a/Python/pythonrun.c#L1720-L1737

and Py_CompileStringObject calls _PyParserASTFromString and then _PyAST_Compile.

KOLANICH avatar Sep 02 '22 11:09 KOLANICH

For me when there is is any slowdown in packages startup (and setuptools is really damn slow, and so is everything using entry_points, including my own packages) it is usually in pkg_resources.

Setuptool's latest version should be using importlib for entry-points which should be faster. Unfortunatelly pkg_resources is still imported elsewhere, but slowly we will be able to remove its calls.

Thank you very much for the detailed investigation @KOLANICH. It sounds very reasonable. It would be interesting to have some experimental evidence as well to match this theoretical analysis (similar to the one in #3395). Any chance we can reuse/repurpose part of this toy example to compare the before/after the PR: https://gist.github.com/abravalheri/d561ca67405a9b3b1d182811a0eaf398#file-readme-md?

I want to do run this (eventually), but please feel free to give it a try yourself. If you find something wrong with the script or have a better suggestion on how to do it, please let me know.

abravalheri avatar Sep 02 '22 15:09 abravalheri

Setuptool's latest version should be using importlib for entry-points which should be faster. Unfortunatelly pkg_resources is still imported elsewhere, but slowly we will be able to remove its calls.

Thank you for the info, it was my setuptools plugin that has used pkg_resources. As I have implemented using importlib.metadata, it has become a bit faster. But 2BH importlib.metadata.entry_points() still gives a noticeable delay.

I wonder if we should

  1. upgrade pkg_resources to use the new mechanism;
  2. spin-up a new metadata system, made of 3 components: 1 a database with metadata in an SQLite table (not all the metadata is within the table, only the one we cannot look-up from the FS fast by key), 2 is API to upgrade metadata cache for an installed package available from within importlib.metadata, and the third one is to make pipuse the API.

KOLANICH avatar Sep 02 '22 16:09 KOLANICH

The new table for entry points system can also store entry points metadata and lookup matching entry points using SQLite select queries.

KOLANICH avatar Sep 02 '22 16:09 KOLANICH