executing icon indicating copy to clipboard operation
executing copied to clipboard

Index node finder

Open 15r10nk opened this issue 3 years ago • 11 comments

This is still work in progress.

The goal is an NodeFinder with similar capabilities like PositionNodeFinder, but for cpython <= 3.10.

see #42

todo:

  • [x] workaround for https://github.com/python/cpython/issues/100537
  • [x] integrate node verification ( create Verifier which can be used for python 3.11 and 3.10)
  • [x] optimize
  • expand to:
    • [x] 3.9
    • [x] 3.8
    • [ ] ~~3.7 (if possible)~~ to difficult for now
    • [ ] ~~3.6 (if possible)~~ to difficult for now
  • [x] try graph matching approach
  • [ ] review my own code and try to understand what I have done :smile:

15r10nk avatar Dec 18 '22 21:12 15r10nk

hi @alexmojaki. I have some good news for you. The new NodeFinder is near completion and works so far, but I need some more time to finish and to optimize it.

Here you can get some impression what it can do:

https://github.com/15r10nk/executing/blob/index_node_finder/Features.md

This list gets generated by pytest and probably still misses some features.

15r10nk avatar Feb 19 '23 20:02 15r10nk

We're going to use executing in https://github.com/pydantic/logfire (btw you can see inline-snapshot being used there). But there are cases where logfire rewrites the AST and this prevents executing from working in Python<3.11. If this branch works better in the presence of pytest then I expect it to work better in our case as well.

It sounded like this was going well, can it be revived?

alexmojaki avatar May 08 '24 12:05 alexmojaki

cool that inline-snapshot is used in logfire :smile:

If this branch works better in the presence of pytest then I expect it to work better in our case as well.

kind of. This branch works not out of the box with pytest. I had to reapply the pytest changes on top of the original ast to make the mapping work. We would have to do the same with logfire / any other library which modifies the ast (like lazy-imports-lite for example).

It sounded like this was going well, can it be revived?

Yes, the concept worked, but I would merge it with the current main and add some comments to make it easier to understand before you review it.

I have also a general question to ast-rewriting with a custom MetaPathFinder. I see the concept in several libraries (pytest/logfire/lazy-import-lite/ and probably many others). Is there any concept which would allow to combine multiple ast transformations? Like a library which installs one MetaPathFinder and calls multiple "plugins" which transform the ast before it gets converted into bytecode?

The auto_trace feature of logfire can currently not be used in combination with assertion rewriting of pytest for example. And I'm also not sure which rewriting would be used.

15r10nk avatar May 08 '24 15:05 15r10nk

kind of. This branch works not out of the box with pytest. I had to reapply the pytest changes on top of the original ast to make the mapping work.

Wouldn't that work equally well with the other node finders? It sounds like this branch doesn't actually help with pytest.

Would this branch work better when the statement being executed hasn't been modified, i.e. all the modifications are 'around' the executed statement?

I have also a general question to ast-rewriting with a custom MetaPathFinder. I see the concept in several libraries (pytest/logfire/lazy-import-lite/ and probably many others). Is there any concept which would allow to combine multiple ast transformations? Like a library which installs one MetaPathFinder and calls multiple "plugins" which transform the ast before it gets converted into bytecode?

I have also thought such a thing would be nice. I imagine it doesn't exist because AST rewriting isn't common enough that people are bothered by them not cooperating.

Another library that does this is birdseye.

I could make executing work with auto-tracing if I did the AST transformation on the tree of an executing.Source. The problem is that @logfire.instrument also uses AST rewriting which depends on arguments, meaning that there is no longer a single Source associated with a file.

alexmojaki avatar May 08 '24 16:05 alexmojaki

Wouldn't that work equally well with the other node finders? It sounds like this branch doesn't actually help with pytest.

This branch has almost the same functionality like the PositionNodeFinder. But I had to introduce a dependency to pytest to make it work.

https://github.com/15r10nk/executing/blob/6e2d947ae80960f5efc432bef358bf7b70526157/executing/_index_node_finder.py#L500

We would have to add something similar for logfire.

The problem is that @logfire.instrument also uses AST rewriting which depends on arguments, meaning that there is no longer a single Source associated with a file.

I hope that I'm correct here, but this should not be a problem. logfire could memorize every ast it has created for a source file and we could check each of them for a matching bytecode.

15r10nk avatar May 08 '24 19:05 15r10nk

Would this branch work better when the statement being executed hasn't been modified, i.e. all the modifications are 'around' the executed statement?

A modification at root level messes with the bytecode structure and the algo can not map the bytecodes (and child bytecodes) any more. It could handle changes inside functions and be able to map expressions outside the changed functions, but pytest adds imports at the root level.

15r10nk avatar May 08 '24 20:05 15r10nk

Ah, after rereading https://github.com/alexmojaki/executing/pull/31#issuecomment-1210897904 I see that this method requires the bytecode to match the source code exactly, my mistake for thinking otherwise.

I could store a Source for each modified code object produced by instrument, but at that point it's getting messy. I was hoping that there was a better solution here.

alexmojaki avatar May 08 '24 20:05 alexmojaki

I could store a Source for each modified code object produced by instrument, but at that point it's getting messy. I was hoping that there was a better solution here.

It might be enough to store the args. You should be able to generate the Source with this args and the original ast.

I was thinking about something like this:


@executing.register_ast_provider
def get_logfire_ast(path,original_ast):
   for args in all_args_for_file(path):
       yield transform(original_ast,args)

Could this also be used to make the SentinelNodeFinder work?

15r10nk avatar May 08 '24 20:05 15r10nk

Can you tell me why you need the args to generate the code? Are there cases where you have to generate different code for different args?

I read some source of logfire but I did not understood why instrument should generate different code for different args.

15r10nk avatar May 08 '24 21:05 15r10nk

Are there cases where you have to generate different code for different args?

https://github.com/pydantic/logfire/blob/8ce1cfe31208e5dc87f35221ef0d7eb2401348e4/logfire/_internal/instrument.py#L129

It might be possible to always generate the same code every time, but probably with some small performance cost.

I can consider various approaches to solving this problem if I get to it. For now I'm just going to only fully support this feature in 3.11+.

alexmojaki avatar May 08 '24 21:05 alexmojaki

It is just an idea and I probably don't know enough about logfire, but you could generate something like this:


@logfire.instrument
def some_function(arg_a,arg_b,__logfire_args__):
    if __logfire_args__.extract_args:
        __logfire_args__.method({"arg_a":arg_a,"arg_b":arg_b})
    ...

The instrument decorator should pass the __logfire_args__ at runtime.

This would also save some time if you have multiple instrumented functions, because you would transform the source only once.

But you have probably similar ideas.

It is ok for me if this branch does not get merged. The new algo is nice but quiet complex. We can keep the current implementation if we don't need more features.

15r10nk avatar May 08 '24 22:05 15r10nk