Support for source code check

Open GuillaumeDesforges opened this issue 5 years ago • 4 comments

Hi, I have an idea for a feature that would really be helpful, especially in a data science experimentation workflow.

Add a boolean argument to the wrapper function, for instance inspect_source. When it is set to True, use the inspect module to read the function's source code and include its hash in the cache key, the same way you do for the function arguments.
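A minimal sketch of what this could look like, independent of cachier's actual internals. The helper names `source_fingerprint` and `cache_key` are hypothetical, and the bytecode fallback is an assumption added so the sketch also works where `inspect.getsource` cannot find a source file (a REPL, or code run through `exec`).

```python
import hashlib
import inspect


def source_fingerprint(func):
    """Return a hex digest that changes when the function's code changes."""
    try:
        # Preferred: hash the literal source text, as proposed above.
        payload = inspect.getsource(func).encode("utf-8")
    except (OSError, TypeError):
        # Source text is unavailable: fall back to the compiled bytecode
        # plus its constants (nested code objects are skipped because
        # their repr contains a memory address).
        code = func.__code__
        consts = [c for c in code.co_consts if not inspect.iscode(c)]
        payload = code.co_code + repr(consts).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def cache_key(func, args, kwargs):
    """Combine the source fingerprint with a hash of the call arguments."""
    h = hashlib.sha256()
    h.update(source_fingerprint(func).encode("utf-8"))
    h.update(repr((args, sorted(kwargs.items()))).encode("utf-8"))
    return h.hexdigest()
```

Using `repr` for the arguments keeps the sketch short; it is only reliable for arguments whose `repr` is stable, and a real implementation would reuse whatever argument hashing the library already has.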

Would help a lot !

GuillaumeDesforges avatar May 13 '19 08:05 GuillaumeDesforges

Hi @GuillaumeDesforges !

That's a great idea! I would love to help you write and add this to the package, if you want to see this feature come to life. :)

shaypal5 avatar May 30 '19 14:05 shaypal5

Thanks @shaypal5 for the enthusiastic reply :)

The feature is a bit tricky to implement. It must be well thought out to prevent very dangerous situations.

For instance, announcing that the source code is checked for changes before running the caching operations means that the user will expect modifications to cascade. Say you have

```python
from cachier import cachier


def add_some(x):
    return x + 1


# inspect_source is the parameter proposed in this issue
@cachier(inspect_source=True)
def some_heavy_operation(x):
    x = add_some(x)
    return x


def run():
    result = some_heavy_operation(1)
    print(result)


run()  # prints 2
```

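If the value in add_some is changed from 1 to 2, the cached result is stale and recomputation is necessary, even though the source code of some_heavy_operation itself has not changed.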

However, we also don't want to systematically check the source code of functions called, especially if they are from a package and do not change, because that would cause a huge overhead.

My guess would be that rather than a boolean parameter inspect_source, it could be preferable to set it to a list of functions and classes to inspect, so that users can define the behaviour themselves.
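As a rough illustration of the list-based variant, here is a self-contained sketch that does not use cachier at all. The decorator name `cached_with_sources`, its `watch` parameter, and the plain dict used as the cache store are all hypothetical stand-ins.

```python
import functools
import hashlib
import inspect


def _fingerprint(func):
    """Hash a function's source, or its bytecode if no source is available."""
    try:
        payload = inspect.getsource(func).encode("utf-8")
    except (OSError, TypeError):
        code = func.__code__
        consts = [c for c in code.co_consts if not inspect.iscode(c)]
        payload = code.co_code + repr(consts).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def cached_with_sources(store, watch=()):
    """Hypothetical decorator: cache results in `store`, keyed on the call
    arguments and on the code of the decorated function plus every
    function listed in `watch`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            h = hashlib.sha256()
            for f in (func, *watch):
                h.update(_fingerprint(f).encode("utf-8"))
            h.update(repr((args, sorted(kwargs.items()))).encode("utf-8"))
            key = h.hexdigest()
            if key not in store:
                store[key] = func(*args, **kwargs)
            return store[key]
        return wrapper
    return decorator
```

With the example above, the user would write something like `@cached_with_sources(store, watch=[add_some])` on `some_heavy_operation`, so that an edit to `add_some` changes the key. Anything not listed in `watch` is ignored, which keeps the overhead bounded but leaves the responsibility for completeness with the user.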

The more I think about it, the more it feels like a bad idea™ ...

I would be glad to hear your thoughts !

GuillaumeDesforges avatar Jun 03 '19 13:06 GuillaumeDesforges

> However, we also don't want to systematically check the source code of functions called, especially if they are from a package and do not change, because that would cause a huge overhead.

In the general case, it would be impossible to follow the chain of functions called and verify that they are the same. This is essentially the halting problem: in general, you can't determine what a program will do without actually running it.

I would be curious what the exact use case is that you are describing, for instance what inspired you in the first place?

NickCrews avatar Apr 24 '20 08:04 NickCrews

Yes, a fully working mechanism wouldn't be possible, but an approximation would be: track the source code of every function that could possibly be called, wherever that can be determined statically, much like linters do. It would not be perfect, since it would distinguish functions by how they are written rather than by what they do (which is completely different), but it would cover most use cases.
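One way to picture this linter-style approximation is a static walk over call sites. The sketch below is an assumption about how it might work, not an existing cachier feature: it takes a hypothetical `{name: source}` mapping of the user's own functions, follows plain `name(...)` calls with the `ast` module, and hashes everything reachable from a root function.

```python
import ast
import hashlib
import textwrap


def called_names(source):
    """Return the bare names that appear as call targets in `source`."""
    tree = ast.parse(textwrap.dedent(source))
    return {
        node.func.id
        for node in ast.walk(tree)
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
    }


def reachable_functions(sources, root):
    """Walk the static call graph from `root` over a {name: source} mapping."""
    seen, stack = set(), [root]
    while stack:
        name = stack.pop()
        if name in seen or name not in sources:
            continue  # builtins and third-party code are not tracked
        seen.add(name)
        stack.extend(called_names(sources[name]))
    return seen


def tree_fingerprint(sources, root):
    """Hash the source text of every function statically reachable from `root`."""
    h = hashlib.sha256()
    for name in sorted(reachable_functions(sources, root)):
        h.update(name.encode("utf-8"))
        h.update(sources[name].encode("utf-8"))
    return h.hexdigest()
```

This shows both limits mentioned above: a comment or whitespace edit changes the fingerprint even though behaviour is identical, and calls made through attributes, `getattr`, or other dynamic dispatch are not followed at all.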

The use case is simple. In data science experimentation it is common to build an experiment brick by brick, and storing intermediate results makes it faster to test the next brick, which you are building directly on top of the previous functions (instead of writing to some file and loading it manually).

Some tools like DVC provide means to do that, but in a very heavy way in my opinion.

I'm not doing things like that anymore and won't have time to work on it unfortunately.

GuillaumeDesforges avatar Apr 24 '20 08:04 GuillaumeDesforges