
Dependency version check

Open joshlk opened this issue 3 years ago • 8 comments

Hey,

I have a source file that has a dependency, and I want to force cppimport to recompile the source when the dependency version changes. Unfortunately, the header files of the dependency can remain the same when the version changes, so I can't rely on that.

I have a Python function that obtains the version of the dependency. What's the best way to force cppimport to recompile given this information? Would it be possible to add functionality so you can pass a string to be included in the checksum calculation (cppimport.checksum._calc_cur_checksum)?

joshlk avatar Aug 01 '22 12:08 joshlk

Would it be feasible to do something like this:

version_str = '1.0.1'
with open('version', 'w') as f:
    f.write(version_str)
cfg['dependencies'].append('version')

The 'dependencies' configuration variable doesn't actually do anything with the contents so the paths provided can be any sort of file you want.

tbenthompson avatar Aug 01 '22 15:08 tbenthompson

Yeah, that could work. That's not easily doable at the moment, right?

joshlk avatar Aug 01 '22 17:08 joshlk

The snippet that I wrote above would work with no changes. In practice, I'd probably write out the version file as part of the process that updates dependencies. Or, do something like pip freeze > deps (substitute in the equivalent list of package versions for your setup). Then, the cfg['dependencies'].append(version_or_deps_filename) would be either in the preamble of the .cpp file or in the code that imports the file.

Does that make sense? Am I missing a piece?
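As an aside, a pip-freeze-style listing can also be produced from Python directly, without shelling out. A minimal sketch using the standard library's importlib.metadata (Python 3.8+); this is not part of cppimport:

```python
# Sketch: write a pip-freeze-like listing of installed packages to a file,
# so the file's contents change whenever any dependency version changes.
from importlib import metadata


def freeze_to_file(path: str) -> None:
    lines = sorted(
        f"{dist.metadata['Name']}=={dist.version}"
        for dist in metadata.distributions()
    )
    with open(path, "w") as f:
        f.write("\n".join(lines))


freeze_to_file("dependency_versions.lock")
```

The resulting file can then be listed in cfg['dependencies'] exactly like the hand-written version file above.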

tbenthompson avatar Aug 01 '22 18:08 tbenthompson

Oh, do you mean to add this to the template comment in the source file?

But would that trigger a rebuild, given that the template is only read if it's determined that a rebuild is needed?

joshlk avatar Aug 02 '22 10:08 joshlk

Here's a more explicit version of what I'm imagining. Two files:

// cppimport - my_module.cpp

// your C++ code here ...

/*
<%
cfg['dependencies'].append('dependency_versions.lock')
%>
*/
# python_file.py
import cppimport.import_hook  # installs the import hook for .cpp modules

import my_module
# your python code here ...

This will result in a rebuild any time the dependency_versions.lock file is updated. The list of dependencies is stored in an appendix to the shared library.

Then, elsewhere, whenever you want to update the dependency versions, you write to that file. How exactly you write to it will depend on your exact application, but one application might be something like:

# update_dependencies.sh
pip install -U package_names
pip freeze > dependency_versions.lock

Is this useful?

tbenthompson avatar Aug 02 '22 16:08 tbenthompson

Thanks - the above works, but it would be better if the dependency file were generated on the fly. This code doesn't seem to work:

import hashlib
from pathlib import Path
import os
from cppimport import build_filepath

cpp_code = """// cppimport

#include <pybind11/pybind11.h>

int add(int i, int j) {
    return i + j;
}

PYBIND11_MODULE(source, m) {
    m.def("add", &add);
}

/*
<%
import random
print('Running template')
with open('versionsX', 'w') as f:
    f.write(str(random.random()))
cfg['dependencies'].append('versionsX')
setup_pybind11(cfg)
%>
*/
"""

def md5_file_hash(path: str) -> str:
    # File data + creation time + modified time
    data = Path(path).read_bytes() \
            + bytes(str(os.path.getctime(path)), 'utf8') \
            + bytes(str(os.path.getmtime(path)), 'utf8')
    return hashlib.md5(data).hexdigest()

def test_load_lib_dep_change():
    with open('source.cpp', 'w') as f:
        f.write(cpp_code)
    
    # Compile first time
    print('First compile')
    binary_path = build_filepath('source.cpp')
    assert os.path.exists(binary_path)
    binary_hash = md5_file_hash(binary_path)

    # Compile again
    print('Second compile')
    binary_path = build_filepath('source.cpp')
    assert os.path.exists(binary_path)
    assert binary_hash != md5_file_hash(binary_path)  # fails: binary unchanged

Here the dependency file is generated in the template. From looking at the cppimport code I don't understand why the template isn't run a second time.

joshlk avatar Aug 04 '22 05:08 joshlk

Yes, it's intended that the template is not run a second time here. You can see the check for whether to enter the build step here: https://github.com/tbenthompson/cppimport/blob/e19e9a601bd1deeae983ef02aa3aa78171472d3f/cppimport/init.py#L86

Currently, if the dependencies haven't changed and the extension seems built and importable then the expected behavior is to import the extension without re-running the template.
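In other words, the control flow looks roughly like this. This is a toy sketch of the behavior described above, not the actual cppimport source:

```python
# Toy sketch of the import-time decision: the template only runs
# when the checksum check says a rebuild is needed.
def import_steps(checksum_matches: bool) -> list:
    steps = []
    if checksum_matches:
        steps.append("load existing extension")  # template never runs
        return steps
    steps.append("run template")  # only reached when a rebuild is needed
    steps.append("build")
    steps.append("load existing extension")
    return steps


# On a second import with an unchanged checksum, only the load step happens:
print(import_steps(checksum_matches=True))  # -> ['load existing extension']
```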

I'd be open to changing the behavior so that we always run the templates on import. Or at least providing an option to run the template. One thing I'd be curious about is the performance impact of running the template during every import. Are we adding 500us or 50ms to the import time in the average case?

Another related issue here is that the cppimport configuration is currently unpleasantly global. If we're adding another configuration option, it might be time to either make the configuration something that can be passed to the imp(...) function or add a built-in configuration context manager so that configuration variables can easily be set just for the duration of a single import.

tbenthompson avatar Aug 05 '22 22:08 tbenthompson

Ok I tested it here:

scratch.py:

import os
from cppimport.importer import is_build_needed, setup_module_data
from cppimport.templating import run_templating

cpp_code = """// cppimport

#include <pybind11/pybind11.h>

int add(int i, int j) {
    return i + j;
}

PYBIND11_MODULE(source, m) {
    m.def("add", &add);
}

/*
<%
import random
with open('versionsX', 'w') as f:
    f.write(str(0))
cfg['dependencies'].append('versionsX')
setup_pybind11(cfg)
%>
*/
"""

with open('source.cpp', 'w') as f:
    f.write(cpp_code)

filepath = os.path.abspath('source.cpp')
fullname = os.path.splitext(os.path.basename(filepath))[0]
module_data = setup_module_data(fullname, filepath)
is_build_needed(module_data)
# run_templating(module_data)   # << uncomment for comparison
# No templating
$ python3 -m timeit -n 1000 "exec(open('scratch.py').read())"
1000 loops, best of 5: 494 usec per loop

# With templating
$ python3 -m timeit -n 1000 "exec(open('scratch.py').read())"
1000 loops, best of 5: 2.04 msec per loop

So it's around 4 times slower (494 µs vs 2.04 ms).


I think for our purposes it should be OK to write to the dependency file before importing the module. I will explore that option.
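One way to sketch that option (hypothetical helper; the version string is assumed to come from the probe function mentioned at the start of the thread): rewrite the marker file only when the version actually changed, so unchanged versions don't perturb the dependency file.

```python
import os


def refresh_version_file(path: str, version_str: str) -> bool:
    """Write version_str to path only if it differs; return True if rewritten."""
    if os.path.exists(path):
        with open(path) as f:
            if f.read() == version_str:
                return False  # unchanged: leave the file untouched
    with open(path, "w") as f:
        f.write(version_str)
    return True


# Call this before importing the module; cppimport's checksum over
# cfg['dependencies'] then decides whether a rebuild is needed.
```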

Thanks for all the help!

joshlk avatar Aug 09 '22 11:08 joshlk