Add ability to wait for an environment
- Add a
wait.pycontaining utilities that allow for waiting for specific files to appear, and for loading imports. These are defined by the environment variablesCOG_WAIT_FILEandCOG_EAGER_IMPORTS(CSV). If neither of these environment variables exist this is a no-op. - Separate out
BasePredictorandBaseInputinto their own class files now that they are accessed by multiple consumers (config and predictor). - Create a
Configclass which is an abstraction aroundcog.yamlallowing for environment variables to be used as a substitute for certain attributes of thecog.yaml, in theory allowing us to load the cog environment without relying on this file provided we have set the right environment variables. - Change
create_appto take aConfigclass rather that a dictionary. - Add an
env_propertyfunction decorator for reading from an environment variable before dropping into its wrapped function. This prevents the need for loadingcog.yamlif the property is guarded by this. - Add a
Modeenumeration that delineates between the different modes ofpredictandtrainin a type safe way. - Wait for the environment to setup right before the worker process calls
setup. - Change the test stubs to use the
Configclass while injecting in some mock values if applicable. - Add some tests for
code_xformto make sense of its inputs and outputs. - Increase test timeout to 20s, this is due to the wait tests using up the previous time allotment of 10s.
This allows cog to begin running and do as much work as possible with as little information as possible while it waits for a file system to appear around it. Currently cog requires cog.yaml and the prediction/training python to appear before setting up the HTTP client, in this case we can setup the HTTP client while we load these files in the background and get a signal that the file loading is finished and continue loading the predictors setup, in addition to this we perform any pre-emptive work we can while we are waiting for these files to load (such as eagerly importing modules).
This is a pretty large diff and while the PR description does a good job of telling what the changes are, it doesn't help me understand the motivation for those changes.
What problems is this changeset setting out to solve?
Added a blurb at the bottom of the PR description, let me know if it helps.
I'd be lying if I said I'd given this PR a full and thorough review. What I can say is that in the last months, changing the Python parts of cog without breaking people has been challenging at best, and this is honestly way too big a changeset to attempt to land all at once.
I very much like some of the changes you've made here: making things that should be enums enums, wrapping up environment config into a
Configobject, etc.Can we perhaps take a step back, extract these things one at a time, and start a series of significantly smaller PRs to achieve these goals?
I can give it a go. This branch/PR will have to stay up due to systems currently depending on it, so the way I will attempt this is by ripping concepts out of this PR then once they are in main merging this branch back against main until it contains a more succinct version of itself.
This has now been slimmed down to just the wait code, so is ready to be reviewed in its own right.