pymc-marketing
Add Codespaces config file?
Now that we have a Dockerfile for development, I want to get everyone's thoughts on adding a .devcontainer/devcontainer.json file for GitHub Codespaces. This would enable anyone signed into GitHub to spin up a dev environment in two mouse clicks.
Every GitHub user has a free monthly Codespaces compute quota. The free container compute instances themselves are rather small (the largest is 4 cores, 16 GB RAM, 32 GB storage), but I've been using Codespaces as my primary Python IDE in my full-time job for the past year and have only hit the monthly limit once.
The .devcontainer/devcontainer.json file may look something like this:
{
  "image": "jupyter/base-notebook:python-3.11.6",
  "postCreateCommand": "make init",
  "customizations": {
    "vscode": {
      "extensions": [
        "ms-python.python",
        "ms-toolsai.jupyter",
        "davidanson.vscode-markdownlint"
      ]
    }
  }
}
Metadata Reference: https://containers.dev/implementors/json_reference/
@maresb I'm particularly interested to hear your thoughts.
I haven't used it much. Seems cool.
People could check out the notebooks then, right? Maybe we can link if someone wants to try them out
Or is this mainly for code development?
It can be used for both. A 2-core compute is fine for unit testing script changes, and is how I've kept my team's Python dev costs near zero for the past year.
For pymc-marketing dev work I still prefer PyCharm on my M2 Pro, but I have occasionally spun up a Codespace for a quick test or prototype of an idea.
Hey, sorry for the delay on responding to this, and thanks for kicking off this conversation!
In principle I think this is a nice idea, but in practice it's really easy to underestimate how much work it takes to get something like this running well enough to be helpful. Then it becomes either yet another thing we have to maintain, or it's broken and wastes the time of anyone who tries it out. So if we want to do this, we should put some thought into how to do it right, and then commit to maintaining it.
Codespaces is very similar to Gitpod. I honestly don't have that much experience with Codespaces, but I imagine that it's better integrated. I've spent a bit of time trying to get Gitpod running with PyMC. I'm not particularly satisfied with the results, and also I didn't invest as much time as I would have liked. Documentation for PyMC with Gitpod is here and here.
We should probably put some thought into the base image. Like does it even make sense to run jupyter/base-notebook here? Wouldn't it be more natural to use the VS Code Jupyter extension? We'd probably want to customize it so that pymc-marketing is already pip installed with editable mode and pre-commit hooks are configured.
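As a rough illustration of that idea, here is a minimal devcontainer.json sketch that swaps the Jupyter base image for a generic Python dev container image and does the editable install and hook setup after the container is created. It is not a tested configuration for this repo: the image tag, the assumption that `pip install -e .` works from the repo root, and installing pre-commit via pip are all placeholders that would need checking against the existing Dockerfile and `make init`. (devcontainer.json accepts JSON with comments.)

```jsonc
{
  // Assumption: a generic Python image instead of jupyter/base-notebook.
  "image": "mcr.microsoft.com/devcontainers/python:3.11",

  // Assumption: the repo root is pip-installable and pre-commit is not
  // already in the image. Runs once, after the container is created.
  "postCreateCommand": "pip install -e . && pip install pre-commit && pre-commit install",

  "customizations": {
    "vscode": {
      // Notebooks are opened through the VS Code Jupyter extension
      // rather than a separate Jupyter server.
      "extensions": [
        "ms-python.python",
        "ms-toolsai.jupyter"
      ]
    }
  }
}
```

Doing the install in postCreateCommand keeps everything in one file, at the cost of a slower first startup compared with a prebuilt image.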
For the base image we might want something similar to this coupled with a workflow like this. The analogue of devcontainer.json is .gitpod.yml. Also note that a lot of heavy lifting behind the scenes is being done here.
It's been some months since I've taken a close look at how I'm doing things, so it could very well be that the above details can be vastly simplified. (Obligatory mention of pixi.)
Thanks for responding @maresb!
Like does it even make sense to run jupyter/base-notebook here? Wouldn't it be more natural to use the VS Code Jupyter extension? We'd probably want to customize it so that pymc-marketing is already pip installed with editable mode and pre-commit hooks are configured.
Probably not - I only put that one in there because it's the base image for our Dockerfile, but isn't that image based on micromamba?
The VS Code Jupyter extension is also specified in the above example, but the pre-commit hooks still need to be added and I've never had much luck getting the hooks to play well with the VS Code Git UI.
If a workflow is required to build a base image, then I agree this is not worth the effort. I'll look into what can be done from devcontainer.json alone and let you know what I find.
Probably not - I only put that one in there because it's the base image for our Dockerfile, but isn't that image based on micromamba?
No, micromamba is used to bootstrap the initial Conda environment and then deleted.
The VS Code Jupyter extension is also specified in the above example, but the pre-commit hooks still need to be added and I've never had much luck getting the hooks to play well with the VS Code Git UI.
Ya, this has always been a pain point for me as well. Whenever a pre-commit hook fails, the error message looks like something else until you open the logs. Also, I've had a really difficult time trying to pre-cache the pre-commit hook venvs. It seems like it should be easy, but I've had nothing but problems trying to get it to run smoothly (I forget exactly why, but it has to do with some combination of dynamic user creation, directories being hardcoded within venvs, and the Git extension failing to pick up the envvar for an alternate pre-commit home location). For the PyMC Gitpod I ended up just creating the venvs in the background on initialization.
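For reference, a Codespaces analogue of "create the venvs in the background on initialization" might look roughly like the fragment below. This is only a sketch and not the PyMC Gitpod setup: the cache path is a hypothetical placeholder, whether a backgrounded process survives a lifecycle command can depend on the environment, and it does nothing about the Git extension not picking up the environment variable.

```jsonc
{
  // PRE_COMMIT_HOME is the pre-commit variable for an alternate cache
  // location. The path here is a hypothetical placeholder.
  "containerEnv": {
    "PRE_COMMIT_HOME": "/workspaces/.pre-commit-cache"
  },

  // Assumption: pre-commit is not already in the image.
  // Installs the Git hook itself once, after the container is created.
  "postCreateCommand": "pip install pre-commit && pre-commit install",

  // Builds the hook environments in the background on each start so the
  // first commit does not have to wait for them. Output goes to a log file.
  "postStartCommand": "nohup pre-commit install-hooks > /tmp/pre-commit-install-hooks.log 2>&1 &"
}
```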
If a workflow is required to build a base image, then I agree this is not worth the effort.
I'm not so scared of this. It's basically already done for PyMC and should be easily transferable here. Also, I think I've solved a lot of the problems here. It mainly needs a commitment from someone to kick the tires every once in a while and periodically reevaluate if things can be redone more simply.
@maresb after giving this some thought, I've decided a Codespaces setup isn't worth our time. The new Streamlit app renders this less pertinent, and the Codespaces free tier also runs on unallocated compute, so if someone in the data center trains an LLM, Codespaces will get stuck on startup and we'll have to field user issues about why. Jupyter Notebook support in VS Code isn't amazing either, and I've frequently crashed kernels with pm.sample().