Reorganize Python Modules
What is it?
This is not a fully fleshed-out issue; in general, it is a set of problems that we need to address at some point.
Lots of development in the oso repo happened this way:
- Prototype
- Ship to prod
- Move to the next thing
While this has generally been great for us in terms of Getting Shit Done™️, it has introduced some unwanted technical debt. It also makes it awkward to figure out where new code should go.
Here are some of the current issues with this:
- Currently the files that make up the `osocli` are all over the place, and it isn't clear where that CLI should live.
  - Bonus: Should this also just live under `pnpm`? I originally avoided this because external data scientists using the library would have to install unneeded Node tooling just to use the Python part of the library.
- We have `oso_dagster`, `metrics_tools`, and `opsscripts` as high-level modules now. However, many of these share code with each other, which is not great.
- Traditionally, Python tests are stored in directories separate from the modules they test. I tried the Go convention of putting them in the same place. I'm curious if others have thoughts on this.
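To get a concrete picture of how much `oso_dagster`, `metrics_tools`, and `opsscripts` share, a small script can walk each package and record the absolute imports that point at one of the others. This is a minimal sketch using only the standard library's `ast` module; it assumes the three packages sit side by side under one root directory (which may not match the repo exactly), it ignores relative imports because those stay inside a single package, and `cross_package_imports` is a hypothetical helper name.

```python
import ast
from collections import defaultdict
from pathlib import Path

# Top-level packages named in this issue; adjust to the actual layout.
PACKAGES = {"oso_dagster", "metrics_tools", "opsscripts"}


def cross_package_imports(root: Path, packages=PACKAGES):
    """Map (importing_pkg, imported_pkg) -> sorted files that make that import."""
    edges = defaultdict(set)
    for pkg in packages:
        pkg_dir = root / pkg
        if not pkg_dir.is_dir():
            continue  # package not present under this root
        for path in pkg_dir.rglob("*.py"):
            tree = ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
            for node in ast.walk(tree):
                if isinstance(node, ast.Import):
                    names = [alias.name for alias in node.names]
                elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
                    # Absolute "from x.y import z"; relative imports are skipped.
                    names = [node.module]
                else:
                    continue
                for name in names:
                    target = name.split(".")[0]
                    if target in packages and target != pkg:
                        edges[(pkg, target)].add(str(path.relative_to(root)))
    return {edge: sorted(files) for edge, files in edges.items()}
```

Each key in the returned dict is an edge between two top-level modules; the listed files are the places that would need to change (or start importing from a shared package) during the reorganization.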
Some initial thoughts:
- We use a lot of sqlglot-related tooling in many places. We should consolidate it in one place.
- The "metrics" tools might just be extensions of that sqlglot tooling.
- `oso_dagster` might be fine where it is, but submodules inside it that are not Dagster-specific should likely move somewhere more common.
After a discussion with @IcaroG and @Jabolol, this is the proposed folder structure we'd like to see:

    lib/
      oso_core/          # Much of metrics_tools should go here,
                         # except the metrics calculation service.
                         # This should also hold any other common libs/tools.
      pyoso/
      cli/               # The oso_lets_go CLI and ops-related
                         # CLI interfaces should go here.
    warehouse/
      oso_dagster/       # We should move general (not Dagster-specific)
                         # things to oso_core.
      oso_sqlmesh/
      metrics_service/

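One way to keep this layout from drifting back into tangled cross-dependencies is to write down which projects may import which, and check an import graph against those rules. The rules in `ALLOWED_DEPS` below are an assumption inferred from the comments in the tree (general code lives in `oso_core`, so `oso_core` should not depend on anything in `warehouse/`); the exact edges for `pyoso` and `cli` are guesses, not an agreed-upon policy, and `layering_violations` is a hypothetical helper.

```python
# Hypothetical layering rules for the proposed layout: each project maps to the
# internal projects it is allowed to import. These rules are an assumption.
ALLOWED_DEPS = {
    "oso_core": set(),
    "pyoso": {"oso_core"},
    "cli": {"oso_core", "pyoso"},
    "oso_dagster": {"oso_core"},
    "oso_sqlmesh": {"oso_core"},
    "metrics_service": {"oso_core"},
}


def layering_violations(edges, allowed=ALLOWED_DEPS):
    """Return sorted (importer, imported) pairs that the rules do not permit.

    `edges` is an iterable of (importer, imported) project-name pairs, for
    example collected by scanning import statements. Imports of projects not
    listed in `allowed` (third-party packages) are ignored.
    """
    return sorted(
        (src, dst)
        for src, dst in set(edges)
        if src != dst and dst in allowed and dst not in allowed.get(src, set())
    )
```

For instance, `layering_violations([("oso_dagster", "oso_core"), ("oso_core", "oso_dagster")])` flags only `("oso_core", "oso_dagster")`, since a shared library importing a warehouse project is exactly the kind of coupling this reorganization is meant to remove.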
I wanted to just get this done, but I realized it will break some things in sqlmesh, so for now we will migrate things gradually.
@ravenac95, is this still relevant?
@ryscheng, some of it is still relevant; it hasn't been completely resolved. Let me enumerate the remaining work.
One other thing that @ccerv1 just brought up is that the seed data is in a confusing place. We should also reorganize this.
So the remaining things to do here are:
- We should turn every Python package into its own Python project (i.e., each should have its own `pyproject.toml` file).
  - [ ] `metrics_service`
  - [ ] `oso_sqlmesh`
  - [ ] `oso_dagster`
  - [ ]
- We should ensure that the correct references exist between projects.
- Some additional work to maybe move things into `oso-core`.