Clean up the purification layer
It should be a different PR (and even a different Issue). I would start with adding unit tests to make sure things are sane, and then improve it.
Touching the unit test framework is dangerous business and we need to do it carefully.
But that part is like ~10 years old (no jokes) and quite messy. I appreciate the courage to jump into that
Originally posted by @gpsaggese in https://github.com/causify-ai/helpers/pull/734#discussion_r2105966418
@gpsaggese
There are already tests for purify_from_environment, purify_amp_references, except for purify_app_references. I will create tests for purify_app_references and then start cleaning up the code. I might even change tests for other functions to meet the current demands.
Currently, we have overlapping logic across purify_from_environment, purify_app_references, and purify_amp_references, especially for path substitutions.
This becomes problematic when os.getcwd(), the Git root (via hgit.get_client_root()), CSFY_HOST_GIT_ROOT_PATH, and /app all point to the same directory, usually /app when we run tests inside the container.
Proposal: Introduce a single function with a strict priority for replacing paths; the order is as follows:
-
$GIT_ROOT -
CSFY_HOST_GIT_ROOT_PATH -
$PWD -
If the above three are not pointing to
/app, then replace/appwith$APP_DIR
By keeping the logic in one function instead of spreading it across all other functions, we could easily understand the logic and debug if there's any issue.
The /amp pattern is not removed from the text sometimes, even though we specify purify_expected_txt = True, because the purify function only identifies amp when it appears at the start of the line. I plan to modify it so that it can detect and remove amp even when it is located in the middle of the line. This created issue #758.
We could also combine purify_app_references and purify_amp_references; it's just a small update like adding a[mp]p in the regex pattern in one of the functions.
Good analysis.
- I don't like the current amp / app mix up, it should be separated (at least in terms of code)
- I am ok with having a precise order of applying the transforms
- Each replacement should be done as an atom (e.g., the dir
appshould be recognized and replaced wherever it is, as long as it's not inside other strings, e.g.,application) - We should document these design principles in a short doc (and we'll add it to the doc for https://github.com/causify-ai/helpers/issues/779)
Let's do small changes and plan the PRs properly, since for every change we need to run unit tests in all the repos. So small carefully planned PRs
Merged to master https://github.com/causify-ai/helpers/pull/786
Done