core
core copied to clipboard
Refactor network module - prepare it for easier sub-module testing
Please feel free to suggest more changes that would lead to easier testing or improve the readability of the code. The Scrutinizer analysis already reports some good improvements so far.
Implemented refactoring changes:
- Makefile extension to support ocrd network test commands:
network-module-test,network-integration-test - Harmonize code - spacing between method parameters to maintain readability while still reducing the lines of the files. More variable typing wherever it was missing. Extensive use of
"instead of'for strings to remove the need of escaping the'symbol when the string includes'. - Improved error messages to include extra variable data are available for easier debugging
- Replaced repetitive constants with extensive use of Enums such as:
AgentType,DeployType,JobState,NetworkLoggingDirs,ServerApiTags - Env variable refactor to be generally used for all RabbitMQ clients and not just workers
OCRD_NETWORK_WORKER_QUEUE_CONNECT_ATTEMPTS->OCRD_NETWORK_RABBITMQ_CLIENT_CONNECT_ATTEMPTS - The default job state is now set to
UNSETwhen not provided StateEnum->JobStatefor better clarification- All occurrences of
status->stateto maintain the same variable name everywhere - The logging module to avoid potential name collisions:
src/ocrd_network/logging.py->src/ocrd_network/logging_utils.py - Combined all RabbitMQ-related data into a single dictionary variable instead of 5 separate variables (processing server and processing worker). The validators also return a dictionary of validated variables.
- Deployer module - increased readability and testability of separate deployments.
- Run-time data modules - created separate files for different concepts. Check here.
- Provided config parser wrapper methods for the different run-time data (for easier testing). Check here.
- Processing server improvements (~20% less lines):
- Introduced
raise_http_exceptionmethod to wrap repetitive lines - Created wrapper methods for separate concepts in big methods for easier concept testing, e.g.,
parse_workflow_tasks,validate_first_task_input_file_groups_existenceand moved these toutilsorserver_utilsdepending on whether a server response is involved or not. Moreover, moved some other full methods which were better fitting insideutilsorserver_utilssuch ascreate_processing_message - Overall improved naming conventions for methods, e.g.,
push_processor_job->validate_and_forward_job_to_network_agentwhere network agent is of kindAgentType(check Enums above) - Refactored server cache implementation + provided
syncversions of theasyncmethods to directly reuse these in the tests.
- Introduced
- Refactored rabbitmq_utils:
- Moved the
create_queuemethod from processing worker torabbitmq_utilsmodule - Provided extra 4 wrapper methods to remove code duplication in the processing server and processing worker. Check here.
- Moved the
- Extended the OcrdProcessingMessage schema to support
agent_type. Check here. - Fixed the OcrdResultMessage schema - the schema required either
path_to_metsorworkspace_id. However, settingworkspace_idtoNonewas triggering an error that it's not of typestringaccording to the schema. On the other hand, assigning an empty string toworkspace_idwas triggering an error that eitherpath_to_metsorworkspace_idcould be set but not both. So there was no way to create a result message without a crash. By default, the fieldspath_to_metsorworkspace_idare set to empty strings when nothing is assigned. Check here.
Implemented tests:
- The naming conventions are
test_integration_*ortest_modules_*- the integration tests require access to either the MongoDB or RabbitMQ, but the module tests do not. Hence, the module tests can be executed without starting any extra services (faster for testing). - Refactor
test_db.py->test_integration_1_db.pyand replaced magic values with constants - Refactor
test_rabbitmq.py->test_integration_2_rabbitmq.pyand replaced magic values with constants test_integration_3_server_cache_requests.py- tests of the server cache for processing requeststest_integration_4_processing_worker.py- tests of the processing workertest_integration_5_processing_server.py- tests of the processing servertest_modules_logging_utils.pytest_modules_param_validators.pytest_modules_process_helpers.py- Testing invocation of pythonic and bashlib processortest_modules_server_cache_pages.py- Unlike the server cache requests, the pages cache does not require database access.
Open issues:
- The bashlib processor test is still not working and is disabled currently -
_test_invoke_processor_bash()- Shall be fixed in this PR. - There is still the
deploy_agent_native_get_pid_hackmethod, but now isolated away from the deployer. An eye should be kept on that in another PR.
Note: I have not merged the master branch to this PR yet, since v2.63.3 broke some mets caching. I will potentially do that before this PR can be finally merged.
Note2: The macOS tests are disabled since they fail due to a reason not related to this PR itself.
Arh, I hate Github Actions' false positives such as fb38c56 and 588b017 tests! There are literally failing processing endpoint tests.
For me this PR is hard to review, because this are really a lot of changes.
Understandable.
I have tried to look into all the files but especially the changes in the processing_server where hard to follow.
For the processing server specifically, it is better to check the result not the changes. That one file got a lot of modifications and had to become fewer lines (~20% less).
I run the tests though and I tried to run the changes in a self build Dockercontainer together with a pythonic and bashlib worker. This tests all passed on my machine.
Have you also tried the _test_invoke_processor_bash() test by removing the underscore? That is the one I am not sure how to run yet to test the spawning of a bashlib processor separately from a worker.
The good thing is this changes nearly only affect the network package
Yes, tried to avoid moving stuff inside other modules (shall be the last thing to do) for now since this PR is already big enough.
Have you also tried the
_test_invoke_processor_bash()test by removing the underscore? That is the one I am not sure how to run yet to test the spawning of a bashlib processor separately from a worker.
No, I didn't try that.