Refactor logging functionality and add features
Proposed new feature or change
This issue is a first dump of my thoughts. Feel free to comment on it and improve it.
Motive
Logging functionality is currently implemented with the print function and the logging package. In addition, some of these are commented out for no apparent reason.
The issue is that users want the logging to be done by one and only one way. In addition, the users want to inspect the logs in a log file.
I therefore propose to implement a coherent logging system. In this process, the following should be done.
- uncomment all commented out
printandloggingcode, - move all the logging to the
loggingpackage, - add a central logging object for each module, and
- add that the logs are written to log files,
- remove all
updatestrings in the messages,
Example
Actual
As stated above, the logic is spread out throughout the code base. Here are some examples for each of the five points above.
print used for logging.
https://github.com/llmware-ai/llmware/blob/e3435db316bc9c2b955bd8c50c80a55f526da2a0/llmware/embeddings.py#L448
An commented out print statement. https://github.com/llmware-ai/llmware/blob/e3435db316bc9c2b955bd8c50c80a55f526da2a0/llmware/embeddings.py#L294
A logging statement with update in it.
https://github.com/llmware-ai/llmware/blob/e3435db316bc9c2b955bd8c50c80a55f526da2a0/llmware/embeddings.py#L114-L115
Desired
It is desirable to clean up the logging functionality and add logging to a file. The output of such a log message, should, for example,
2024-02-11T16:04:47.556391+00:00:embedding:INFO: Milvus - Embeddings Created: 10 of 50
or
2024-02-11T16:04:47.556391+00:00:embedding:WARNING: unable to determine if embeddings have been properly counted and captured. Please check if databases connected.
Generally, the form should hence be
<DATETIME>:<MODULE_NAME>:<LOGGING_LEVEL>:<LOGGING_MESSAGE>
Existing solutions I looked for
Within the package, none, since it is currently not implemented at all.
Outside of the package, yes. A blueprint might be to take a look at how flask does it.
Proposed solution
In __init__.py add the following.
import os
import logging
from llmware.configs import LLMWareConfig
def create_logger():
logging.basicConfig(level=LLMWareConfig.get_logging_level())
logger = logging.getLogger(__name__)
# maybe some config logic here, for example to set where the logs should be written to,
# depending on the operating system. In addtion, setting the format of the log messages - see above.
....
return logger
And then add in each module
from __init___ import create_logger
logger = create_logger()
Technical Analysis
The proposed efforts, and hence the changes, would clean the codebase (e.g. moving print class to the logger), streamline the logging functionality (e.g. adding the create_logger function to configure the logger centrally), and add new features for permanent logging to a file, or files.
Additional links
@MacOS - agree 100% with your assessment - and would appreciate your help and support to drive this logging improvement. As you note, it will require updates across the code-base.
Great!
I think it is a good idea to chop this up in smaller junks. May I suggest the following steps as a starting point for discussions, in that order.
- A PR that deals with the commented out statements. Either by deleting them or adding them back.
- A PR that unifies the
LOGGING_MESSAGE. - A PR that moves all the logging to the logging package.
- A PR that adds a central logging object for each module, and
- A PR that adds that the logs can optionally be are written to log files,
Besides, it is really important that we agree on the logging format. The format I suggested has the obvious problem with the :, as it separates the parts of the logging message and the parts of the DATETIME.