llama.cpp
llama.cpp copied to clipboard
convert.py: add python logging instead of print()
Motivation of this PR is to switch from print() to python logging (battery included package for python), this is so that when I add the --get-outfile
command to get the calculated conventional gguf filename it doesn't get polluted by erroneous print() messages.
Good suggestion, however you are using the logging module wrong. You must at the very least create and use a named instance with logging.getLogger(), see the documentation.
Good suggestion, however you are using the logging module wrong. You must at the very least create and use a named instance with logging.getLogger(), see the documentation.
The provided usage is not exactly wrong. You're not restricted from using library methods directly.
IMO, for a single python module (which is a standalone script here), it's fine to use default logger configuration and library methods if you don't need complex logging. But I'm not aware of conventions here, so maybe other maintainers can say if it's ok or not.
The provided usage is not exactly wrong. You're not restricted from using library methods directly.
While true it's a good idea not to, it will make things a lot easier in the long run. Just because you can doesn't mean you should. :)
IMO, for a single python module (which is a standalone script here), it's fine to use default logger configuration and library methods if you don't need complex logging. But I'm not aware of conventions here, so maybe other maintainers can say if it's ok or not.
Ultimately it is up to whomever reviews this PR of course.
No probs. Made the adjustment. Hopefully it matches what you are thinking of
Almost. :)
You missed a couple of logging
uses though, and you shouldn't use __name__
as the logger name in a stand-alone script (it'll just be __main__
), just calling it convert
f.ex. will do fine.
Almost. :)
You missed a couple of
logging
uses though, and you shouldn't use__name__
as the logger name in a stand-alone script (it'll just be__main__
), just calling itconvert
f.ex. will do fine.
You mean do_dump_model()? well in that case, its explicit a dump so I suppress all logging (unless --verbose is active) and only print whats on do_dump_model() happy to adjust behavior if that's unexpected.
You missed a couple of
logging
uses though, [...] You mean do_dump_model()? well in that case, its explicit a dump so I suppress all logging (unless --verbose is active) and only print whats on do_dump_model() happy to adjust behavior if that's unexpected.
No, I mean from here and onwards.
You missed a couple of
logging
uses though, [...] You mean do_dump_model()? well in that case, its explicit a dump so I suppress all logging (unless --verbose is active) and only print whats on do_dump_model() happy to adjust behavior if that's unexpected.No, I mean from here and onwards.
Ah dang. Yeah I see what you mean. Fixed thanks
Ah dang. Yeah I see what you mean. Fixed thanks
Great, but this was probably not intended though. :)
Fixed.
Since logging is being introduced, we could also change sys.stderr.write
call in the default_outfile
method to logger.error
.
I'm also wondering what'd be the process of ensuring that later PRs don't introduce new print statements to the code. Up for reviewers to detect, or add pipeline checks/tests - e.g. add flake8-no-print
to the python-lint pipeline and replace print
with sys.stdout.write
.
Nit: \n
is now unnecessary in the logger.error 😅
Since logging is being introduced, we could also change
sys.stderr.write
call in thedefault_outfile
method tologger.error
.I'm also wondering what'd be the process of ensuring that later PRs don't introduce new print statements to the code. Up for reviewers to detect, or add pipeline checks/tests - e.g. add
flake8-no-print
to the python-lint pipeline and replacesys.stdout.write
.
Alright then, I've figured out how to run flake8 directly (after failing to get pre-commit to work) and start using the flake8 no print allowed plugin. It then threw a whole bunch of errors on all the other pythons scripts in this repo. So decided to give correcting it all a stab.
Also updated the github actions and pre-commit config to use this feature so hopefully it now works.
Since this is changing quite a few scripts, I had to spend a bit more time on other .py scripts that had an older approach to using print() especially if ending with end=
or not using fstrings for variables substitution. So you may want to pay attention to this when reviewing.
Unfortunately this has now make this PR bit more daunting to review compared to the past, but with the inclusion of the no-print check this should hopefully be a one time pain for everyone here. :)
Regarding my issue getting pre-commit working
This is what I got. I've looked around but haven't found a solution yet. Hopefully its not an issue for others working on this.
$ pre-commit install
pre-commit installed at /home/mofosyne/huggingface/TinyLLama-v0-llamafile/.git/modules/llama.cpp/hooks/pre-commit
$ pre-commit
[INFO] Initializing environment for https://github.com/PyCQA/flake8:flake8-no-print.
[INFO] Installing environment for https://github.com/pre-commit/pre-commit-hooks.
[INFO] Once installed this environment will be reused.
[INFO] This may take a few minutes...
An unexpected error has occurred: AssertionError: BUG: expected environment for python to be healthy immediately after install, please open an issue describing your environment
more info:
virtualenv python version did not match created version:
- actual version: <<error retrieving version from /home/mofosyne/.cache/pre-commit/reposmwb8fu3/py_env-python3.10/bin/python>>
- expected version: 3.10.12.final.0
Check the log at /home/mofosyne/.cache/pre-commit/pre-commit.log
rebased on top of latest master due to extra breaking changes on convert-hf-to-gguf.py (more print() was added in the main repo)
Great work. :)
A final note however, you should use __name__
as logger name for everything in gguf-py/gguf as they will be imported and not run directly.
Great work. :)
A final note however, you should use
__name__
as logger name for everything in gguf-py/gguf as they will be imported and not run directly.
One of your previous statement appears to state the opposite.
You missed a couple of logging uses though, and you shouldn't use name as the logger name in a stand-alone script (it'll just be main), just calling it convert f.ex. will do fine.
So you are saying just specifically for gguf-py/gguf
because it's used within other scripts?
Anyway added your suggestion in @CISC hopefully it should all check out!
So you are saying just specifically for
gguf-py/gguf
because it's used within other scripts?
Yes, because __name__
will contain the module path in this case (and thus all be related to each other), it's mainly stand-alone scripts that benefit from explicitly naming the logger (unless you wish to create a certain logging hierarchy).
Rebased to add one additional [convert-hf-to-gguf.py: add additional logging](https://github.com/ggerganov/llama.cpp/pull/6511/commits/a6f54dee3ca65ae1dbeab3f8c26c1d75a9609715)
commit to fix addition of print() convert-hf-to-gguf.py
in the main branch.
I have a pattern we can use to centralize the logger instantiation with.
The implementation is really simple and flexible.
LOGGER_FORMAT = "%(asctime)s - %(filename)s:%(lineno)d - %(levelname)s - %(message)s"
def get_default_logger(name=None, level=logging.DEBUG):
logger = logging.getLogger(name)
if not logger.hasHandlers():
handler = logging.StreamHandler(stream=sys.stdout)
formatter = logging.Formatter(LOGGER_FORMAT)
handler.setFormatter(formatter)
logger.addHandler(handler)
logger.setLevel(level)
return logger
I left the docstring out of the snippet shown here, but this is basically it. You're free to use it to simplify the implementation and reduce code duplication. I've been able to use this in classes, functions, and scripts and regularly use it. It's under the MIT License, so it's compatible with llama.cpp
.
This pattern can be extended over time and included in the gguf-py
package as a logger.py
module that can be improved over time.
I never liked the use of the global LOGGER_FORMAT
, but it simplified the initial implementation which I extracted from a few other programs I wrote. The implementation details can be ironed out over time as needed though.
Happy to add if @cebtenzzre thinks it's a good idea.
What's the advantage of a centralized logger instance? Would this be better as a separate PR (Which would be good if it's only a small change to each file).
The main objective of this PR is simply to convert from print() to logger in a preferably simple and extendable way, so at least for this one I would prefer to minimize adding dependencies.
No worries. It was just a suggestion. No need to use it if undesired.
As for your questions.
This is builtin. No added dependency. You're already using it.
import logging # python builtin
Adding a default logger instance creates a predicable implementation, applies DRY, and reduces code duplication.
# Any name can be used here.
logger = get_default_logger(name="convert.py", level=logging.INFO)
Unified point of contact makes it so that a single change reflects the desired logging format from single location.
LOGGER_FORMAT
The implementation details would be specific to gguf-py
, hence the suggestion for a logger.py
module that's customized to this projects specific needs.
You already made most of the changes, this would simply refine it. If you feel that this should be done in a separate PR, then so be it.
Hopefully all is good? Noticed more changes to base branch so did a local rebase and flake8 to check for any issues. So far no problems so no force pushing needed.
$ git fetch upstream master
remote: Enumerating objects: 173, done.
remote: Counting objects: 100% (148/148), done.
remote: Compressing objects: 100% (110/110), done.
remote: Total 173 (delta 87), reused 71 (delta 38), pack-reused 25
Receiving objects: 100% (173/173), 815.29 KiB | 931.00 KiB/s, done.
Resolving deltas: 100% (88/88), completed with 7 local objects.
From github.com:ggerganov/llama.cpp
* branch master -> FETCH_HEAD
* [new branch] master -> upstream/master
$ git rebase upstream/master
Successfully rebased and updated refs/heads/convert-py-logging.
$ flake8
$ git reset --hard origin/convert-py-logging
HEAD is now at 14d39e4 convert-hf-to-gguf.py: print() --> logger
$ git status
On branch convert-py-logging
Your branch is up to date with 'origin/convert-py-logging'.
nothing to commit, working tree clean
Github noticed some new conflicts so manually rebased and resolved the conflicts.
Also then noticed the addition of this new file convert-hf-to-gguf-update.py
which is triggering warnings in flake8 when removed from ignore list. Decided to resolve all flake8 warnings manually so there be some extra changes to convert-hf-to-gguf-update.py
.
Thanks for the catch, hopefully this is closer!
Resolved conflict in two files.
edit: saw that the check said some test was cancelled. Tested on my side by running flake8. Plus the cancelled test seems more about packaging rather than automatic QA
@mofosyne Great work here, unfortunately I think you were a bit unlucky and fell right between some urgent events, leaving everyone very busy, which has postponed the merge of this PR.
Also I do not have any reviewer powers, just being a busy-body. :) You'll have to wait for @cebtenzzre and/or @ggerganov to get some spare time.
@mofosyne Great work here, unfortunately I think you were a bit unlucky and fell right between some urgent events, leaving everyone very busy, which has postponed the merge of this PR.
Also I do not have any reviewer powers, just being a busy-body. :) You'll have to wait for @cebtenzzre and/or @ggerganov to get some spare time.
Ah that's not a problem. Ultimately, this is mostly just a quality of life fix to make developer life smoother. But I do hope to eventually get this in, as better debugging will ultimately make everyone's life easier.
Okay implemented most of your suggestions @compilade but excluded some style observations as it doesn't appear to be an issue.
edit: Accodentally typed cebtenzzre. Intended to invoke compilade
Maybe this one should be prioritized for now?
Thanks for merging! Hopefully all is well for other PRs!