convert.py: add python logging instead of print()

Open mofosyne opened this issue 3 months ago • 22 comments

Motivation of this PR is to switch from print() to python logging (battery included package for python), this is so that when I add the --get-outfile command to get the calculated conventional gguf filename it doesn't get polluted by erroneous print() messages.

Apr 06 '24 13:04 mofosyne

Good suggestion, however you are using the logging module wrong. You must at the very least create and use a named instance with logging.getLogger(), see the documentation.

Apr 07 '24 10:04 CISC

Good suggestion, however you are using the logging module wrong. You must at the very least create and use a named instance with logging.getLogger(), see the documentation.

The provided usage is not exactly wrong. You're not restricted from using library methods directly.

IMO, for a single python module (which is a standalone script here), it's fine to use default logger configuration and library methods if you don't need complex logging. But I'm not aware of conventions here, so maybe other maintainers can say if it's ok or not.

Apr 07 '24 15:04 deiteris

The provided usage is not exactly wrong. You're not restricted from using library methods directly.

While true it's a good idea not to, it will make things a lot easier in the long run. Just because you can doesn't mean you should. :)

IMO, for a single python module (which is a standalone script here), it's fine to use default logger configuration and library methods if you don't need complex logging. But I'm not aware of conventions here, so maybe other maintainers can say if it's ok or not.

Ultimately it is up to whomever reviews this PR of course.

Apr 07 '24 17:04 CISC

No probs. Made the adjustment. Hopefully it matches what you are thinking of

Apr 08 '24 02:04 mofosyne

Almost. :)

You missed a couple of logging uses though, and you shouldn't use __name__ as the logger name in a stand-alone script (it'll just be __main__), just calling it convert f.ex. will do fine.

Apr 08 '24 10:04 CISC

Almost. :)

You missed a couple of logging uses though, and you shouldn't use __name__ as the logger name in a stand-alone script (it'll just be __main__), just calling it convert f.ex. will do fine.

You mean do_dump_model()? well in that case, its explicit a dump so I suppress all logging (unless --verbose is active) and only print whats on do_dump_model() happy to adjust behavior if that's unexpected.

Apr 08 '24 11:04 mofosyne

You missed a couple of logging uses though, [...] You mean do_dump_model()? well in that case, its explicit a dump so I suppress all logging (unless --verbose is active) and only print whats on do_dump_model() happy to adjust behavior if that's unexpected.

No, I mean from here and onwards.

Apr 08 '24 11:04 CISC

You missed a couple of logging uses though, [...] You mean do_dump_model()? well in that case, its explicit a dump so I suppress all logging (unless --verbose is active) and only print whats on do_dump_model() happy to adjust behavior if that's unexpected.

No, I mean from here and onwards.

Ah dang. Yeah I see what you mean. Fixed thanks

Apr 08 '24 11:04 mofosyne

Ah dang. Yeah I see what you mean. Fixed thanks

Great, but this was probably not intended though. :)

Apr 08 '24 11:04 CISC

Fixed.

Apr 08 '24 12:04 mofosyne

Since logging is being introduced, we could also change sys.stderr.write call in the default_outfile method to logger.error.

I'm also wondering what'd be the process of ensuring that later PRs don't introduce new print statements to the code. Up for reviewers to detect, or add pipeline checks/tests - e.g. add flake8-no-print to the python-lint pipeline and replace print with sys.stdout.write.

Apr 09 '24 09:04 rgryta

Nit: \n is now unnecessary in the logger.error 😅

Apr 09 '24 15:04 rgryta

Since logging is being introduced, we could also change sys.stderr.write call in the default_outfile method to logger.error.

I'm also wondering what'd be the process of ensuring that later PRs don't introduce new print statements to the code. Up for reviewers to detect, or add pipeline checks/tests - e.g. add flake8-no-print to the python-lint pipeline and replace print with sys.stdout.write.

Alright then, I've figured out how to run flake8 directly (after failing to get pre-commit to work) and start using the flake8 no print allowed plugin. It then threw a whole bunch of errors on all the other pythons scripts in this repo. So decided to give correcting it all a stab.

Also updated the github actions and pre-commit config to use this feature so hopefully it now works.

Since this is changing quite a few scripts, I had to spend a bit more time on other .py scripts that had an older approach to using print() especially if ending with end= or not using fstrings for variables substitution. So you may want to pay attention to this when reviewing.

Unfortunately this has now make this PR bit more daunting to review compared to the past, but with the inclusion of the no-print check this should hopefully be a one time pain for everyone here. :)

Regarding my issue getting pre-commit working

This is what I got. I've looked around but haven't found a solution yet. Hopefully its not an issue for others working on this.

$ pre-commit install
pre-commit installed at /home/mofosyne/huggingface/TinyLLama-v0-llamafile/.git/modules/llama.cpp/hooks/pre-commit

$ pre-commit
[INFO] Initializing environment for https://github.com/PyCQA/flake8:flake8-no-print.
[INFO] Installing environment for https://github.com/pre-commit/pre-commit-hooks.
[INFO] Once installed this environment will be reused.
[INFO] This may take a few minutes...
An unexpected error has occurred: AssertionError: BUG: expected environment for python to be healthy immediately after install, please open an issue describing your environment

more info:

virtualenv python version did not match created version:
- actual version: <<error retrieving version from /home/mofosyne/.cache/pre-commit/reposmwb8fu3/py_env-python3.10/bin/python>>
- expected version: 3.10.12.final.0

Check the log at /home/mofosyne/.cache/pre-commit/pre-commit.log

Apr 10 '24 05:04 mofosyne

rebased on top of latest master due to extra breaking changes on convert-hf-to-gguf.py (more print() was added in the main repo)

Apr 18 '24 04:04 mofosyne

Great work. :)

A final note however, you should use __name__ as logger name for everything in gguf-py/gguf as they will be imported and not run directly.

Apr 18 '24 05:04 CISC

Great work. :)

A final note however, you should use __name__ as logger name for everything in gguf-py/gguf as they will be imported and not run directly.

One of your previous statement appears to state the opposite.

You missed a couple of logging uses though, and you shouldn't use name as the logger name in a stand-alone script (it'll just be main), just calling it convert f.ex. will do fine.

So you are saying just specifically for gguf-py/gguf because it's used within other scripts?

Apr 18 '24 05:04 mofosyne

Anyway added your suggestion in @CISC hopefully it should all check out!

Apr 18 '24 06:04 mofosyne

So you are saying just specifically for gguf-py/gguf because it's used within other scripts?

Yes, because __name__ will contain the module path in this case (and thus all be related to each other), it's mainly stand-alone scripts that benefit from explicitly naming the logger (unless you wish to create a certain logging hierarchy).

Apr 18 '24 07:04 CISC

Rebased to add one additional [convert-hf-to-gguf.py: add additional logging](https://github.com/ggerganov/llama.cpp/pull/6511/commits/a6f54dee3ca65ae1dbeab3f8c26c1d75a9609715) commit to fix addition of print() convert-hf-to-gguf.py in the main branch.

Apr 21 '24 01:04 mofosyne

I have a pattern we can use to centralize the logger instantiation with.

The implementation is really simple and flexible.

LOGGER_FORMAT = "%(asctime)s - %(filename)s:%(lineno)d - %(levelname)s - %(message)s"

def get_default_logger(name=None, level=logging.DEBUG):
    logger = logging.getLogger(name)
    if not logger.hasHandlers():
        handler = logging.StreamHandler(stream=sys.stdout)
        formatter = logging.Formatter(LOGGER_FORMAT)
        handler.setFormatter(formatter)
        logger.addHandler(handler)
        logger.setLevel(level)
    return logger

I left the docstring out of the snippet shown here, but this is basically it. You're free to use it to simplify the implementation and reduce code duplication. I've been able to use this in classes, functions, and scripts and regularly use it. It's under the MIT License, so it's compatible with llama.cpp.

This pattern can be extended over time and included in the gguf-py package as a logger.py module that can be improved over time.

I never liked the use of the global LOGGER_FORMAT, but it simplified the initial implementation which I extracted from a few other programs I wrote. The implementation details can be ironed out over time as needed though.

Apr 21 '24 19:04 teleprint-me

Happy to add if @cebtenzzre thinks it's a good idea.

What's the advantage of a centralized logger instance? Would this be better as a separate PR (Which would be good if it's only a small change to each file).

The main objective of this PR is simply to convert from print() to logger in a preferably simple and extendable way, so at least for this one I would prefer to minimize adding dependencies.

Apr 21 '24 23:04 mofosyne

No worries. It was just a suggestion. No need to use it if undesired.

As for your questions.

This is builtin. No added dependency. You're already using it.

import logging  # python builtin

Adding a default logger instance creates a predicable implementation, applies DRY, and reduces code duplication.

# Any name can be used here.
logger = get_default_logger(name="convert.py", level=logging.INFO)

Unified point of contact makes it so that a single change reflects the desired logging format from single location.

LOGGER_FORMAT

The implementation details would be specific to gguf-py, hence the suggestion for a logger.py module that's customized to this projects specific needs.

You already made most of the changes, this would simply refine it. If you feel that this should be done in a separate PR, then so be it.

Apr 22 '24 02:04 teleprint-me

Hopefully all is good? Noticed more changes to base branch so did a local rebase and flake8 to check for any issues. So far no problems so no force pushing needed.

$ git fetch upstream master
remote: Enumerating objects: 173, done.
remote: Counting objects: 100% (148/148), done.
remote: Compressing objects: 100% (110/110), done.
remote: Total 173 (delta 87), reused 71 (delta 38), pack-reused 25
Receiving objects: 100% (173/173), 815.29 KiB | 931.00 KiB/s, done.
Resolving deltas: 100% (88/88), completed with 7 local objects.
From github.com:ggerganov/llama.cpp
 * branch            master     -> FETCH_HEAD
 * [new branch]      master     -> upstream/master
$ git rebase upstream/master
Successfully rebased and updated refs/heads/convert-py-logging.
$ flake8
$ git reset --hard origin/convert-py-logging 
HEAD is now at 14d39e4 convert-hf-to-gguf.py: print() --> logger
$ git status
On branch convert-py-logging
Your branch is up to date with 'origin/convert-py-logging'.

nothing to commit, working tree clean

Apr 29 '24 02:04 mofosyne

Github noticed some new conflicts so manually rebased and resolved the conflicts.

Also then noticed the addition of this new file convert-hf-to-gguf-update.py which is triggering warnings in flake8 when removed from ignore list. Decided to resolve all flake8 warnings manually so there be some extra changes to convert-hf-to-gguf-update.py.

Thanks for the catch, hopefully this is closer!

Apr 29 '24 16:04 mofosyne

Resolved conflict in two files.

edit: saw that the check said some test was cancelled. Tested on my side by running flake8. Plus the cancelled test seems more about packaging rather than automatic QA

Apr 30 '24 09:04 mofosyne

@mofosyne Great work here, unfortunately I think you were a bit unlucky and fell right between some urgent events, leaving everyone very busy, which has postponed the merge of this PR.

Also I do not have any reviewer powers, just being a busy-body. :) You'll have to wait for @cebtenzzre and/or @ggerganov to get some spare time.

May 01 '24 07:05 CISC

@mofosyne Great work here, unfortunately I think you were a bit unlucky and fell right between some urgent events, leaving everyone very busy, which has postponed the merge of this PR.

Also I do not have any reviewer powers, just being a busy-body. :) You'll have to wait for @cebtenzzre and/or @ggerganov to get some spare time.

Ah that's not a problem. Ultimately, this is mostly just a quality of life fix to make developer life smoother. But I do hope to eventually get this in, as better debugging will ultimately make everyone's life easier.

May 02 '24 06:05 mofosyne

Okay implemented most of your suggestions @compilade but excluded some style observations as it doesn't appear to be an issue.

edit: Accodentally typed cebtenzzre. Intended to invoke compilade

May 03 '24 02:05 mofosyne

Maybe this one should be prioritized for now?

May 03 '24 05:05 teleprint-me

Thanks for merging! Hopefully all is well for other PRs!

May 04 '24 04:05 mofosyne

llama.cpp llama.cpp copied to clipboard

convert.py: add python logging instead of print()

Regarding my issue getting pre-commit working

llama.cpp
llama.cpp copied to clipboard