
log files are not valid json format

rruiter87 opened this issue 3 years ago • 3 comments

It is a small issue, but annoying nonetheless. When I try to commit a JSON log file, it gets flagged by the json5 pre-commit hook.

check json5..............................................................Failed
- hook id: check-json5
- exit code: 1

log.json: Failed to json decode (<string>:2 Unexpected "{" at column 1)
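
For reference, here is a minimal reproduction of why the file fails strict JSON parsing: each line is valid on its own, but the concatenation is not a single JSON document (the log contents below are made up):

import json

# Made-up log contents, in the shape JSONLogger writes: one object per line.
log_contents = '{"target": 1.0}\n{"target": 2.0}\n'

# Each individual line parses fine...
for line in log_contents.splitlines():
    json.loads(line)

# ...but the file as a whole does not: the second object is "extra data".
try:
    json.loads(log_contents)
except json.JSONDecodeError as err:
    print(err)  # Extra data: line 2 column 1 (char 16)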

Possible solutions:

  1. Read the original file, update it, and write the whole updated JSON back.
  2. Do not use the .json extension stated in the documentation, but, for example, .txt instead.
  3. A hacky way would be to write a [ when the file is first opened, start each subsequent write with a ,, and append a ] after the last write (a rough sketch follows this list).
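
To make option 3 concrete, here is a rough sketch of the bracket bookkeeping it would require (hypothetical class, not part of the library):

import json

class ArrayJSONWriter:
    """Sketch of option 3: grow a JSON array on disk, one entry at a time."""

    def __init__(self, path):
        self._path = path
        self._first_write = True

    def write(self, data):
        # The first write opens the array; later writes prepend a comma.
        with open(self._path, "a") as f:
            f.write("[" if self._first_write else ",")
            f.write(json.dumps(data))
        self._first_write = False

    def close(self):
        # Must run even on premature termination, otherwise the file is
        # left without its closing bracket and stays invalid.
        with open(self._path, "a") as f:
            f.write("]")

If the process dies before close() runs, the file is left unterminated, which is exactly the kind of failure mode that makes this approach fragile.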

rruiter87 avatar Jun 10 '22 13:06 rruiter87

I came across this issue as well, when I needed to read the history file to do some plots and analysis. I think a simple solution in the meantime is something like this: https://pynative.com/python-parse-multiple-json-objects-from-file/

import json
import os

def load_history_line_by_line(directory, file):
    """Parse a JSONLogger history file: one JSON object per line."""
    data = []
    filename = os.path.join(directory, file)
    with open(filename, "r") as j:
        for line in j:
            data.append(json.loads(line))
    return data
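
For example, assuming the log was written to ./logs/log.json (path made up here); each entry should contain the keys the logger writes, i.e. target, params, and the datetime block:

history = load_history_line_by_line("./logs", "log.json")
print(f"{len(history)} iterations logged")
print(history[0]["target"], history[0]["params"])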

I think the main issue here is that JSONLogger just json.dumps each individual result and appends it to the history file.

class JSONLogger(_Tracker):
    def __init__(self, path):
        self._path = path if path[-5:] == ".json" else path + ".json"
        try:
            os.remove(self._path)
        except OSError:
            pass
        super(JSONLogger, self).__init__()

    def update(self, event, instance):
        if event == Events.OPTIMIZATION_STEP:
            data = dict(instance.res[-1])

            now, time_elapsed, time_delta = self._time_metrics()
            data["datetime"] = {
                "datetime": now,
                "elapsed": time_elapsed,
                "delta": time_delta,
            }

            # Appends one JSON object per line, so the file as a whole is
            # newline-delimited JSON rather than a single JSON document.
            with open(self._path, "a") as f:
                f.write(json.dumps(data) + "\n")

        self._update_tracker(event, instance)

To fix this, the code would have to be rewritten to first read the history.json file and then rewrite it with the new entry appended; apparently you cannot append to a JSON file without loading it first (https://stackoverflow.com/questions/45869039/appending-data-to-json-file-with-python). A sketch of what that could look like follows.
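
A minimal sketch of that approach, reusing the update method quoted above (hypothetical, untested):

class JSONLogger(_Tracker):
    # __init__ as in the current implementation

    def update(self, event, instance):
        if event == Events.OPTIMIZATION_STEP:
            data = dict(instance.res[-1])

            now, time_elapsed, time_delta = self._time_metrics()
            data["datetime"] = {
                "datetime": now,
                "elapsed": time_elapsed,
                "delta": time_delta,
            }

            # Load the existing history (a JSON array), append the new
            # entry, and rewrite the whole file so it is always valid JSON.
            try:
                with open(self._path, "r") as f:
                    history = json.load(f)
            except (FileNotFoundError, json.JSONDecodeError):
                history = []
            history.append(data)

            with open(self._path, "w") as f:
                json.dump(history, f)

        self._update_tracker(event, instance)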

rmcconke avatar Jun 13 '22 17:06 rmcconke

Hmm, I see. So basically each new entry is a valid JSON document, but the aggregate of them is not. That is annoying (but it does explain why a while ago I couldn't get JSON highlighting to work properly!). Of the possible solutions suggested:

  1. I think this can work, but it would have to be in a 'finally' block, because it is not uncommon for the optimizer to terminate prematurely.
  2. I'd be fine with this personally, and it's certainly the least amount of work.
  3. I don't think I like this one; there just seem to be a lot of ways for it to go wrong.

A fourth option:

  4. Store the entire history in memory (it might actually already be stored?), and at each update json.dumps the entire history instead of just the current entry. I think this would ensure valid JSON output at all times while still enabling 'real time' logging (sketched below). I haven't looked at how much work it would be, though.
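
A rough sketch of option 4, assuming the JSONLogger structure quoted above plus an extra self._history = [] set in __init__ (hypothetical, untested):

class JSONLogger(_Tracker):
    # __init__ as in the current implementation, plus: self._history = []

    def update(self, event, instance):
        if event == Events.OPTIMIZATION_STEP:
            data = dict(instance.res[-1])
            now, time_elapsed, time_delta = self._time_metrics()
            data["datetime"] = {
                "datetime": now,
                "elapsed": time_elapsed,
                "delta": time_delta,
            }
            # Keep the full history in memory and overwrite the file with
            # it each step, so the file is always a valid JSON array even
            # if the run terminates prematurely.
            self._history.append(data)
            with open(self._path, "w") as f:
                json.dump(self._history, f)
        self._update_tracker(event, instance)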

All things considered, my vote would be for 2, followed by 4.

bwheelz36 avatar Jun 14 '22 07:06 bwheelz36

Option 2 sounds good. Maybe .log would be a good extension?

rruiter87 avatar Jun 14 '22 13:06 rruiter87