Rascal json encoder

Open agoscinski opened this issue 3 years ago • 4 comments

Because of

ASE changed its JSON structure format in ~3.16. The format we use is so simple that we did not write a dedicated function for it. The ASE reader, on the other hand, is backward compatible.

one cannot simply use ASE to convert the files to JSON. I used the ASE JSON encoder as the basis for a rascal encoder: https://gitlab.com/ase/ase/blob/master/ase/io/jsonio.py#L25-27

I could clean this up a bit and add it to the Python utils. Please give a thumbs up if you think this is good, or comment if you think we should solve the problem in a different way.

import datetime  # needed by the datetime branch of RascalEncoder.default
import json

import ase.io
import numpy as np

json_frames = {}
frames = ase.io.read('/home/alexgo/datasets/methane.extxyz', ':2')
for frame in frames:
    frame.cell = [50, 50, 50]
    frame.center()

class RascalEncoder(json.JSONEncoder):
    def default(self, obj):
        if hasattr(obj, 'todict'):
            d = obj.todict()

            if not isinstance(d, dict):
                raise RuntimeError('todict() of {} returned object of type {} '
                                   'but should have returned dict'
                                   .format(obj, type(d)))
            if hasattr(obj, 'ase_objtype'):
                d['__ase_objtype__'] = obj.ase_objtype

            return d
        if isinstance(obj, np.ndarray):
            return obj.tolist()
        if isinstance(obj, np.integer):
            return int(obj)
        if isinstance(obj, np.bool_):
            return bool(obj)
        if isinstance(obj, datetime.datetime):
            return {'__datetime__': obj.isoformat()}
        if isinstance(obj, complex):
            return {'__complex__': (obj.real, obj.imag)}
        return json.JSONEncoder.default(self, obj)

for i, frame in enumerate(frames):
    json_frames[str(i)] = json.loads(json.dumps(frame, cls=RascalEncoder))

json_frames['ids'] = list(range(len(frames)))
json_frames['nextid'] = len(frames)
with open('/home/alexgo/datasets/methane_test.json', 'w') as f:
    json.dump(json_frames, f, indent=2)

agoscinski avatar Jun 15 '21 13:06 agoscinski
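To make the resulting file layout concrete without needing ASE or a dataset, here is a minimal sketch. `StandInFrame` and `LayoutEncoder` are hypothetical names used only for illustration, and the `numbers` and `positions` fields are assumed stand-ins for what a real frame's `todict()` returns. The top-level keys (`'0'`, `'1'`, `'ids'`, `'nextid'`) follow the script above.

```python
import json

import numpy as np


class StandInFrame:
    """Hypothetical stand-in for a frame; only todict() matters here."""

    def __init__(self, numbers, positions):
        self.numbers = np.asarray(numbers)
        self.positions = np.asarray(positions, dtype=float)

    def todict(self):
        # Assumed field names, chosen for illustration only.
        return {'numbers': self.numbers, 'positions': self.positions}


class LayoutEncoder(json.JSONEncoder):
    """Reduced version of the encoder above: todict() objects and arrays only."""

    def default(self, obj):
        if hasattr(obj, 'todict'):
            return obj.todict()
        if isinstance(obj, np.ndarray):
            return obj.tolist()
        return super().default(obj)


frames = [
    StandInFrame([6, 1, 1, 1, 1], np.zeros((5, 3))),
    StandInFrame([6, 1, 1, 1, 1], np.ones((5, 3))),
]

# One entry per frame keyed by its index as a string, plus 'ids' and 'nextid'.
json_frames = {
    str(i): json.loads(json.dumps(frame, cls=LayoutEncoder))
    for i, frame in enumerate(frames)
}
json_frames['ids'] = list(range(len(frames)))
json_frames['nextid'] = len(frames)

print(sorted(json_frames))
```

The round trip through `json.dumps` and `json.loads` turns each frame into plain dicts and lists, so the final `json.dump` to a file would not need a custom encoder.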

While we are doing this, I think there's a more transparent way to encode nparrays, if I'm not mistaken. I find this tolist() a bit burdensome

ceriottm avatar Jun 15 '21 14:06 ceriottm

While we are doing this, I think there's a more transparent way to encode nparrays, if I'm not mistaken. I find this tolist() a bit burdensome

I am not sure what issues arise with tolist()?

agoscinski avatar Jun 17 '21 17:06 agoscinski

Well, it's an additional conversion that people need to do. I was wondering if we could use a custom encoder, similar to the second answer here: https://stackoverflow.com/questions/26646362/numpy-array-is-not-json-serializable

ceriottm avatar Jun 18 '21 06:06 ceriottm

The first four answers all use tolist():

        if isinstance(obj, np.ndarray):
            return obj.tolist()

agoscinski avatar Jun 18 '21 20:06 agoscinski
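For readers following the exchange, a minimal sketch of how the two positions fit together: the `tolist()` call lives inside the encoder, so callers pass arrays directly and never convert by hand. `NumpyAwareEncoder` is a hypothetical name and not part of librascal or ASE. The `np.floating` branch is an addition that is not in the encoder posted above.

```python
import json

import numpy as np


class NumpyAwareEncoder(json.JSONEncoder):
    """Hypothetical encoder: converts NumPy types the json module rejects."""

    def default(self, obj):
        if isinstance(obj, np.ndarray):
            return obj.tolist()
        if isinstance(obj, np.integer):
            return int(obj)
        if isinstance(obj, np.floating):
            return float(obj)
        if isinstance(obj, np.bool_):
            return bool(obj)
        return super().default(obj)


payload = {
    'positions': np.zeros((2, 3)),
    'numbers': np.array([6, 1]),
    'count': np.int64(2),
}

# Without the encoder, json refuses NumPy arrays with a TypeError.
plain_fails = False
try:
    json.dumps(payload)
except TypeError:
    plain_fails = True

# With the encoder, the caller passes arrays as-is; tolist() happens internally.
encoded = json.dumps(payload, cls=NumpyAwareEncoder)
decoded = json.loads(encoded)
```

The conversion still happens, but only once, in `default()`, rather than at every call site.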