QCFractal

Can't serialize to JSON

Open mattwthompson opened this issue 5 years ago • 5 comments

Describe the bug

I wanted to save a collection to disk to avoid downloading a large dataset every time I ran a test or restarted a notebook.

To Reproduce

import json

import qcportal as ptl

client = ptl.FractalClient()
ds = client.get_collection('OptimizationDataset', 'OpenFF Optimization Set 1')
ds.to_json(filename='data.json')

raises TypeError: Object of type set is not JSON serializable
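
The same error can be reproduced without contacting a server. The following is a minimal sketch using only the standard library; the `payload` dict is a hypothetical stand-in for the collection's data, not the real QCFractal schema:

```python
import json

# Hypothetical stand-in for a collection's data; not the real QCFractal schema.
payload = {"name": "example", "history": {("driver", "program", "method")}}

try:
    json.dumps(payload)
    error_message = None
except TypeError as exc:
    # The json module has no encoding for Python sets, so it raises here.
    error_message = str(exc)

print(error_message)
```

Any set anywhere in the nested data triggers this, which is why one stray set-valued key is enough to make the whole dump fail.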

Expected behavior

I expected to be able to save this out to JSON.

Additional context

There is data in the collection object:

image

@loriab suggested on Slack that a set may have snuck in somewhere. This is probably a terrible collection to debug on since it includes something like 20,000 records.

mattwthompson, May 04 '20 19:05

Ok I see the problem. The history key is a set

bennybp, May 06 '20 13:05

Diving into this one now.

dotsdl, Nov 03 '20 19:11

@bennybp I see you made a commit that addressed history being a set in a branch on your fork. Are you planning to merge this? Otherwise I can make the change in a PR here.

dotsdl, Nov 03 '20 19:11

I started to fix it in that PR but abandoned it. It is a little more involved than just changing it to a list (there are some places that use the set functionality that also have to be modified).

Looking at the database, the database stores this info as JSON. Not entirely sure where this gets converted from a set to a list on the backend...

bennybp, Nov 03 '20 19:11

Ah cool, thank you for that clarification. Working on a solution that doesn't fail in other places.

dotsdl, Nov 03 '20 20:11

I ran into this again today. For my purposes, the quickest solution is to just pop the history. There's probably a way to map it onto a list, but I don't think I need it for my use case, and just data['history'] = list(data['history']) did not completely work: it was happy to write to disk but could not be read back. I didn't look further into why.

import json

import qcportal

client = qcportal.FractalClient(verify=False)

dataset = client.get_collection(
    "OptimizationDataset",
    "OpenFF Iodine Chemistry Optimization Dataset v1.0",
)

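with open("dataset.json", "w") as file:
    # to_json() is called without a filename here so that it returns the data
    data = dataset.to_json()
    # "history" is a set, which json cannot serialize, so drop it
    data.pop("history")

    json.dump(data, file)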

with open("dataset.json", "r") as file:
    data = json.load(file)
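
If the history needs to survive the round trip rather than be dropped, one possible approach is to tag sets on the way out and rebuild them on the way in. This is only a sketch at the JSON layer: the `SET_TAG` marker, both helper functions, and the sample `data` dict are hypothetical, and it assumes the set holds flat tuples of strings. Whether a collection can be reconstructed from the restored dict is not established in this thread.

```python
import json

SET_TAG = "__set__"  # hypothetical marker key, not part of QCFractal


def encode_sets(obj):
    """`default` hook for json.dump: wrap a set in a tagged dict."""
    if isinstance(obj, (set, frozenset)):
        # Sort by repr so the output is deterministic across runs.
        return {SET_TAG: sorted(obj, key=repr)}
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


def decode_sets(dct):
    """`object_hook` for json.load: turn a tagged dict back into a set."""
    if set(dct) == {SET_TAG}:
        # JSON has no tuples, so tuples were written as lists; restore them
        # because lists are unhashable and cannot be set members.
        return {tuple(item) if isinstance(item, list) else item
                for item in dct[SET_TAG]}
    return dct


# Hypothetical sample data; not the real collection schema.
data = {
    "name": "example",
    "history": {("driver", "program", "method-a"),
                ("driver", "program", "method-b")},
}

text = json.dumps(data, default=encode_sets)
restored = json.loads(text, object_hook=decode_sets)
```

Here `restored` compares equal to `data`, with `history` back as a set of tuples. A plain `list(...)` conversion loses that distinction, since JSON reads every sequence back as a list.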

mattwthompson, Nov 04 '22 16:11

Superseded by #740 for v0.50

bennybp, Sep 14 '23 18:09