maggma icon indicating copy to clipboard operation
maggma copied to clipboard

Removing the requirement to use a `task_id`

Open Andrew-S-Rosen opened this issue 2 years ago • 5 comments

As I discussed with @munrojm, there are instances in maggma where the user has to supply a "task_id" where it is arguably not necessary.

Consider this example:

from maggma.stores import MontyStore

store = MontyStore("my_db")
d = {"test": "hi"}
with store:
    store.update(d)

You get back the following traceback:

--------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[27], line 3
      1 d = {"test": "hi"}
      2 with store:
----> 3     store.update(d)

File [c:\users\asros\github\maggma\src\maggma\stores\mongolike.py:1040](file:///C:/users/asros/github/maggma/src/maggma/stores/mongolike.py:1040), in MontyStore.update(self, docs, key)
   1038     search_doc = {k: d[k] for k in key}
   1039 else:
-> 1040     search_doc = {key: d[key]}
   1042 self._collection.replace_one(search_doc, d, upsert=True)

KeyError: 'task_id'

If the user supplies, say, d = {"task_id": 1, "test": "hi"} it works and you get:

[{'_id': ObjectId('64c2c5e507a03e712c848446'), 'task_id': 1, 'test': 'hi'}]

But this is awkward. If it's already going to assign an _id to the entry, there should not be a need to have the user pass a task_id. If no task_id is specified, it should fallback to using the automatically generated _id.

Andrew-S-Rosen avatar Jul 27 '23 21:07 Andrew-S-Rosen

Good point @arosen93 . Maybe we could implement this in Store.__init__ such that if you set key=None then the _id field is used by default?

This would require us to manually create ObjectID whenever not using an actual MongoDB - backed store.

rkingsbury avatar Jul 31 '23 15:07 rkingsbury

That seems like a logical approach to me!

Andrew-S-Rosen avatar Jul 31 '23 15:07 Andrew-S-Rosen

Motivation for not defaulting to _id: not all Stores have or use _id.

from maggma.stores import JSONStore

with JSONStore("test.json", read_only=False) as store:
    store.update({"test": "hi"})

doesn't work and

from maggma.stores import JSONStore

with JSONStore("test.json", read_only=False) as store:
    store.update({"task_id": "hello", "test": "hi"})

does work and yields a file that contains

[{"task_id": "hello", "test": "hi"}]

with no _id.

Andrew-S-Rosen avatar Mar 04 '24 02:03 Andrew-S-Rosen

Motivation for not defaulting to _id: not all Stores have or use _id.

I'm not sure I understand your point in this example. Is the recommended behavior still to default to an automatically generated _id field if the user passes task_id=None? Obviously we would need to implement this in the base Store class so that it would always be assigned if needed.

rkingsbury avatar Mar 04 '24 11:03 rkingsbury

Apologies for not being clear. I was knee deep in debugging and being too brief.

Yes, I would recommend having an automatically generated _id field if the user passed task_id=None. In the original example with MontyStore, and _id was automatically generated already. For the JSONStore, there is no such _id field. But I would support this automatic implementation in Store.

Andrew-S-Rosen avatar Mar 04 '24 16:03 Andrew-S-Rosen