appengine-fixture-loader icon indicating copy to clipboard operation
appengine-fixture-loader copied to clipboard

Proposal for supporting ancestor keys in fixtures

Open john2x opened this issue 10 years ago • 13 comments

If ancestor keys are desired, then the whole ancestor path will need to be defined explicitly in the fixture data.

Custom keys are specified by the '__key__' special attribute with its value being an array of kind and id pairs. The array will need to have an even number of elements (the same specs as defined here https://cloud.google.com/appengine/docs/python/ndb/entities)

e.g.

[{ ...
    "__key__": ["ParentKind", "parent_id", "SomeKind", "some_id"]
 }]

P.S. on line # 59 in loader.py, I moved the post_processor(obj) call to outside of the for loop (I believe this was a bug? As calling the post_processor in the loop would trigger it for each attribute. Let me know if that was intended instead.)

EDIT: After reviewing, it seems __key__ (instead of _key) would be a better attribute name?

john2x avatar Jan 31 '15 07:01 john2x

Thanks a lot, John!

I was imagining in turning the JSON loader inside out so that ancestors would be .put() before their children, allowing ancestor/children to be defined using the same syntax as non-ancestor children.

    {
        "__kind__": "Person",
        "born": "1968-03-03T00:00:00",
        "first_name": "John",
        "last_name": "Doe",

        ...

        "__children__": [
            {
                "__kind__": "Person",
                "born": "1970-04-27T00:00:00",

                ...

                "__children__": [
                    {
                        "__kind__": "Person",
                        "born": "1980-05-25T00:00:00",
                        "first_name": "Bob",

                        ...

                        "userid": 3
                    }
                ]
            }
        ]
    },

Would you mind splitting the postprocessor fix and the README update as separate pull requests so we can discuss ancestors a little more before we go this way or the other? My children key seems to me (obviously) more natural (the JSON itself is a hierarchy) but is a lot more work than your approach and I'd like to mature both ideas before we commit to The One True Way of doing ancestors.

rbanffy avatar Feb 02 '15 01:02 rbanffy

Hi!

I've created the two separate pull requests for the README and post_processor changes.

Hmm one potential issue with the nested way of defining ancestors/children is that two entities can't share the same parent/child, as each will have its own 'copy' of its parent/child.

Also, I just realized while testing this that the ancestor entities don't actually need to be saved first before any of its descendants. Setting an entity's parent to a key with no value is perfectly valid, although querying its parent later on would fail.

EDIT: My proposal would be to something like:

[{"__kind__": "Person",
  "born": "1968-03-03T00:00:00",
  "first_name": "John",
  "last_name": "Doe",
  "children": [["Person", 2], ["Person", 3]]},
 {"__kind__": "Person",
  "__key__": ["Person", 2],
  "born": "1980-03-03T00:00:00",
  "first_name": "Junior",
  "last_name": "Doe",
  "children": []},
 {"__kind__": "Person",
  "__key__": ["Person", 3],
  "born": "1985-03-03T00:00:00",
  "first_name": "Jane",
  "last_name": "Doe",
  "children": []}]

john2x avatar Feb 02 '15 12:02 john2x

Not sure. The original idea I had was to use the "__children__" key with a list of dicts as its value. This way, an ancestor could have any number of direct descendants.

Once you save an entity to the datastore, you cannot change its ancestor. If we want to be able not to specify a key, we need it .put() (and a key generated) before we .put() the children.

rbanffy avatar Feb 02 '15 17:02 rbanffy

The original idea I had was to use the "children" key with a list of dicts as its value. This way, an ancestor could have any number of direct descendants.

Ah right, that handles the multiple descendants scenario. Hmm but how would entities that share the same KeyProperty references be defined?

If we want to be able not to specify a key, we need it .put() (and a key generated) before we .put() the children.

Ah yes, that would be one of the requirements with my proposal, that keys need to be explicitly specified if you want to use it as a parent/child somewhere (which for me isn't so bad, since keys are easy enough to define).

Also, it helps keep your fixtures clean by allowing to keep entities of different kinds in separate fixture files. e.g.:

person.json
----------------
[{"__kind__": "Person",
  "born": "1968-03-03T00:00:00",
  "first_name": "John",
  "last_name": "Doe",
  "pets": [["Pet", "spot"]]}]

pet.json
----------
[{"__kind__": "Pet",
  "__key__": ["Pet", "spot"],
  "name": "Spot",
  "species": "dog"}]

Downsides though is that it would only work with NDB and would introduce backward incompatible changes.

Thanks!

john2x avatar Feb 03 '15 00:02 john2x

I have no big problem with being backward-incompatible. This tool is very new and not many people are using it.

I'm not sure of what you meant by "how would entities that share the same KeyProperty references be defined?". Using the same syntax we use for non-ancestor references, we simply have a list of objects (that could be of multiple kinds) under the __children__ key.

There are reasons to keep different kinds on different files, but having explicit identifiers referring data on different files will introduce inter-file dependencies - it would make no sense to import the children if you do not import their parents before.

rbanffy avatar Feb 04 '15 22:02 rbanffy

I'm not sure of what you meant by "how would entities that share the same KeyProperty references be defined?"

I mean something like this:

[
 {"__kind__": "Person",
  "first_name": "John",
  "last_name": "Doe",
  "__children__pets__": [{
    "__kind__": "Pet",
    "name": "Spot"
  }]},
 {"__kind__": "Person",
  "first_name": "Jane",
  "last_name": "Doe",
  "__children__pets__": [{  // would create a separate Pet for "Spot"?
    "__kind__": "Pet",
    "name": "Spot"
  }]},
]

john2x avatar Feb 05 '15 00:02 john2x

You are right. In this case you'd have to model Spot as the parent Pet and John and Jane as the __children__ Person's, but it seems there is a space for both methods.

In any case, let's, for now, include a __key__ JSON key that lets you manually define the key. We could then have a __parent__ that would take kind name and key and make an ndb.Key out of them. To keep it simple, I'd go with your idea of 2-item lists instead of a dict.

This way we'd have::

[
    {"__kind__": "Person",
    "__key__": "jdoe",
    "first_name": "John",
    "last_name": "Doe"
    },
   {"__kind__": "Person",
    "__parent__": ["Person", "jdoe"],
    "first_name": "Jane",
    "last_name": "Doe",
    },
]

equivalent to:

john = Person(id = 'jdoe', first_name = 'John', last_name = 'Doe').put()
jane = Person(parent = ndb.Key('Person', 'jdoe'), first_name = 'Jane', 
    'last_name' = 'Doe').put()

Maybe it'd be better to rename __key__ to __id__ to be less confusing to someone familiar with ndb.

rbanffy avatar Feb 05 '15 12:02 rbanffy

Yes, having __key__, __id__ and __parent__ would bring it closer to parity with the model constructor.

it seems there is a space for both methods.

I agree. So for scenarios where a nested definition of KeyProperty references is a better fit, the __children__some_property__ form will do. And for other scenarios where it doesn't fit (as with my use-case), specifying keys to some_property will also work.

(Would the __children__ form take precedence if/when both methods are used in a fixture (e.g. by mistake)?)

john2x avatar Feb 05 '15 14:02 john2x

Since __children__ defines entities whose parent is the current entity and __parent__ points to a specific entity that may or may not be in the same imported file (or exist) through a key generated from a parent kind and an id, it would not make sense to declare a __parent__ key on an object that's under a __children__ key. Manually declaring __key__ should be harmless.

We should probably throw an error.

rbanffy avatar Feb 06 '15 02:02 rbanffy

Hello! I pushed some updates (rebased from recent master).

So with these changes, __key__, __parent__ and __id__ are now valid ways to define an entity's key.

Also added support for __children__, but it's not as elegant as I wanted it to be, as I had to re-create the object with a new key and delete the old one (https://github.com/rbanffy/appengine-fixture-loader/pull/3/files#diff-afdeeb01fcabc03e6ad8a633b948d6a5R74). Maybe there's a better way?

Somewhat off-topic, but one thing I'm confused about the __children__key_property__ form. I just realized that it's not very intuitive.

e.g. for this definition

{
   "__kind__": "Person",
   "name": "John"
   "__children__owner__": [
     {
       "__kind__": "Dog",
       "name": "Fido"
     }
   ]
}

At a glance, one would think that "John" has an "owner" KeyProperty with value of "Fido"'s key. But in fact it is the reverse, it is "Fido" that has an "owner" KeyProperty with "John" as the value.

john2x avatar Feb 08 '15 03:02 john2x

We can avoid the delete-and-recreate-with-key issue by parsing the whole JSON into an in-memory list and creating all objects from the top-down (rather than bottom-up as json.load seems to do) so that keys are always available by the time you create the children. I've been reluctant to do this because simply invoking json.load is very concise and I have a somewhat misguided preference for loading things as they are needed rather than doing so upfront. This is a rather foolish take on the problem because these JSON files are not meant to become too large (even though I am using them to store initial application state).

I'll give that top-down approach a spin. From there, it should be easy to hook up a simpler version of your code without the create-delete-recreate part.

rbanffy avatar Feb 08 '15 12:02 rbanffy

develop branch is almost there. I'm ashamed to have added partial support to __children__ without proper tests but your tests should cover it nicely. Support for __key__ is also still missing, but should be easy to add. When it's implemented, your tests should pass.

rbanffy avatar Feb 09 '15 00:02 rbanffy

Just made some additions. __id__ is the name of the attribute. __parent__ is not done yet.

rbanffy avatar Feb 12 '15 00:02 rbanffy