elasticsearch-py
Add option to `bulk` indexer to automatically add nested mapping where appropriate
I am using bulk to index this JSON dict:
```json
{
  "project.nested.repeated.addresses": [
    {
      "status": "current",
      "city": "New York",
      "zip": 33333,
      "state": "NY",
      "address": "789 Any Avenue",
      "numberOfYears": 2
    },
    {
      "status": "previous",
      "city": "Hoboken",
      "zip": 44444,
      "state": "NJ",
      "address": "321 Main Street",
      "numberOfYears": 3
    }
  ],
  "project.nested.repeated.first_name": "Jane",
  "project.nested.repeated.last_name": "Doe"
}
```
I would like the addresses field to be nested in the Elasticsearch sense.
I have code that manually crawls the schema and adds nested mappings.
It would be nice not to have to add `nested` by hand everywhere. This seems like a common use case:
if a JSON field holds nested objects, map the corresponding Elasticsearch field as `nested`.
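A minimal sketch of the proposed rule, assuming "nested JSON field" means an array of objects (as with `addresses`). `nested_properties` is a hypothetical helper, not part of elasticsearch-py and not the crawler mentioned above. It walks a sample document, expands dotted keys into object paths the way Elasticsearch does, and emits `{"type": "nested"}` only for arrays of objects, leaving scalar fields to dynamic mapping.

```python
from typing import Any, Dict


def _merge(dst: Dict[str, Any], src: Dict[str, Any]) -> None:
    """Recursively merge the mapping fragment `src` into `dst` in place."""
    for key, value in src.items():
        if isinstance(value, dict) and isinstance(dst.get(key), dict):
            _merge(dst[key], value)
        else:
            dst[key] = value


def nested_properties(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Build a `properties` tree that maps every array-of-objects field in
    `doc` as `nested`. Hypothetical helper, not part of elasticsearch-py."""
    props: Dict[str, Any] = {}
    for key, value in doc.items():
        if isinstance(value, list) and value and all(isinstance(v, dict) for v in value):
            # Array of objects: the case the proposal wants mapped as nested.
            inner: Dict[str, Any] = {}
            for item in value:
                _merge(inner, nested_properties(item))
            field: Dict[str, Any] = {"type": "nested"}
            if inner:
                field["properties"] = inner
        elif isinstance(value, dict):
            # Plain object: keep it only if something below it is nested.
            inner = nested_properties(value)
            if not inner:
                continue
            field = {"properties": inner}
        else:
            # Scalars and arrays of scalars: leave them to dynamic mapping.
            continue
        # Elasticsearch expands dots in field names into object paths,
        # so "a.b.c" becomes a -> properties -> b -> properties -> c.
        parts = key.split(".")
        wrapped: Dict[str, Any] = {parts[-1]: field}
        for part in reversed(parts[:-1]):
            wrapped = {part: {"properties": wrapped}}
        _merge(props, wrapped)
    return props
```

For the document above, this returns a tree with `{"type": "nested"}` at `project` → `nested` → `repeated` → `addresses`; wrapping it as `{"properties": ...}` gives a mapping body. Treating single objects as plain `object` fields is a design choice of this sketch.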
@melissachang Because this choice has significant performance consequences, not all users would want it forced on them. Put another way, Elasticsearch itself leaves it to the user whether to use `nested` or not, and it would be peculiar for a low-level language client to take that choice away.
This client tries to remain as faithful as possible to the behavior of Elasticsearch, so it exposes the choice in the same way. If mapping an inner object field as `nested` is desirable, the user can state that explicitly in a mapping command; otherwise, the default is not nested. Flipping that default only for users of one particular language client would be "astonishing": they would suddenly get quirky behavior they don't get from Elasticsearch itself.
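For the document in this issue, the explicit route could look like the sketch below, which declares only `addresses` as `nested` and leaves every other field to the default behavior. It assumes the elasticsearch-py 8.x client, whose `indices.create` accepts a `mappings` keyword; `ADDRESSES_NESTED_MAPPING`, `create_index`, and any index name you pass are illustrative.

```python
# Explicit mapping for the document in this issue: only `addresses` is
# declared `nested`; its inner fields and every other field are still
# mapped dynamically, as they would be by default.
ADDRESSES_NESTED_MAPPING = {
    "properties": {
        "project": {
            "properties": {
                "nested": {
                    "properties": {
                        "repeated": {
                            "properties": {
                                "addresses": {"type": "nested"},
                            }
                        }
                    }
                }
            }
        }
    }
}


def create_index(es, index_name):
    """Create `index_name` with the explicit mapping above.

    Assumes `es` is an elasticsearch-py 8.x `Elasticsearch` client, whose
    `indices.create` accepts a `mappings` keyword argument.
    """
    return es.indices.create(index=index_name, mappings=ADDRESSES_NESTED_MAPPING)
```

Documents sent afterwards with `elasticsearch.helpers.bulk(es, actions)` can keep their dotted keys: by default Elasticsearch expands them into the same object path, so `addresses` lands in the nested field.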
For your specific situation, you might want to explore index templates, including dynamic (field mapping) templates. In short, you could add rules to your cluster that say "if you encounter an object field, map it as nested".
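A sketch of that rule as a composable index template body (composable templates need Elasticsearch 7.8 or later). The helper `objects_as_nested_template`, the dynamic template name `objects_as_nested`, and the index pattern are placeholders; `match_mapping_type`, `path_match`, and `mapping` are standard dynamic template parameters.

```python
from typing import Any, Dict, Optional


def objects_as_nested_template(
    index_pattern: str, path_match: Optional[str] = None
) -> Dict[str, Any]:
    """Build a composable index template body whose dynamic template maps
    newly seen object fields as `nested`.

    `path_match` optionally restricts the rule to matching full dotted paths.
    """
    rule: Dict[str, Any] = {
        "match_mapping_type": "object",  # fires on newly detected object fields
        "mapping": {"type": "nested"},
    }
    if path_match is not None:
        rule["path_match"] = path_match
    return {
        "index_patterns": [index_pattern],
        "template": {
            "mappings": {
                # "objects_as_nested" is an arbitrary template name.
                "dynamic_templates": [{"objects_as_nested": rule}],
            }
        },
    }
```

With the 8.x client, the body could be registered via `es.indices.put_index_template(name="objects-as-nested", **body)` before indexing into an index matching the pattern. Note that the unrestricted rule also matches the intermediate `project`, `project.nested`, and `project.nested.repeated` objects that Elasticsearch creates when it expands the dotted keys, so they would become nested as well; passing `path_match="project.nested.repeated.*"` narrows the rule to `addresses`. As far as I know, dynamic templates do not distinguish a single object from an array of objects.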