elasticsearch-knapsack
RoutingMissingException for import of child documents
Importing child documents results in an org.elasticsearch.action.RoutingMissingException, since the parent id is required for POSTing child documents. Perhaps you could use some directory convention on export whereby all child records are written to the directory of their parent or something like that.
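To make the failure mode concrete: in Elasticsearch 1.x, a child document can only be indexed when its parent id is supplied as routing. A minimal sketch of the request shape (the helper function and names are made up for illustration, not knapsack code):

```python
def child_index_url(index, doc_type, child_id, parent_id=None):
    """Build the index request path for a child document (illustrative)."""
    if parent_id is None:
        # This is the situation that triggers RoutingMissingException:
        # a child type needs the parent id as routing on every index request.
        raise ValueError("parent id (routing) is required for child documents")
    # equivalent to: POST /index/type/child_id?parent=parent_id
    return "/%s/%s/%s?parent=%s" % (index, doc_type, child_id, parent_id)

print(child_index_url("myindex", "mytype", "9999999999", "1111111111"))
```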
Import/export of parent/child is a bit complex. First, the parents have to be exported and imported, then the children. Mixed import/export will lead to missed routings.
Actually, you can import the child docs before the parent docs, or without them entirely, and ES will not throw an exception.
Regardless, our ingest processes use a directory convention whereby we create the parent dir using the parent id and then create both the parent and child docs in that dir, so that we can index the parent first and then its children. You could do something similar.
A. Steven Anderson

On Oct 26, 2014 5:12 AM, "Jörg Prante" [email protected] wrote:

> Import/export of parent/child is a bit complex. First, the parents have to be exported and imported, then the children. Mixed import/export will lead to missed routings.
I'm also getting this problem, so can I assume this issue is still open?
I do not fully understand the issue. Maybe because I do not use parent/child.
The import/export of parent/child is complex and requires two exports and imports.
- Export of parents
- Import of parents
- Export of children
- Import of children
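The four steps above amount to partitioning the exported docs and importing the parent half first. A rough sketch; the sample docs and field names are illustrative, not actual knapsack output:

```python
# Hedged sketch of the two-phase order: parents first, then children.
docs = [
    {"_id": "9999999999", "_parent": "1111111111"},  # child
    {"_id": "1111111111"},                           # parent
    {"_id": "2222222222"},                           # parent without children
]
parents = [d for d in docs if "_parent" not in d]
children = [d for d in docs if "_parent" in d]
for d in parents + children:
    # children carry routing = _parent; parents need no routing
    print(d["_id"], d.get("_parent"))
```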
I cannot see where I can use a "directory convention" and how this should be different from the current export/import. Maybe I'm missing something completely.
The solution is simple. It just comes down to a standard file naming convention for child records that indicates the parent's _id, which is required for the child ingest request. E.g. we use the convention that child records are stored in a sub-directory of the parent directory, so that we can determine the parent _id, which is the name of the parent directory. However, you could also use some other child file naming convention, like prefixing the child file name with the parent _id or something similar.
Having exactly the same issue atm.
@jprante: I'll give your suggestion of multiple steps a shot, but my concern is: I cannot see the _parent field in my archived data, and I am afraid that without _parent referencing the _id of the parent, we cannot index any child document.
e.g. parent doc _id = 1111111111 w/ child doc _id = 9999999999 and parent doc _id = 2222222222 w/ no child docs.
So, you could export them to something like the following directory structure:
myindex/
    1111111111/
        9999999999.json
        1111111111.json
    2222222222.json
So, when you go to index the child records, you can get the _id of the parent from its file path.
Alternatively, you could just use a file naming convention w/ a delimiter between parent and child ids if you don't want to use subdirectories.
myindex/
    1111111111.json
    1111111111_9999999999.json
    2222222222.json
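Either way, the parent _id is mechanically recoverable from the path. A sketch assuming the two layouts above (the helper names are made up, not knapsack code):

```python
import os

def parent_from_subdir(path):
    # subdirectory convention: "myindex/1111111111/9999999999.json" -> "1111111111"
    return os.path.basename(os.path.dirname(path))

def parent_from_filename(path, delim="_"):
    # delimiter convention: "myindex/1111111111_9999999999.json" -> "1111111111"
    name = os.path.splitext(os.path.basename(path))[0]
    # a file without the delimiter is a parent doc itself -> no parent id
    return name.split(delim)[0] if delim in name else None

print(parent_from_subdir("myindex/1111111111/9999999999.json"))
print(parent_from_filename("myindex/1111111111_9999999999.json"))
```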
@asanderson: in our case the children are located in different types (one type containing the parent and multiple types containing children)
However, your idea could still work if we can ensure that any indices without a parent mapping get restored first.
@jprante: your suggestion of using multiple steps does not seem to work, because the _parent field is located at the same level as _id and is not part of the archived child type (at least not in mine).
I'm not sure what you mean. Are you saying, you cannot determine the _parent field for a child document during an export?
Well, actually my example above is an over-simplification. We do support multiple types. e.g.
myindex1/
    mytype1/
        1111111111/
            9999999999.json
            1111111111.json
        2222222222.json
And, we index all the parents first and then all the children; i.e. breadth-first traversal.
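The parents-before-children order can be sketched roughly like this, assuming the directory convention above (this helper is illustrative, not our actual ingest code): a file is a child exactly when it sits in a directory named after a different document's _id.

```python
import os

def ingest_order(paths, type_dir):
    """Order .json paths so parents come before their children (sketch)."""
    def is_child(p):
        parent_dir = os.path.dirname(p)
        stem = os.path.splitext(os.path.basename(p))[0]
        # child: nested under an id-named dir whose name is not its own _id
        return parent_dir != type_dir and os.path.basename(parent_dir) != stem
    parents = [p for p in paths if not is_child(p)]
    children = [p for p in paths if is_child(p)]
    return parents + children

paths = [
    "myindex1/mytype1/1111111111/9999999999.json",
    "myindex1/mytype1/1111111111/1111111111.json",
    "myindex1/mytype1/2222222222.json",
]
print(ingest_order(paths, "myindex1/mytype1"))
```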
Seems I am missing the point... How do you do that? From what I see, knapsack currently ignores the _parent field.
@marbleman Well, that's sort of the point of this issue; i.e. fully support export/import of parent & child documents.
Ah ok... you sounded as if you had already solved this issue.
To sum up: a structure like
myindex1/
    mytype1/
        1111111111/
            9999999999.json
            1111111111.json
        2222222222.json
where 1111111111 is the value of _parent of document 9999999999 will do the trick during export. During import, types without a _routing and/or _parent definition in their mapping must be imported first. For any child type, _parent can then be retrieved from the folder name.
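The "types without _routing/_parent first" rule could be derived from the type mappings themselves. A rough sketch, assuming the ES 1.x mapping shape (the type names here are made up):

```python
# In ES 1.x, a child type's mapping carries a "_parent" entry, e.g.
# {"childtype": {"_parent": {"type": "parenttype"}}}.
mappings = {
    "parenttype": {"properties": {}},
    "childtype": {"_parent": {"type": "parenttype"}, "properties": {}},
}

def needs_routing(mapping):
    # types with a _parent or _routing definition must be imported last
    return "_parent" in mapping or "_routing" in mapping

# stable sort: routing-free types first, child types after
import_order = sorted(mappings, key=lambda t: needs_routing(mappings[t]))
print(import_order)
```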
Does not sound too complex but I am not a Java developer... To counterbalance I could donate a box of good wine ;)
Just to clarify, this is the scheme our bulk ingest uses for our system; i.e. our various data sources are extracted/transformed/loaded (ETL) into .json files in this directory structure for our bulk ingest process to read them and send them to Elasticsearch. However, it would be great to have knapsack implement a similar scheme to support parent/child docs.
Seems I have to document better how to export/import meta fields like _parent, besides _source. It's related to the syntax of an ES query.
Hi all, is this supported in 2.1.1.0? AFAIK, I'm following the necessary steps to import child documents (e.g. the generated export file has _source, _parent, _routing entries) but the import is failing with the following:
index [myindex], type [mytype], id [3059310], message [[myindex] RoutingMissingException[routing is required for [myindex]/[mytype]/[3059310]]]
Don't miss the steps to EXPORT the child documents correctly! If you don't add the _parent field to the export with the described query, you cannot import it. If you inspect the archive, you should see a _parent folder next to the _source folder containing the data for each document.
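For reference, the export query presumably needs to request the meta fields explicitly; in ES 1.x/2.x search requests this is done with the fields option. A rough sketch of the shape only (check the knapsack README for the exact export syntax):

```python
# Illustrative query body, written as a Python dict for readability.
export_query = {
    "query": {"match_all": {}},
    # ask ES to return the meta fields alongside the source
    "fields": ["_source", "_parent", "_routing"],
}
print(export_query["fields"])
```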
What I'm seeing is a _parent, _source, and _routing file rather than a folder inside each document folder. Again, I'm using v2.1.1.0, in case it matters. When I run an ls under the index and type, the results are the following:
index/type/id1/_source
index/type/id1/_parent
index/type/id1/_routing
index/type/id2/_source
index/type/id2/_parent
index/type/id2/_routing
etc...
And if I cat the files, the _source file has the document JSON, and the _parent and _routing files simply have the parent id in them.
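Reading such a per-document layout back for import could look roughly like this (a hypothetical helper, not knapsack's importer; the ids and paths are made up):

```python
import os
import tempfile

def read_doc(doc_dir):
    """Collect _source, _parent and _routing from a document directory."""
    out = {}
    for name in ("_source", "_parent", "_routing"):
        path = os.path.join(doc_dir, name)
        if os.path.exists(path):
            with open(path) as f:
                out[name] = f.read().strip()
    return out

# tiny demo on a throwaway directory mimicking index/type/id1/...
root = tempfile.mkdtemp()
doc_dir = os.path.join(root, "index", "type", "9999999999")
os.makedirs(doc_dir)
for name, body in [("_source", '{"field": "value"}'),
                   ("_parent", "1111111111"),
                   ("_routing", "1111111111")]:
    with open(os.path.join(doc_dir, name), "w") as f:
        f.write(body)

print(read_doc(doc_dir))
```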
That seems to be correct. Make sure you import the parent type first.
I have imported the parent type and have no problem indexing child documents manually; however, the import continues to fail.