elasticsearch-knapsack

Unable to do a full import from file (ES 1.3.4)

Open AtzeDeVries opened this issue 8 years ago • 8 comments

Hi,

I'm trying to export the data from my ES server (about 22GB, 100K documents, 1 index) to a file. The following happens:

  • If I create a tar.gz, it stops after a 2GB file. Importing it results in 4GB of Elasticsearch data.
  • If I create a bulk.gz, it creates about 1.8GB of data; importing it results in 23GB and only 14K documents.
  • If I _push from one server to the other server, it works correctly.

I would like to have all the data in a file, since it is portable.

Command to export:

curl -XPOST 'localhost:9200/_export?path=/data/elasticsearch_export/nda_export.bulk.gz'

There are two clusters: cluster A contains 1 node and cluster B contains 3 nodes. I'm trying to move data from A to B.

The download link of the plugin is http://xbib.org/repository/org/xbib/elasticsearch/plugin/elasticsearch-knapsack/${es_version}.0/elasticsearch-knapsack-${es_version}.0-plugin.zip where $es_version is 1.3.4

AtzeDeVries avatar Jan 16 '16 12:01 AtzeDeVries

I forgot to upload 1.3.4.1 in October. Now it's there. Can you try 1.3.4.1 to check if the problems persist? Thanks.

http://xbib.org/repository/org/xbib/elasticsearch/plugin/elasticsearch-knapsack/1.3.4.1/

jprante avatar Jan 16 '16 21:01 jprante

Hi,

So I did a lot of testing and found the solution. The mapping was not transferred (or not transferred correctly) to the new server when the data is moved via a file. If I inject the mapping before the _import, it seems to work fine (the export is a bulk.gz of one index).

AtzeDeVries avatar Jan 19 '16 13:01 AtzeDeVries

Yes. The bulk archive is not able to transport mappings. The ES bulk format has no mechanism for creating mappings, only for document indexing.

jprante avatar Jan 19 '16 14:01 jprante
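
Since the bulk archive carries documents only, the workaround described above (inject the mapping before the _import) could be scripted roughly as follows. This is a minimal sketch, not the plugin's documented procedure: `SRC`, `DST`, `INDEX` and `ARCHIVE` are hypothetical placeholders, the `path` parameter on `_import` is assumed to mirror the `_export` call shown earlier in the thread, and the `sed` unwrap assumes the compact single-line JSON that Elasticsearch returns when `?pretty` is not set.

```shell
#!/bin/sh
# Sketch: copy an index's mappings from cluster A to cluster B,
# then run the knapsack import on B. All four values are placeholders.
SRC="${SRC:-http://localhost:9200}"     # cluster A (assumption)
DST="${DST:-http://localhost:9200}"     # cluster B (assumption)
INDEX="${INDEX:-nda}"                   # hypothetical index name
ARCHIVE="${ARCHIVE:-/data/elasticsearch_export/nda_export.bulk.gz}"

# GET /<index>/_mapping answers with {"<index>":{"mappings":{...}}}.
# Dropping the outer {"<index>": ... } wrapper leaves {"mappings":{...}},
# which can be sent as the body of a create-index request.
unwrap_mapping() {
  sed -e 's/^{"[^"]*"://' -e 's/}$//'
}

copy_mapping_and_import() {
  body=$(curl -s "$SRC/$INDEX/_mapping" | unwrap_mapping)
  # Stop if nothing came back from the source cluster.
  [ -n "$body" ] || return 1
  # Create the index on the target with the source's mappings.
  curl -s -XPUT "$DST/$INDEX" -d "$body" || return 1
  # Import the documents; the archive must be readable on the target node.
  curl -s -XPOST "$DST/_import?path=$ARCHIVE"
}
```

Calling `copy_mapping_and_import` after setting the four variables runs the three steps in order. Creating the index up front matters because otherwise the first imported documents trigger dynamic mapping, which may not match the original field definitions.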

OK, then the issue of non-bulk exports being only 2GB still stands. I did not try exporting to a tar file instead of tar.gz. I did test breaking it up into multiple files, but the total of the multiple tar.gz files was 2GB.

AtzeDeVries avatar Jan 19 '16 14:01 AtzeDeVries

Yes, I checked. The fix was not backported.

If you can build from source, here is a quick fix:

Set longFileMode in this line

https://github.com/jprante/elasticsearch-knapsack/blob/1.3/src/main/java/org/xbib/io/archive/tar/TarArchiveOutputStream.java#L84

to LONGFILE_GNU

jprante avatar Jan 19 '16 14:01 jprante

So it is only a problem with tar files? Then I could just use .zip, which is fine by me. (I can't test at the moment, since the testing server is running a different job.)

AtzeDeVries avatar Jan 19 '16 14:01 AtzeDeVries

Yes, it's a tar format peculiarity, the original tar is limited to 2GB, while POSIX TAR or GNU TAR is not.

jprante avatar Jan 19 '16 14:01 jprante

OK, then I'll try the zip method tomorrow. I'll report back on that.

AtzeDeVries avatar Jan 19 '16 14:01 AtzeDeVries