
Elasticsearch schema versioning?

Open · fzyzcjy opened this issue 5 years ago · 13 comments

Is your feature request related to a problem? Please describe.
Hi, thanks for this lib! I wonder whether something similar can be done for Elasticsearch schemas. I know it is not a traditional SQL database, but it also has schemas (and it relies on them heavily; for instance, putting data in the wrong format will throw an error). IMHO this project is very flexible since it supports various drivers. So I wonder whether this can be done? Thanks!

Describe the solution you'd like: N/A
Describe alternatives you've considered: N/A
Additional context: N/A

fzyzcjy, Oct 23 '20 07:10

Are you referring to ECS? I'm not very familiar with that. How would you implement migrations? For example, how would modifications to the "schema" be represented? Would each new migration overwrite the previous one?

dhui, Oct 23 '20 21:10

@dhui Sorry, I'm not an ES expert either, so I used the wrong name. I mean the "mappings" (essentially the "schema" in the ES world). https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-create-index.html#mappings

For example, I run this:

PUT /test
{
  "mappings": {
    "properties": {
      "field1": { "type": "text" }
    }
  }
}

Then I can only put documents like {"field1": "my_text"}, but cannot put something like {"field1": 123}, since the latter is a number rather than text.

ES does support some kind of "migration". We can PUT again later, for example:

PUT /test
{
  "mappings": {
    "properties": {
      "field1": { "type": "text" },
      "field2": { "type": "integer" }
    }
  }
}

However, this feature is limited: it cannot change the type of an existing field (if I remember correctly).


So, regarding "How would you implement migrations": I would suggest doing nothing more than sending an HTTP request (like curl) to Elasticsearch. For example, users define 1_create_test.up.sql (though it contains JSON, not SQL):

{
  "mappings": {
    "properties": {
      "field1": { "type": "text" }
    }
  }
}

and the system just sends it to the ES server as an HTTP PUT.

fzyzcjy, Oct 24 '20 00:10

Looks like there's a separate API for updating an index's mappings. I'd avoid using HTTP directly. How were you going to handle authentication? You're probably better off using the official client, which has methods for creating an index and updating index mappings. How would you differentiate between the two API calls in a migration file? Is there a standard format or DDL for ElasticSearch? I do not want to create our own.

dhui, Oct 24 '20 05:10

Oh, GitHub ate my reply, so I have to type it again :(

I'd avoid using HTTP directly. How were you going to handle authentication? You're probably better off using the official client, which has methods for creating an index and updating index mappings.

The official client sounds great!

How would you differentiate between the two API calls in a migration file? Is there a standard format or DDL for ElasticSearch? I do not want to create our own.

IMHO there is no dedicated DDL. The official docs always use JSON or curl-like commands (just like the JSON I showed above).

I would suggest something like:

{
  "action": "CREATE",
  "payload": {
    "mappings": {
      "properties": {
        "field1": { "type": "text" }
      }
    }
  }
}
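To sketch how such an envelope could answer the "which of the two API calls" question, a driver could switch on "action" and choose the endpoint. Only "CREATE" comes from the proposal above; the "PUT_MAPPING" action and planRequest are hypothetical, and the index name is assumed to come from outside the file (for example, the file name or the DSN).

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
)

// envelope is the proposed migration file format: an action plus the raw
// Elasticsearch request body to send for it.
type envelope struct {
	Action  string          `json:"action"`
	Payload json.RawMessage `json:"payload"`
}

// planRequest maps an envelope onto the HTTP method, path and body that a
// driver would send to Elasticsearch for the given index.
func planRequest(index string, file []byte) (string, string, []byte, error) {
	var env envelope
	if err := json.Unmarshal(file, &env); err != nil {
		return "", "", nil, fmt.Errorf("invalid migration file: %w", err)
	}
	if len(env.Payload) == 0 {
		return "", "", nil, fmt.Errorf("migration file has no payload")
	}
	path := "/" + url.PathEscape(index)
	switch env.Action {
	case "CREATE":
		// Create-index API: PUT /{index} with {"mappings": ..., "settings": ...}.
		return "PUT", path, env.Payload, nil
	case "PUT_MAPPING":
		// Hypothetical action for the put-mapping API:
		// PUT /{index}/_mapping with {"properties": ...}.
		return "PUT", path + "/_mapping", env.Payload, nil
	default:
		return "", "", nil, fmt.Errorf("unknown action %q", env.Action)
	}
}
```

Because payload is kept as a json.RawMessage, the body is forwarded byte for byte; only the wrapper around it is new.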

Yes, that is not 100% Elasticsearch format, but since ES uses nothing but plain JSON, it's something like 99% ES format :P

fzyzcjy, Oct 24 '20 11:10

Answering your original question:

So I wonder whether this can be done?

It looks like it's possible to create a migrate driver to manage ElasticSearch indices. Feel free to create and maintain such a driver. However, unless there's a standard (official or community-accepted/widely used) DDL for managing ElasticSearch indices, migrate won't officially support an ElasticSearch driver.
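The part of such a driver with no obvious Elasticsearch equivalent is version tracking: migrate's database.Driver interface needs SetVersion(version int, dirty bool) error and Version() (version int, dirty bool, err error). One possible sketch, assuming the driver keeps the version in a single document of a dedicated index (the index name migrate_version and document ID "version" are hypothetical): SetVersion indexes {"version": N, "dirty": bool} with PUT /{index}/_doc/{id}, and Version reads it back with GET /{index}/_doc/{id}, treating 404 as "nothing applied yet" (migrate's database.NilVersion, which is -1). Only request building and response parsing are shown; a full driver would also need Open, Close, Lock, Unlock, Run and Drop.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"strings"
)

// nilVersion mirrors migrate's database.NilVersion (-1): no migration applied.
const nilVersion = -1

// Hypothetical location of the version document.
const (
	versionIndex = "migrate_version"
	versionDocID = "version"
)

type versionDoc struct {
	Version int  `json:"version"`
	Dirty   bool `json:"dirty"`
}

// newSetVersionRequest builds PUT /{versionIndex}/_doc/{versionDocID},
// which creates or replaces the version document.
func newSetVersionRequest(baseURL string, version int, dirty bool) (*http.Request, error) {
	body, err := json.Marshal(versionDoc{Version: version, Dirty: dirty})
	if err != nil {
		return nil, err
	}
	endpoint := strings.TrimSuffix(baseURL, "/") + "/" + versionIndex + "/_doc/" + versionDocID
	req, err := http.NewRequest(http.MethodPut, endpoint, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

// parseVersionResponse interprets the response to
// GET /{versionIndex}/_doc/{versionDocID}.
func parseVersionResponse(status int, body []byte) (int, bool, error) {
	if status == http.StatusNotFound {
		// Missing document or missing index: nothing has been applied yet.
		return nilVersion, false, nil
	}
	if status != http.StatusOK {
		return 0, false, fmt.Errorf("elasticsearch: unexpected status %d", status)
	}
	var resp struct {
		Found  bool       `json:"found"`
		Source versionDoc `json:"_source"`
	}
	if err := json.Unmarshal(body, &resp); err != nil {
		return 0, false, err
	}
	if !resp.Found {
		return nilVersion, false, nil
	}
	return resp.Source.Version, resp.Source.Dirty, nil
}
```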

dhui, Oct 24 '20 22:10

@dhui Thanks very much! However, IMHO there is no such standard DDL, so does that mean migrate will never support ES? :(

fzyzcjy, Oct 24 '20 23:10

Correct. migrate won't support ElasticSearch until there's a DDL, since creating a DDL is out of scope for this project.

dhui, Oct 25 '20 03:10

Hmm, what about this: use the "curl-like" format seen everywhere in the official ES docs. It can be viewed as a DDL, since ES uses it a thousand times and ES's web UI (Kibana) even has a dedicated panel to execute this kind of language :)

For example:

PUT /my-index-000001
{
  "mappings": {
    "properties": {
      "age":    { "type": "integer" },  
      "email":  { "type": "keyword"  }, 
      "name":   { "type": "text"  }     
    }
  }
}

(seen here)

fzyzcjy, Oct 25 '20 08:10

A format or spec is not a DDL.

Does Kibana manage ElasticSearch indices? Using a few APIs to query ES is not the same thing as managing an index.

Why not maintain your own driver?

dhui, Oct 25 '20 19:10

@dhui If my driver isn't officially supported in migrate, then nobody will know about it or use it, so it doesn't sound very interesting :/ And of course Kibana manages ES indices.

fzyzcjy, Oct 25 '20 23:10

It doesn't look like Kibana manages indices. Kibana Index Management looks like a UI that just wraps the ElasticSearch index APIs. Were you referring to something else in Kibana?

If my driver isn't officially supported in migrate, then nobody will know about it or use it, so it doesn't sound very interesting

If this is an actual problem with actual users, then people will be able to find it and use it. How do you think large open source projects gained traction?

dhui, Oct 26 '20 02:10

@dhui Yes, I was referring to that link (and other parts, but they are all similar).

If this is an actual problem with actual users, then people will be able to find it and use it. How do you think large open source projects gained traction?

OK I will try to do so :)

fzyzcjy, Oct 26 '20 03:10

Hey @dhui, I came across this thread and have been thinking about it a bit more.

You mentioned not wanting to create a DDL here, and I certainly understand that. I also looked a bit at the Go library to see whether there was some official thing we could hook into, like runCommand in Mongo.

So I thought I'd raise this again: we have had a lot of success using golang-migrate with Mongo and Postgres, and would now like to adapt it for ELS.

In terms of a DDL, I wanted to ask about, or maybe revisit, HTTP. The format ELS uses is, I think, a stripped-down version of HTTP. As far as I can tell, the only differences between an arbitrary command, say (https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-create-index.html#create-index-settings)

PUT /my-index-000001
{
  "settings": {
    "index": {
      "number_of_shards": 3,
      "number_of_replicas": 2
    }
  }
}

and a very bare-bones HTTP request are:

  1. The requirement, per the RFC, to include the HTTP version (in theory you don't need it with GET).
  2. The requirement of an empty line between the headers and the body.

So I think (maybe)

PUT /my-index-000001 HTTP/1.1

{
  "settings": {
    "index": {
      "number_of_shards": 3,  
      "number_of_replicas": 2 
    }
  }
}

would be a valid request for ELS. I'm not 100% sure whether the Content-Length header is required, but we could just assume other things or add other headers as needed (e.g., Content-Type).

I thought I would run this by you. We could do authentication out of band and add headers as needed based on the DSN we take (e.g., a Host header). Doing this at the HTTP level also has the advantage of probably avoiding any issues with OpenSearch vs. ELS.
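On doing authentication out of band: one possible shape, assuming the DSN is an ordinary http(s) URL, is to take basic-auth credentials from its userinfo and an API key from a query parameter, then strip both before the URL is used. The x-api-key parameter name and newAuthedRequest are hypothetical (the name only borrows migrate's habit of x- prefixed custom DSN parameters); the "Authorization: ApiKey <encoded key>" header is how Elasticsearch accepts API keys. Go's client fills in the Host header from the URL, so migration files would not need one.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
	"strings"
)

// newAuthedRequest builds a request against the Elasticsearch node named in
// dsn, moving any credentials out of the URL and into headers. path must be
// a plain path without a query string.
func newAuthedRequest(dsn, method, path string, body io.Reader) (*http.Request, error) {
	u, err := url.Parse(dsn)
	if err != nil {
		return nil, err
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return nil, fmt.Errorf("unsupported scheme %q", u.Scheme)
	}
	apiKey := u.Query().Get("x-api-key") // hypothetical DSN parameter
	user := u.User
	// DSN parameters configure the driver, so none are forwarded to Elasticsearch.
	u.User = nil
	u.RawQuery = ""
	u.Path = strings.TrimSuffix(u.Path, "/") + path

	req, err := http.NewRequest(method, u.String(), body)
	if err != nil {
		return nil, err
	}
	switch {
	case apiKey != "":
		req.Header.Set("Authorization", "ApiKey "+apiKey)
	case user != nil:
		password, _ := user.Password()
		req.SetBasicAuth(user.Username(), password)
	}
	return req, nil
}
```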

The migrate FAQ states:

Can I maintain my driver in my own repository?

Yes, technically that's possible. We want to encourage you to contribute your driver to this repository though. The driver's functionality is dictated by migrate's interfaces. That means there should really just be one driver for a database/ source. We want to prevent a future where several drivers doing the exact same thing, just implemented a bit differently, co-exist somewhere on GitHub. If users have to do research first to find the "best" available driver for a database in order to get started, we would have failed as an open source community.

I'm certainly happy to maintain a fork, but based on that I thought I'd raise it here and see what your thoughts were.

Many operations in ELS take query parameters that control success or failure; for example, when creating an index you can just specify things like timeouts, and the request will fail.
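As an illustration of that point, a driver could add conservative defaults for such parameters whenever a migration does not set them itself. The create-index API does accept timeout, master_timeout and wait_for_active_shards; the helper withDefaultParams and any particular default values are hypothetical choices.

```go
package main

import (
	"net/url"
)

// withDefaultParams returns requestURI (e.g. "/my-index-000001?timeout=5s")
// with each default query parameter added unless the migration already set it.
func withDefaultParams(requestURI string, defaults map[string]string) (string, error) {
	u, err := url.ParseRequestURI(requestURI)
	if err != nil {
		return "", err
	}
	q := u.Query()
	for key, value := range defaults {
		if !q.Has(key) {
			q.Set(key, value)
		}
	}
	// Encode sorts parameters by key, so the result is deterministic.
	u.RawQuery = q.Encode()
	return u.RequestURI(), nil
}
```

Values written in the migration file always win, so a migration that needs a longer timeout can still say so explicitly.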

SJrX, Jan 16 '24 15:01