Elasticsearch schema versioning?
Is your feature request related to a problem? Please describe.
Hi, thanks for this lib! I wonder whether something similar can be done for Elasticsearch schemas. I know it is not a traditional "SQL"-like database, but it also has schemas (and relies on them heavily; for instance, putting data with the wrong format will throw an error). IMHO this project is very flexible since it contains various drivers, so I wonder whether this can be done? Thanks!
Describe the solution you'd like
N/A

Describe alternatives you've considered
N/A

Additional context
N/A
Are you referring to ECS? I'm not very familiar with that. How would you implement migrations? e.g. how would modifications to the "schema" be represented? Would each new migration overwrite the previous one?
@dhui Sorry, I am not an expert on ES either, so I got the name wrong. I mean the "mappings" (simply the "schema" in ES's world). https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-create-index.html#mappings
For example, I do this:
PUT /test
{
  "mappings": {
    "properties": {
      "field1": { "type": "text" }
    }
  }
}
Then I can only put documents like {"field1": "my_text"}, but cannot put something like {"field1": 123}, since the latter is a number instead of text.
ES does support some kind of "migration". We can PUT again later, e.g.
PUT /test
{
  "mappings": {
    "properties": {
      "field1": { "type": "text" },
      "field2": { "type": "integer" }
    }
  }
}
However, that feature is limited, and cannot change the type of a field (if I remember correctly).
So, about "How would you implement migrations": I would suggest doing nothing more than making an HTTP request (e.g. via curl) to Elasticsearch. For example, users define 1_create_test.up.sql (though it is JSON, not SQL):
{
  "mappings": {
    "properties": {
      "field1": { "type": "text" }
    }
  }
}
and the system just runs an HTTP PUT against the ES server.
Looks like there's a separate API for updating an index's mappings. I'd avoid using HTTP directly. How were you going to handle authentication? You're probably better off using the official client, which has methods for index create and update index mappings. How would you differentiate between the 2 API calls in a migration file? Is there a standard format or DDL for ElasticSearch? I do not want to create our own.
Oh github ate my reply! So I have to type it again :(
I'd avoid using HTTP directly. How were you going to handle authentication? You're probably better off using the official client which has methods for index create and update index mappings.
The official client sounds great!
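For reference, here is a minimal sketch of what those two calls (index create and update mappings) might look like with the official Go client, go-elasticsearch. The index name, mapping bodies, and address are placeholders, and the exact esapi request fields can differ between client versions:

package main

import (
	"context"
	"log"
	"strings"

	elasticsearch "github.com/elastic/go-elasticsearch/v8"
	"github.com/elastic/go-elasticsearch/v8/esapi"
)

func main() {
	// Authentication lives in the client config rather than in the migration files.
	es, err := elasticsearch.NewClient(elasticsearch.Config{
		Addresses: []string{"http://localhost:9200"},
	})
	if err != nil {
		log.Fatal(err)
	}
	ctx := context.Background()

	// "Migration 1": create the index with its initial mappings.
	createRes, err := esapi.IndicesCreateRequest{
		Index: "test",
		Body:  strings.NewReader(`{"mappings":{"properties":{"field1":{"type":"text"}}}}`),
	}.Do(ctx, es)
	if err != nil {
		log.Fatal(err)
	}
	defer createRes.Body.Close()
	log.Println("create:", createRes.Status())

	// "Migration 2": add a field to the existing index via the put-mapping API.
	updateRes, err := esapi.IndicesPutMappingRequest{
		Index: []string{"test"},
		Body:  strings.NewReader(`{"properties":{"field2":{"type":"integer"}}}`),
	}.Do(ctx, es)
	if err != nil {
		log.Fatal(err)
	}
	defer updateRes.Body.Close()
	log.Println("update mappings:", updateRes.Status())
}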
How would you differentiate between the 2 API calls in a migration file? Is there a standard format or DDL for ElasticSearch? I do not want to create our own.
IMHO there is no specially formatted DDL. When I read the official docs, they always use JSON or curl-like commands (just like the JSON I showed above).
I might suggest something like:
{
  "action": "CREATE",
  "payload": {
    "mappings": {
      "properties": {
        "field1": { "type": "text" }
      }
    }
  }
}
Yes, that is not 100% Elasticsearch format, but since ES uses nothing but plain JSON, it sounds like 99% ES format :P
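To make that suggestion concrete, here is a rough sketch of how a driver could dispatch on such a wrapper. The action names, the UPDATE_MAPPINGS case, and the function itself are invented for illustration, not an existing format:

package esdriver

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"

	elasticsearch "github.com/elastic/go-elasticsearch/v8"
	"github.com/elastic/go-elasticsearch/v8/esapi"
)

// esMigration mirrors the wrapper proposed above: an action name plus the raw
// Elasticsearch JSON to send as the request body.
type esMigration struct {
	Action  string          `json:"action"`
	Payload json.RawMessage `json:"payload"`
}

// runMigration decides which index API to call based on the "action" field.
func runMigration(ctx context.Context, es *elasticsearch.Client, index string, raw []byte) error {
	var m esMigration
	if err := json.Unmarshal(raw, &m); err != nil {
		return err
	}

	var (
		res *esapi.Response
		err error
	)
	switch m.Action {
	case "CREATE":
		res, err = esapi.IndicesCreateRequest{Index: index, Body: bytes.NewReader(m.Payload)}.Do(ctx, es)
	case "UPDATE_MAPPINGS":
		res, err = esapi.IndicesPutMappingRequest{Index: []string{index}, Body: bytes.NewReader(m.Payload)}.Do(ctx, es)
	default:
		return fmt.Errorf("unknown action %q", m.Action)
	}
	if err != nil {
		return err
	}
	defer res.Body.Close()
	if res.IsError() {
		return fmt.Errorf("elasticsearch error: %s", res.String())
	}
	return nil
}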
Answering your original question:
So I wonder whether this can be done?
It looks like it's possible to create a migrate driver to manage ElasticSearch indices. Feel free to create and maintain such a driver. However, unless there's a standard (official or community-accepted/widely-used) DDL for managing ElasticSearch indices, migrate won't officially support an ElasticSearch driver.
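For anyone picking this up: migrate drivers implement the database.Driver interface, which looks roughly like the sketch below. The open questions for an ElasticSearch driver would be what Run receives (the DDL debated in this thread) and where Version/SetVersion store the migration state, e.g. in a dedicated index.

package database

import "io"

// Roughly the Driver interface from migrate's database package; an
// ElasticSearch driver would need to implement these methods.
type Driver interface {
	// Open returns a new driver instance configured from a URL/DSN.
	Open(url string) (Driver, error)
	Close() error
	// Lock and Unlock guard against concurrent migrations.
	Lock() error
	Unlock() error
	// Run applies a single migration read from the source file.
	Run(migration io.Reader) error
	// SetVersion and Version persist and report the current schema version.
	SetVersion(version int, dirty bool) error
	Version() (version int, dirty bool, err error)
	// Drop deletes everything the driver manages.
	Drop() error
}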
@dhui Thanks very much! However, IMHO there is no such standard DDL, so does that mean migrate will never support ES? :(
Correct, migrate won't support ElasticSearch until there's a DDL, since creating a DDL is out of scope for this project.
Hmm, what about this: use the "curl-like" format that appears everywhere in the ES official docs. It can be viewed as a DDL, since ES has used it a thousand times, and ES's web UI (Kibana) even has a dedicated panel to execute that kind of syntax :)
e.g.
PUT /my-index-000001
{
  "mappings": {
    "properties": {
      "age": { "type": "integer" },
      "email": { "type": "keyword" },
      "name": { "type": "text" }
    }
  }
}
(seen here)
A format or spec is not a DDL.
Does Kibana manage ElasticSearch indices? Using a few APIs to query ES is not the same thing as managing an index.
Why not maintain your own driver?
@dhui If my driver is not officially supported in migrate, then nobody will know about or use it, so that doesn't sound very interesting :/ Of course Kibana manages ES indices.
It doesn't look like Kibana manages indices. Kibana Index Management looks like a UI that just wraps the ElasticSearch index APIs. Were you referring to something else in Kibana?
If my driver will not be officially supported in migrate then nobody will know or use it, so sounds not interesting
If this is an actual problem with actual users, then people will be able to find it and use it. How do you think large open source projects gained traction?
@dhui Yes, I was referring to that link (and other parts, but they are all similar).
If this is an actual problem with actual users, then people will be able to find it and use it. How do you think large open source projects gained traction?
OK I will try to do so :)
Hey @dhui I came across this thread, and was thinking about this a bit more.
You mentioned not creating a DDL here, and I certainly understand that. I also looked a bit at the Go library to see if there was, say, some official thing we could hook into, like runCommand in mongo.
So I thought I'd raise this again: we have had a lot of success with golang-migrate with Mongo and Postgres, and would now like to adapt it for ELS.
In terms of a DDL, I wanted to ask about, or maybe revisit, HTTP. The format ELS uses is, I think, a stripped-down version of HTTP. I think the only difference between a random command, say (https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-create-index.html#create-index-settings)
PUT /my-index-000001
{
  "settings": {
    "index": {
      "number_of_shards": 3,
      "number_of_replicas": 2
    }
  }
}
and a very bare-bones HTTP request is:
- A requirement to include the HTTP version (in theory you don't need it with GET), according to the RFC.
- The requirement of an empty line between the headers and the body.
So I think (maybe)
PUT /my-index-000001 HTTP/1.1
{
  "settings": {
    "index": {
      "number_of_shards": 3,
      "number_of_replicas": 2
    }
  }
}
would be a valid request for ELS. I'm not 100% sure whether the Content-Length header is required, but we could just assume other things or add other headers as needed (e.g., Content-Type).
I thought I would run this by you for your thoughts. We could do authentication out of band, and add headers as needed based on the DSN we take (e.g., a Host header). Doing this at the HTTP level also has the advantage of probably avoiding any issues with OpenSearch vs ELS.
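To make that a bit more concrete, here is a rough sketch (not an agreed design) of how a driver might parse and replay a migration written in that stripped-down HTTP form, taking the base URL and credentials from the DSN and filling in headers itself; the function and parameter names are assumptions:

package esdriver

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
	"net/http"
	"net/url"
	"strings"
)

// applyHTTPMigration reads a migration of the form
//
//	PUT /my-index-000001 HTTP/1.1
//	{ ...json body... }
//
// rewrites the request-target against the configured Elasticsearch base URL,
// adds the headers ELS needs, and sends it. baseURL, username, and password
// would come from the driver's DSN.
func applyHTTPMigration(client *http.Client, baseURL, username, password string, migration io.Reader) error {
	r := bufio.NewReader(migration)

	// First line: METHOD /path[?query] [HTTP-version]; the version is ignored.
	line, err := r.ReadString('\n')
	if err != nil && err != io.EOF {
		return err
	}
	fields := strings.Fields(line)
	if len(fields) < 2 {
		return fmt.Errorf("malformed request line: %q", line)
	}
	method, target := fields[0], fields[1]

	u, err := url.Parse(strings.TrimRight(baseURL, "/") + target)
	if err != nil {
		return err
	}

	// Everything after the request line is treated as the JSON body, so the
	// blank line and Content-Length header are not required in the file.
	body, err := io.ReadAll(r)
	if err != nil {
		return err
	}

	req, err := http.NewRequest(method, u.String(), bytes.NewReader(body))
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "application/json")
	if username != "" {
		req.SetBasicAuth(username, password) // auth handled out of band, per the DSN
	}

	res, err := client.Do(req)
	if err != nil {
		return err
	}
	defer res.Body.Close()
	if res.StatusCode >= 300 {
		msg, _ := io.ReadAll(res.Body)
		return fmt.Errorf("migration failed: %s: %s", res.Status, msg)
	}
	return nil
}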
The migrate FAQ states:
Can I maintain my driver in my own repository?
Yes, technically that's possible. We want to encourage you to contribute your driver to this repository though. The driver's functionality is dictated by migrate's interfaces. That means there should really just be one driver for a database/source. We want to prevent a future where several drivers doing the exact same thing, just implemented a bit differently, co-exist somewhere on GitHub. If users have to do research first to find the "best" available driver for a database in order to get started, we would have failed as an open source community.
I'm certainly happy to maintain a fork, but based on that I thought I'd raise it here and see what your thoughts were.
Many operations in ELS take query parameters that control success or failure, so when creating an index you can just specify things like timeouts in the query string, and the request will fail on its own if they aren't met.
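Since the sketch above parses the request-target with url.Parse, any such query parameters would pass through to the server untouched; the specific values below are only illustrative:

// A hypothetical migration request line carrying Elasticsearch query
// parameters; nothing extra is needed in the format to support them.
const exampleRequestLine = "PUT /my-index-000001?wait_for_active_shards=2&timeout=30s HTTP/1.1"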