
[PROPOSAL] Introduce `spring-data-opensearch` to support OpenSearch integration

Open reta opened this issue 2 years ago • 22 comments

Picking up the discussion started in https://github.com/spring-projects/spring-data-elasticsearch/issues/1770: at this point it has become clear that the two projects are going to evolve in (slightly) incompatible ways. Given how popular Spring Data Elasticsearch is, it is highly likely that the community will seek first-class support for OpenSearch integration as well.

In the scope of this issue, it would be great to hear feedback on the proposal to introduce a dedicated spring-data-opensearch project, so that OpenSearch is supported out of the box without depending on Elasticsearch. @sothawo if there is interest and you are willing to accept the contribution, I would be more than happy to work on the pull request.

The OpenSearch project has just published the Maven artifacts for its client libraries [1], so there should be no issue relying on those for spring-data-opensearch and disconnecting from Elasticsearch.

Thank you very much.

[1] https://discuss.opendistrocommunity.dev/t/maven-repository-artifacts-for-plugin-development/6406/7

reta • Sep 02 '21 17:09

Linking in the prior ticket that discussed next-generation clients: spring-projects/spring-data-elasticsearch#1853. I too am interested in whether there has been any traction on that ticket yet.

wboyle-erwin • Sep 07 '21 21:09

The current plan for the further development of Spring Data Elasticsearch (as already mentioned in https://github.com/spring-projects/spring-data-elasticsearch/issues/1880#issuecomment-885775606 and https://github.com/spring-projects/spring-data-elasticsearch/issues/1853):

The primary search engine that Spring Data Elasticsearch targets is Elasticsearch.

Elastic is preparing a new client (https://github.com/elastic/elasticsearch-java) that will replace the existing RestHighLevelClient, because the latter references many classes from the Elasticsearch core libraries. Once this is available, Spring Data Elasticsearch will switch to using this new client.

To prepare for this, we will internally change the architecture to separate the code that accesses Elasticsearch (using some client library) from the code that implements the Spring Data Elasticsearch logic. We will then have two implementations, one using the current Elasticsearch client and one using the new Elasticsearch client. We need this internal restructuring so that we can develop the integration of the new client while still using the old one; this task has already started. The final decision on what the interface/API between the client-independent and the client-related code will look like is still open.

As for the integration of OpenSearch: once the client-related code is separated, it will be possible to add an implementation that uses an OpenSearch client library.
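A minimal Java sketch of the separation described above. It assumes nothing about the eventual Spring Data Elasticsearch API: `SearchClientAdapter`, `RecordingAdapter` and `DocumentTemplate` are hypothetical names. The template holds the client-independent logic and talks only to the adapter interface, so supporting another client library (the old RHLC, the new Elasticsearch client, or an OpenSearch client) means adding one adapter implementation rather than touching the template.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch only: every type below is hypothetical, not Spring Data Elasticsearch API.
public class ClientSeparationSketch {

    // Client-related code: the only layer that knows which client library is used.
    interface SearchClientAdapter {
        String index(String indexName, Map<String, Object> document);
    }

    // Stand-in for a client-backed implementation. In real code, "rhlc", "elc" and
    // "opensearch" would be separate classes, each wrapping a different client library.
    static final class RecordingAdapter implements SearchClientAdapter {
        final String clientName;
        final List<String> calls = new ArrayList<>();

        RecordingAdapter(String clientName) {
            this.clientName = clientName;
        }

        @Override
        public String index(String indexName, Map<String, Object> document) {
            calls.add(indexName); // a real adapter would send an index request here
            return clientName + "-" + calls.size();
        }
    }

    // Spring Data logic: written only against the adapter interface, never a client.
    static final class DocumentTemplate {
        private final SearchClientAdapter client;

        DocumentTemplate(SearchClientAdapter client) {
            this.client = client;
        }

        String save(String entityName, Map<String, Object> fields) {
            // Client-independent logic, e.g. deriving the index name from the entity.
            String indexName = entityName.toLowerCase();
            return client.index(indexName, fields);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> fields = Map.of("title", "Spring Data");
        for (String clientName : List.of("rhlc", "elc", "opensearch")) {
            RecordingAdapter adapter = new RecordingAdapter(clientName);
            String id = new DocumentTemplate(adapter).save("Book", fields);
            System.out.println(id + " " + adapter.calls);
        }
    }
}
```

Running `main` saves the same entity through three stand-in adapters; the template code is identical in each case, which is the property the restructuring is meant to provide.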

I can't tell you when this code separation for the two clients will be done. For one thing, the new Elasticsearch client is not yet available. For another, Spring Data Elasticsearch is a community-driven and community-maintained project; there are no developers working on it full-time. I maintain the project in cooperation with the Spring Data maintainers to keep it aligned with the other Spring Data modules (doing that in my spare time besides my day job).

As for a dedicated spring-data-opensearch project: I will not have the time to work on this in addition to my work on Spring Data Elasticsearch, but it should be no problem to create a fork and switch it to use the OpenSearch RestHighLevelClient. Running the tests against an OpenSearch container had only a few tests failing, where Spring Data Elasticsearch uses functionality that wasn't available in Elasticsearch 7.10.

sothawo • Sep 09 '21 05:09

@sothawo thank you very much for laying out the plans and sharing your thoughts, much appreciated. I would be happy to help with introducing (and, of course, maintaining) spring-data-opensearch when the time comes. On the subject of separating the client-related code, do you need any help at this stage (to prepare for the client switch-over)?

Also, do you think we should keep this issue open or consolidate everything under https://github.com/spring-projects/spring-data-elasticsearch/issues/1853? Thank you very much.

reta • Sep 09 '21 12:09

spring-projects/spring-data-elasticsearch#1853 is about the internal refactoring; it will not yet make it possible to use an OpenSearch client. When this is done, I will need to see which clients are available at that time, and adding these as an additional way to use Spring Data Elasticsearch will then probably be covered by new tickets.

sothawo • Sep 09 '21 15:09

Eventually the products will diverge even further (for example, we'd like to enable security by default in OpenSearch core), so maybe it's easier to actually fork spring-data-elasticsearch into spring-data-opensearch as proposed, instead of trying to accommodate both clients in the same codebase?

dblock • Feb 03 '22 18:02

I am currently busy adapting Spring Data Elasticsearch to the new Elasticsearch client, and other contributions to the project are rare, so adapting to OpenSearch will have to wait until I find time for the integration (I am doing this in my spare time as well).

If anybody wants to fork and adapt the project for OpenSearch, that's fine, but I don't have the time to maintain and manage a different project.

sothawo • Feb 04 '22 07:02

@sothawo I kinda liked the direction you pointed out:

"we internally will change the architecture to separate the code that accesses Elasticsearch - using some client library - from the code that does the Spring Data Elasticsearch logic."

Forking is always an option, but Spring Data Elasticsearch already has a pretty good abstraction layer over Elasticsearch. In the quick prototype I did, a substantial amount of the abstractions could be moved out of Spring Data Elasticsearch into, let's say, a new module, Spring Data Common Search (somewhat similar to Spring Data Commons). Spring Data Elasticsearch and (hypothetically) Spring Data OpenSearch would then be built on top of it. This is not exactly what you suggested, but it is yet another option.

The risk here, of course, is that the products will indeed diverge even further, as @dblock rightly mentioned, so it is highly likely that the common abstraction layer will become thinner and thinner over time.
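To illustrate the idea of a common search module and the risk @dblock raised, here is a hypothetical Java sketch (none of these interfaces exist in Spring Data, and the "only" methods are placeholders, not real APIs): operations both engines support identically sit in a shared interface, while engine-specific operations live in per-engine extensions built on top of it.

```java
import java.util.List;

// Hypothetical layout; none of these types exist in Spring Data.
public class CommonSearchModuleSketch {

    // Would live in the shared "common search" module.
    interface CommonSearchOperations {
        List<String> search(String index, String query);
    }

    // Would live in an Elasticsearch-specific module built on the common one.
    interface ElasticsearchOperationsSketch extends CommonSearchOperations {
        String elasticsearchOnlyFeature(); // placeholder for an Elasticsearch-only API
    }

    // Would live in an OpenSearch-specific module built on the common one.
    interface OpenSearchOperationsSketch extends CommonSearchOperations {
        String openSearchOnlyFeature(); // placeholder for an OpenSearch-only API
    }

    // Code written against the common module works with either engine.
    static int countHits(CommonSearchOperations ops, String index, String query) {
        return ops.search(index, query).size();
    }

    public static void main(String[] args) {
        ElasticsearchOperationsSketch es = new ElasticsearchOperationsSketch() {
            @Override
            public List<String> search(String index, String query) {
                return List.of("es-1", "es-2"); // canned hits instead of a real cluster
            }

            @Override
            public String elasticsearchOnlyFeature() {
                return "only reachable through the Elasticsearch module";
            }
        };
        OpenSearchOperationsSketch os = new OpenSearchOperationsSketch() {
            @Override
            public List<String> search(String index, String query) {
                return List.of("os-1"); // canned hits instead of a real cluster
            }

            @Override
            public String openSearchOnlyFeature() {
                return "only reachable through the OpenSearch module";
            }
        };
        System.out.println(countHits(es, "books", "spring")); // 2
        System.out.println(countHits(os, "books", "spring")); // 1
    }
}
```

As the engines diverge, operations would have to move out of the shared interface into the per-engine extensions; in this layout, that is what a common layer becoming "thinner and thinner" would look like.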

Anyway, if you think the idea of a common search module is worth exploring, I would be happy to help: there would be no breaking changes, just a few refactorings. Also, I don't want to interfere with the migration to the new Elasticsearch Java client you are going through now, so I am open to any suggestions you may have.

Thank you.

reta • Feb 04 '22 21:02

I had a talk with the people from VMware maintaining Spring Data last week.

  • there will be no new library, artifact or project maintained by Spring Data
  • support for connectivity to OpenSearch should be added to Spring Data Elasticsearch
  • as for the next steps / timeline
    • the next thing to do is to add the new Elasticsearch client (ELC) as an alternative in the next version of Spring Data Elasticsearch (4.4) without changing the package structure of the existing code using the RHLC. The ELC code lives in its own package
    • Breaking changes will come with Spring Data Elasticsearch 5 (Spring Data 3, Spring 6, Java 17, etc.). This will then probably use Elasticsearch 8, which no longer has the RHLC. For this release, the existing code using the 7.17 RHLC can be moved to a distinct package next to the code for the new client.
    • the different packages for ELC and RHLC code can then be used as templates for the OpenSearch integration

sothawo • Feb 12 '22 10:02

Hi,

Wouldn't it make sense to open three issues to track these three steps, so we can work on them accordingly?

I'd prefer to ask before doing so, as I'm a bit unsure of the correct procedure.

Heatmanofurioso • Mar 22 '22 12:03

I have been working on the first point for some time now (#1973). The first, incomplete implementation was added in 4.4.M4, which was released yesterday. This code is far from complete; there are still many TODOs and unimplemented parts. My idea was that this code, in the package org.springframework.data.elasticsearch.client.elc, could be used as a template for a package like org.springframework.data.elasticsearch.client.osc, which would use the OpenSearch version of the new Elasticsearch client. But the work is still heavily in progress, meaning that if this were added now, all changes and refactorings would have to be carried over to the code using OpenSearch. I do not think it makes sense to start this before the Elasticsearch part is done with version 4.4.

Refactoring the existing code that uses the RHLC will then be a separate ticket; there is no point in creating it already, as this work can only start after 4.4 is released and the new client integration is in place.

And the third would be a separate ticket as well, but I don't think it really makes sense to start that now.

sothawo • Mar 22 '22 17:03

4.4 is out now.

Should we consider closing this ticket and opening two other tickets to track the remaining changes in our proposed solution?

Heatmanofurioso • May 24 '22 18:05

The only solution is to go with OpenSearch, now that Elastic has sabotaged the Spring Data Elasticsearch project by making it unusable for most cases. Elastic has tainted the transitive dependencies of Spring Data Elasticsearch with strong copyleft licenses that make any project and code using Spring Data Elasticsearch itself become OSS.

I do love OSS, but I love getting paid even more, so I need to develop proprietary software for my employers. I try to contribute to OSS whenever I can, but the code I get paid for unfortunately needs to be proprietary.

I really liked Elasticsearch, but unfortunately the Elastic company has become toxic and has started sabotaging or booby-trapping all its software.

Spring Data Elasticsearch is now tainted by Elastic with:

  1. Server Side Public License, which is strong copy-left license
  2. Elastic License which forbids commercial use and retail of your code.

mikezerosix • May 25 '22 08:05

4.4 has the code to use the new client as an option, not as default. The code for this resides in a separate package org.springframework.data.elasticsearch.client.elc. This integration is not yet complete due to some errors in the new Elasticsearch client, there are open tickets for that.

It was not yet possible to refactor the code using the old client into a separate package, because this is a breaking change and will be done in 5.0 (#2157). When this is done, all that code will be in org.springframework.data.elasticsearch.client.rhlc; the default client used in Spring Data Elasticsearch will then be the new Elasticsearch client, which is Apache 2 licensed (#2159).

These packages can then be used as a template, for example for a package org.springframework.data.elasticsearch.client.osc containing the code that uses the OpenSearch version of the new client, or something like org.springframework.data.elasticsearch.client.ohlc using the OpenSearch RestHighLevelClient.

I'd rather add new tickets for this once the internal package structure is stable with version 5.0.

sothawo • May 25 '22 09:05

"Only solution is to go OpenSearch now Elastic has sabotaged the Spring Data Elastic Search project by making it unusable for most cases. Elastic has tainted the transitive dependencies of Spring Data Elastic Search with strong copy-left licenses that make any project and code using Spring Data Elastic Search become itself OSS."

No. The Elasticsearch clients are forward compatible: if you use the client for Elasticsearch 7.15, it expects the full 7.15 API (or newer), not an older version that lacks endpoints or has a different API. This has caused more than enough issues and complaints in the past, so we are clear about it from the start rather than having runtime issues later on. It just doesn't work.

If you use an old version of Elasticsearch, use that version of the client. You're not gaining anything from upgrading just the client. If you're using a different project with a diverging API, you'll need a client for that.
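The forward-compatibility rule described above can be expressed as a simple version comparison. This is a hypothetical Java sketch, not how the Elasticsearch clients actually perform their checks: it only encodes "a client built for version X needs a server offering the X API or newer" and ignores any other rules the real clients may apply.

```java
import java.util.Arrays;

// Hypothetical sketch of the "client version <= server version" rule; not client code.
public class ClientCompatibilitySketch {

    // Compares dotted versions numerically; missing parts count as 0 ("7.10" == "7.10.0").
    static int compareVersions(String a, String b) {
        int[] x = parse(a);
        int[] y = parse(b);
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int xi = i < x.length ? x[i] : 0;
            int yi = i < y.length ? y[i] : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    static int[] parse(String version) {
        return Arrays.stream(version.split("\\.")).mapToInt(Integer::parseInt).toArray();
    }

    // True if the server offers at least the API the client was built against.
    static boolean serverSupportsClient(String clientVersion, String serverVersion) {
        return compareVersions(serverVersion, clientVersion) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(serverSupportsClient("7.15.2", "7.17.0")); // true
        System.out.println(serverSupportsClient("7.15.2", "7.10.2")); // false: older API, missing endpoints
        System.out.println(serverSupportsClient("7.10.2", "7.10.2")); // true
    }
}
```

A server from a different project with a diverging API cannot be judged this way at all, since its version numbers say nothing about which Elasticsearch endpoints it offers; that is the case where a dedicated client is needed.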

"I do love OSS but I love getting paid even more. So I need to develop proprietary software for my employers. I try to contribute to OSS when ever I can but the code I get paid for unfortunately needs to be proprietary."

"I really liked the Elastic Search, but unfortunately the Elastic company has become toxic and they started sabotaging or booby-trapping all their software."

We also like getting paid. We actually give most of our work away for free and still do. But others monetizing it (instead of us) has reached a point where it wasn't reasonable any more. Given your opening sentence here, I appreciate your understanding :)

"Spring Data Elastic Search is now tainted from Elastic with

  1. Server Side Public License, which is strong copy-left license
  2. Elastic License which forbids commercial use and retail of your code."

No. We waived it for the now deprecated HLRC client and the new Java API client is Apache 2 licensed as announced last year.

xeraa • May 25 '22 10:05

xeraa, you say that, but the truth is that adding the latest artifact from Maven:

org.springframework.data:spring-data-elasticsearch:4.4.0

to a Spring Boot application's pom.xml and running a Maven license scan reveals SSPL/Elastic dual licenses in:

License: 'Elastic License 2.0' used by 13 dependencies:
  • server (org.elasticsearch:elasticsearch:7.15.2 - https://github.com/elastic/elasticsearch)
  • elasticsearch-cli (org.elasticsearch:elasticsearch-cli:7.15.2 - https://github.com/elastic/elasticsearch)
  • elasticsearch-core (org.elasticsearch:elasticsearch-core:7.15.2 - https://github.com/elastic/elasticsearch)
  • elasticsearch-geo (org.elasticsearch:elasticsearch-geo:7.15.2 - https://github.com/elastic/elasticsearch)
  • elasticsearch-plugin-classloader (org.elasticsearch:elasticsearch-plugin-classloader:7.15.2 - https://github.com/elastic/elasticsearch)
  • elasticsearch-secure-sm (org.elasticsearch:elasticsearch-secure-sm:7.15.2 - https://github.com/elastic/elasticsearch)
  • elasticsearch-x-content (org.elasticsearch:elasticsearch-x-content:7.15.2 - https://github.com/elastic/elasticsearch)
  • rest-high-level (org.elasticsearch.client:elasticsearch-rest-high-level-client:7.15.2 - https://github.com/elastic/elasticsearch)
  • aggs-matrix-stats (org.elasticsearch.plugin:aggs-matrix-stats-client:7.15.2 - https://github.com/elastic/elasticsearch)
  • lang-mustache (org.elasticsearch.plugin:lang-mustache-client:7.15.2 - https://github.com/elastic/elasticsearch)
  • mapper-extras (org.elasticsearch.plugin:mapper-extras-client:7.15.2 - https://github.com/elastic/elasticsearch)
  • parent-join (org.elasticsearch.plugin:parent-join-client:7.15.2 - https://github.com/elastic/elasticsearch)
  • rank-eval (org.elasticsearch.plugin:rank-eval-client:7.15.2 - https://github.com/elastic/elasticsearch)

License: 'Server Side Public License, v 1' used by 12 dependencies:
  • server (org.elasticsearch:elasticsearch:7.15.2 - https://github.com/elastic/elasticsearch)
  • elasticsearch-cli (org.elasticsearch:elasticsearch-cli:7.15.2 - https://github.com/elastic/elasticsearch)
  • elasticsearch-core (org.elasticsearch:elasticsearch-core:7.15.2 - https://github.com/elastic/elasticsearch)
  • elasticsearch-geo (org.elasticsearch:elasticsearch-geo:7.15.2 - https://github.com/elastic/elasticsearch)
  • elasticsearch-plugin-classloader (org.elasticsearch:elasticsearch-plugin-classloader:7.15.2 - https://github.com/elastic/elasticsearch)
  • elasticsearch-secure-sm (org.elasticsearch:elasticsearch-secure-sm:7.15.2 - https://github.com/elastic/elasticsearch)
  • elasticsearch-x-content (org.elasticsearch:elasticsearch-x-content:7.15.2 - https://github.com/elastic/elasticsearch)
  • aggs-matrix-stats (org.elasticsearch.plugin:aggs-matrix-stats-client:7.15.2 - https://github.com/elastic/elasticsearch)
  • lang-mustache (org.elasticsearch.plugin:lang-mustache-client:7.15.2 - https://github.com/elastic/elasticsearch)
  • mapper-extras (org.elasticsearch.plugin:mapper-extras-client:7.15.2 - https://github.com/elastic/elasticsearch)
  • parent-join (org.elasticsearch.plugin:parent-join-client:7.15.2 - https://github.com/elastic/elasticsearch)
  • rank-eval (org.elasticsearch.plugin:rank-eval-client:7.15.2 - https://github.com/elastic/elasticsearch)

Unless all of this is multi-licensed with a third license, Apache License v2.0. Is that what you are saying? That all of this is now also multi-licensed under Apache v2.0?

That Spring Data Elasticsearch and all of its transitive dependencies are now licensed under Apache License v2.0, regardless of the Elastic packages also being multi-licensed under the SSPL and Elastic licenses?

mikezerosix • May 25 '22 16:05

Do we have a target timeline for Spring Data Elasticsearch 5.0? Is there a way that the OpenSearch community can contribute?

brijos • Jun 10 '22 21:06

WARNING! I just got an answer from Jason Yujuico at Elastic stating that the aforementioned libraries used by Spring Data Elasticsearch are in fact dual licensed under the SSPL and Elastic License 2.0. I see that as tainting the whole of Spring Data Elasticsearch and anything using it, either with strong copyleft or by requiring a commercial license. He did not mention anything about them being available under Apache License 2.0, even though I explicitly asked in my email whether they were.

So I take that to mean that anyone using Spring Data Elasticsearch has to either release ALL their source code publicly, as the SSPL states, or have a commercial Elasticsearch license. Otherwise they are violating the license and can expect to be sued by Elastic.co.

I do not mean to be argumentative; I see this as a serious concern. These vague and roundabout statements about the "HLRC client and the new Java API client" (on the Elastic website) do not mean anything, especially to a judge ruling on a case in which someone using Spring Data Elasticsearch in their proprietary product is sued by anyone (a competitor) wanting their proprietary source code, unless there is an existing commercial Elastic license to protect it under this dual license (which I suppose is the whole point for Elastic co).

These transitive dependencies of Spring Data Elasticsearch are explicitly and clearly documented, individually, as dual licensed under the SSPL and Elastic License 2.0. So it needs to be equally explicitly and clearly stated by Elastic co that these specific libraries and versions are available under Apache License 2.0 from a specific date.

mikezerosix • Jun 29 '22 16:06

@mikezerosix instead of spreading FUD and quoting a director for corporate and business development that may not be well versed in licensing details, you should read the section on HLRC in the licensing FAQ: https://www.elastic.co/pricing/faq/licensing#im-using-elasticsearch-via-apis-how-does-this-change-affect-me

"The Java HLRC has dependencies on the core of Elasticsearch, and as a result this client library will be licensed under the Elastic License. Over time, we will eliminate this dependency and move the Java HLRC to be licensed under Apache 2.0. Until that time, for the avoidance of doubt, we do not consider using the Java HLRC as a client library in development of an application or library used to access Elasticsearch to constitute a derivative work under the Elastic License, and this will not have any impact on how you license the source code of your application using this client library or how you distribute it."

This is very clear and unambiguous: Spring Data Elasticsearch is not tainted by the SSPL. And neither is any application using HLRC, either directly or indirectly through Spring Data Elasticsearch.

swallez • Jun 29 '22 16:06

@swallez To me, those Elastic co postings about the HLRC on their website are vague and ambiguous, as I, at least, do not know how, if at all, they apply to this Spring Data Elasticsearch software.

The IP owner, Elastic co, states in a very clear, unambiguous and legally binding way, within the license information of those published packages, that the transitive dependencies of Spring Data Elasticsearch listed below are dual licensed under the SSPL and Elastic License 2.0:

  • server (org.elasticsearch:elasticsearch:7.15.2 - https://github.com/elastic/elasticsearch)
  • elasticsearch-cli (org.elasticsearch:elasticsearch-cli:7.15.2 - https://github.com/elastic/elasticsearch)
  • elasticsearch-core (org.elasticsearch:elasticsearch-core:7.15.2 - https://github.com/elastic/elasticsearch)
  • elasticsearch-geo (org.elasticsearch:elasticsearch-geo:7.15.2 - https://github.com/elastic/elasticsearch)
  • elasticsearch-plugin-classloader (org.elasticsearch:elasticsearch-plugin-classloader:7.15.2 - https://github.com/elastic/elasticsearch)
  • elasticsearch-secure-sm (org.elasticsearch:elasticsearch-secure-sm:7.15.2 - https://github.com/elastic/elasticsearch)
  • elasticsearch-x-content (org.elasticsearch:elasticsearch-x-content:7.15.2 - https://github.com/elastic/elasticsearch)
  • rest-high-level (org.elasticsearch.client:elasticsearch-rest-high-level-client:7.15.2 - https://github.com/elastic/elasticsearch)
  • aggs-matrix-stats (org.elasticsearch.plugin:aggs-matrix-stats-client:7.15.2 - https://github.com/elastic/elasticsearch)
  • lang-mustache (org.elasticsearch.plugin:lang-mustache-client:7.15.2 - https://github.com/elastic/elasticsearch)
  • mapper-extras (org.elasticsearch.plugin:mapper-extras-client:7.15.2 - https://github.com/elastic/elasticsearch)
  • parent-join (org.elasticsearch.plugin:parent-join-client:7.15.2 - https://github.com/elastic/elasticsearch)
  • rank-eval (org.elasticsearch.plugin:rank-eval-client:7.15.2 - https://github.com/elastic/elasticsearch)

They also stated this in a very clear and unambiguous way in an email to me from Elastic co licensing, in answer to my question whether the above software was usable under the Apache 2.0 license when used as part of the Spring Data Elasticsearch package in third-party software. The official answer was that "those are dual licensed with SPP and ES2.0 license", with not a single word saying anything different.

To me that is very clear and unambiguous, not to mention legally binding.

I am not going to argue about this any more. I see this as serious problem I wanted to warn about. Everyone can make their own decisions.

mikezerosix • Jun 30 '22 08:06

@mikezerosix it seems to me your request to Elastic was not handled as it should have been, since the FAQ on HLRC was not even mentioned. I work at Elastic, and will check with the legal/licensing team to see how we can make it more clear and include all transitive dependencies.

Also note that you can override dependencies in your project to use version 7.10.2 which is entirely Apache 2 licensed. It will not have the Elasticsearch APIs that were added later, but will work fine for most of the APIs.

swallez • Jun 30 '22 09:06

"Also note that you can override dependencies in your project to use version 7.10.2"

This will not work, as the xcontent packages were moved in 7.15 (or 7.17), so Spring Data Elasticsearch needs them in their new location and not where they were in 7.10.
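One way to see why a plain downgrade to 7.10.2 breaks is to probe the classpath for the x-content classes. This illustrative Java sketch is not part of Spring Data Elasticsearch, and it assumes the move was from `org.elasticsearch.common.xcontent` (the 7.10 location) to `org.elasticsearch.xcontent`. Code compiled against the new location fails with `NoClassDefFoundError` at runtime when only the old jars are present.

```java
// Hypothetical probe; the two package names below are assumptions about the move.
public class XContentPackageProbe {

    // Checks whether a class can be found without initializing it.
    static boolean isPresent(String className) {
        try {
            Class.forName(className, false, XContentPackageProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    // Reports which x-content location (if any) the current classpath provides.
    static String xcontentLocation() {
        if (isPresent("org.elasticsearch.xcontent.XContentBuilder")) {
            return "new package (org.elasticsearch.xcontent)";
        }
        if (isPresent("org.elasticsearch.common.xcontent.XContentBuilder")) {
            return "old package (org.elasticsearch.common.xcontent)";
        }
        return "x-content not on classpath";
    }

    public static void main(String[] args) {
        System.out.println(xcontentLocation());
    }
}
```

With 7.10.2 jars forced onto the classpath, a probe like this would report the old package, while code compiled against the new one cannot link.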

sothawo • Jun 30 '22 15:06

Yesterday we had a call - @brijos (AWS/Opensearch), @mp911de and Ilayaperumal Gopinathan (both VMware/Spring Data) @reta and me - about how and where the integration of Opensearch into Spring Data Elasticsearch will be done. The result of this call:

  • There will be a new artifact spring-data-opensearch.
  • This will not be a modified clone of the existing Spring Data Elasticsearch library, but will be built on top of that, providing the integration of the Opensearch client into Spring Data Elasticsearch.
  • The repository for that new library will be set up and maintained by Opensearch and the community
  • The code of the PR from @reta will be the initial code for this integration

The issue for this setup in Opensearch: https://github.com/opensearch-project/opensearch-clients/issues/28.

This setup makes changes in Spring Data Elasticsearch available to OpenSearch users as well, as long as no client modifications are necessary. Once this setup is done, the documentation of Spring Data Elasticsearch will be adapted to inform users about the new possibility of integrating OpenSearch.

sothawo • Jul 13 '22 05:07

just for those who had missed it (like me): the opensearch-project/spring-data-opensearch repo now exists and work there has now begun with @reta's contribution 🥳

rursprung • Oct 11 '22 09:10

Thanks for bringing this up in this issue. I will close this one now.

sothawo • Oct 12 '22 10:10