Elasticsearch feature :sparkles::boom: (Lombiq Technologies: OSOE-83)
Fixes #4316
How to use:
Install Elasticsearch 7.x with Docker compose
Elasticsearch v7.17.5 Docker Compose file:
- Copy this file into a folder named Docker somewhere safe.
- Rename the file's extension from .txt to .yml.
- Open up a Terminal or Command Shell in this folder.
- Execute `docker-compose up` to deploy the Elasticsearch containers.
Advice: keep this file in its folder if you later want to remove all of its containers at once in Docker Desktop.
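For reference, a minimal single-node compose file along these lines should work; the image version and settings below are assumptions suitable for local development only:

```yaml
version: "3.7"
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.17.5
    environment:
      # Single-node discovery avoids cluster bootstrap checks locally.
      - discovery.type=single-node
      # Cap the JVM heap so the container stays small on a dev machine.
      - ES_JAVA_OPTS=-Xms512m -Xmx512m
    ports:
      - "9200:9200"
```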
You should get this result in the Docker Desktop app:

Set up Elasticsearch in Orchard Core
- Add the Elastic connection settings to the shell configuration (the OrchardCore.Cms.Web appsettings.json file):

```json
"OrchardCore_Elastic": {
  "ConnectionType": "SingleNodeConnectionPool",
  "Url": "http://localhost",
  "Ports": "9200"
}
```
- Start an Orchard Core instance with the VS Code debugger.
- Go to Orchard Core features and enable Elasticsearch.
Implementation details
The Analyzed and Stored properties are not very meaningful in the context of Elasticsearch.
Analyzed
Analyzed is the default for strings in Elasticsearch. By default, every string field is stored twice: once analyzed, using Elasticsearch's text field type, and once as-is, using its keyword field type.
So we will have an analyzed field called ContentItemId (text) and another called ContentItemId.keyword (stored as-is) to match on exact values using a TermQuery, for fields like ContentItemId or emails. (Elasticsearch stores text fields in two fields, analyzed and not analyzed; the ContentItemId.keyword field is created automatically.)
Elasticsearch documentation: https://www.elastic.co/blog/strings-are-dead-long-live-strings https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-index-search-time.html
Stored
Stored is really an overhead and is only required if we are processing thousands of large documents. By default, Elasticsearch stores the entire document in a field called _source and retrieves values from there when they are requested from the index itself.
Elasticsearch documentation: https://www.elastic.co/guide/en/elasticsearch/reference/7.12/search-fields.html
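To illustrate, values are normally read back from `_source` (optionally filtered) rather than from individually stored fields; a request along these lines (the index and field names are hypothetical) returns only the selected part of each document:

```json
POST /my-index/_search
{
  "query": { "match_all": {} },
  "_source": ["Content.ContentItem.DisplayText"]
}
```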
DSL Query Syntax
It is suggested to always use a MatchQuery instead of a TermQuery for text fields in Elastic; where you are fully confident, use the .keyword sub-fields for exact matches with a TermQuery (e.g. matching ids, or fields like email, phone number, or hostname).
Elasticsearch documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-term-query.html
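As a sketch of the two approaches (the index, field names, and values here are hypothetical), an exact match targets the keyword sub-field, while a full-text match targets the analyzed base field:

```json
POST /my-index/_search
{
  "query": {
    "term": { "ContentItemId.keyword": { "value": "some-content-item-id" } }
  }
}

POST /my-index/_search
{
  "query": {
    "match": { "Content.ContentItem.DisplayText": "blog post" }
  }
}
```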
NEST CheatSheet:
https://github.com/mjebrahimi/Elasticsearch-NEST-CheatSheet-Tutorials/blob/master/README.md
TODO
- [x] Refactor Search form to support both Lucene and Elastic. :construction:
- [x] Restructure project names using OrchardCore.Search.X :recycle: :truck:
- [x] Add advanced support for multilingual search (the equivalent of Lucene analyzers). :sparkles:
- [x] Refactor the UI to hide Analyzed as an indexing option for fields, as it is now the default for Lucene too. :recycle: :lipstick:
- [x] Refactor ContentIndexSettings as IContentIndexSettings : See https://github.com/OrchardCMS/OrchardCore/pull/10515 (Will not do)
- [x] Move `ContentIndexSettings` drivers to the OrchardCore.Indexing module so that they are common to every search provider.
- [x] Replace the `Query` type with `Nest.IQuery` in the Elasticsearch module. We should not parse the queries with the OC Lucene parser. `Nest.IQuery` is on par with the Elasticsearch Query DSL.
- [x] Recipe backward compatibility refactor. (migration) "not necessary anymore"
- [x] UserName/Password configurations
- [x] Fix ElasticContentPickerResultProvider by adding a method to the ElasticIndexManager that allows delegating the _elasticClient, thus allowing the use of fluent queries.
- [x] Add Lucene "keyword" implementation.
- [x] Fix Elasticsearch document Id mapping (was not able to remove older document by Id when publishing a new ContentItem).
- [x] Add advanced Elastic configuration (cluster connection strings) - Optional
- [x] Configure error handling. See: https://github.com/elastic/elasticsearch-net/issues/1793
- [x] Change how we index custom fields like "Normalized, Sanitized, Inherited" by using an underscore instead of a dot as a separator, else Elasticsearch fails to bind the data because of an already used mapping.
- [x] Add an index setting that allows storing "_source" on the index for Elasticsearch. Defaults to true.
- [x] Use the Elasticsearch deserializer to parse JSON queries.
- [x] Make Analyzers configuration work on each index.
- [x] Background Task. Indexing state (Last Task Id) persisted in each Elasticsearch index.
- [x] Secure ElasticIndexManager methods to operate only on current tenant indices.
- [x] Module documentation. :memo:
Migration
A manual migration to apply to restore the Lucene indices settings and queries:
```sql
UPDATE Document SET Content = REPLACE(Content, '"$type":"OrchardCore.Lucene.LuceneQuery, OrchardCore.Lucene"', '"$type":"OrchardCore.Search.Lucene.LuceneQuery, OrchardCore.Search.Lucene"')
WHERE [Type] = 'OrchardCore.Queries.Services.QueriesDocument, OrchardCore.Queries'

UPDATE Document SET [Type] = 'OrchardCore.Search.Lucene.Model.LuceneIndexSettingsDocument, OrchardCore.Search.Lucene'
WHERE [Type] = 'OrchardCore.Lucene.Model.LuceneIndexSettingsDocument, OrchardCore.Lucene'
```
Breaking Changes
IndexingConstants changes:
DisplayTextAnalyzedKey = "Content.ContentItem.DisplayText_Analyzed" instead of "Content.ContentItem.DisplayText.Analyzed"
DisplayTextNormalizedKey = "Content.ContentItem.DisplayText_Normalized" instead of "Content.ContentItem.DisplayText.Normalized"
ContentType will now be searchable with a Term query by using "Content.ContentItem.ContentType.keyword"
Taxonomies module indexing
".Inherited" = "_Inherited"
Queries migration
Elasticsearch maps the data automatically, which means that text fields will always be analyzed. You can now access the "stored" value of a text field by using ".keyword" as a suffix on your field name. This means you can use a TermQuery on the ".keyword" field and a MatchQuery on the base field name.
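For example, after this change, an exact filter on content type targets the keyword sub-field (the index name here is hypothetical):

```json
POST /my-index/_search
{
  "query": {
    "term": { "Content.ContentItem.ContentType.keyword": "BlogPost" }
  }
}
```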
Notes
There is a new client for Elasticsearch 8.x. Nest has been renamed to Elastic.Clients.Elasticsearch.
Also, OpenSearch will not work with this PR.
There is an initiative to move Elastic.Clients.Elasticsearch to OpenSearch here:
https://github.com/opensearch-project/opensearch-net
This means that eventually OpenSearch and Elasticsearch will need to become 2 distinct features.
So, to summarize compatibility:
Nest = OpenSearch 1.x, Elasticsearch 7.x, Elasticsearch 8.x (in compatibility mode) https://github.com/elastic/elasticsearch-net/tree/7.17
Elastic.Clients.Elasticsearch (replacing Nest) = Elasticsearch 8.x https://github.com/elastic/elasticsearch-net https://www.nuget.org/packages/Elastic.Clients.Elasticsearch/
OpenSearch.Client (Nuget package not released yet) = OpenSearch 2.x https://github.com/opensearch-project/opensearch-build/issues/2051
Squashed commits.
Any news on this by chance?
Ping me on Teams or else when you get time.
Is there any problem with this PR?
I'm about to start working on it. I'm still on vacation ...
I found some problems:
- [x] Document total is never greater than 10: https://github.com/OrchardCMS/OrchardCore/pull/11999
- [x] Need to check the ES connection before operating on an ES index. Currently, when an Elastic index is created before the ES service has started, the operation succeeds, but the index does not work and can't be deleted or updated.
@piedone I'm starting to work on this PR today. @hyzx86 Not sure what you mean about this:
> Need to check the ES connection before operating on an ES index. Currently, when an Elastic index is created before the ES service has started, the operation succeeds, but the index does not work and can't be deleted or updated.
@Skrypt
- I stop my ES service
- I create an ES index
- After a long time, the operation succeeds
But in fact, this is a wrong check.
For the following reasons, I don't think we should just return a Boolean value here. Maybe we can throw an exception or directly return the ExistsResponse.

Should we check the connection status before each ES operation?
Or throw an exception every time the API call fails?
We need to prevent creating indexes when there is no working ES connection. This is more of a UI thing in that case: we need to display a notification that no ES connection is working when going to the page where indexes can be added. Also, we need to throw an exception as a safety net in these methods.
Search feature module changes explained, with sound:
https://user-images.githubusercontent.com/3228637/180492337-011bb4f4-f433-4b6b-9d16-cc1c5bcd2c67.mp4
Awesome! Would it be possible to merge the Lucene and Elastic UX, to make Elastic a drop-in replacement for the search backend? Like how you manage Media files the same way regardless of whether they're stored locally or in Azure Blob Storage.
They are complementary; both can be used at the same time. But of course, you could use only Elasticsearch instead of Lucene if that's what you want.
Right now, there are discrepancies between the two implementations. You can't take a Lucene query and copy/paste it as an Elastic query, because everything is "analyzed" with Elasticsearch, so a Term Query becomes a Match Query. To remove this discrepancy, we would need to make the Lucene implementation work like Elasticsearch, which would take more time.
I will take some more time to make sure we are doing the proper thing here by setting every document as "Analyzed" with Elasticsearch. But the thing is that with Elasticsearch everything is also always stored, as I documented in the first post.
So, right now, we can't just switch from Lucene to Elasticsearch seamlessly unless we kept the same ContentIndexSettings for both implementations. I removed the "Stored" option for Elastic because it was unnecessary.
Also, another thing to take into consideration: we parse Lucene queries with our own QueryParser implementation, which follows Lucene 4.0 standards, while Nest uses Lucene 7.x. Not to mention that Lucene 8.x will use a newer implementation of Nest, which is being renamed to Elastic.Clients.Elasticsearch: https://github.com/elastic/elasticsearch-net
There are lots of moving parts with Elastic compared with Lucene. Maybe it is better to keep the Lucene and Elastic implementations separate for now.
From this, it seems to me that if these two are independent features, we'll have two increasingly diverging Lucene-based search implementations. I see this as an issue because, while Elastic offers a lot more than vanilla Lucene, I'd also consider it the "production" version of search: otherwise, you need to employ workarounds in a production app (like with staged publishing, blue-green deployment, or multi-node hosting) unless you have a single instance and that's your whole environment. Contrast this with the Lucene module, which just works without any external dependencies and is thus perfect for local development and testing. Again, like local storage vs. Azure Blob Storage: you can use both anywhere in principle, but most probably you want to use local storage while you write the code, and simply switch to Blob Storage in production.
I don't have any insights into how, when, or even whether it makes sense to tackle this, but wanted to share this viewpoint.
I believe that the Lucene implementation, since we created our own Query Parsers, will always be unique.
Also, you can't assume that a query that works in Lucene.NET (4.0) will still work with Elasticsearch 7.x (which uses Lucene 7.x). So while we could make an effort to make it work, there will always be discrepancies. If Lucene.NET were on par with the Lucene (Java) version, then it could work, but that is not the case. Right now, our query parsers try to follow the Elasticsearch documentation examples, but we have not implemented every query type yet. So going back and forth between Elastic and Lucene will cause issues too, because of missing query implementations in Lucene.NET 4.0 or deprecated query implementations in Elastic.
I see it more like this: if you want to rely on Elasticsearch, then use a Docker container locally to develop your website, if that's what you will eventually use in production. Otherwise, if you built Lucene queries and want to move them to Elasticsearch, you will have to adapt those queries. Most of them will be fundamentally quite the same.
I see the Lucene module more as the basic implementation of a search provider that costs nothing more to host. It's the lightweight version of a "search engine" that covers most basic needs and that most people will use as the default. It could eventually become more, too, by allowing us to scale it as a service the way Elasticsearch is, but here and now we can't compare it with the "product" and engineering effort that is Elasticsearch.
Here the difference with the Storage modules is that they don't have to deal with Query discrepancies. :smile:
But I will look at the "Stored" option and maybe reintroduce the concept into Elasticsearch, even if it doesn't make sense, if that's what we want.
In any case, I'd only aim for a common subset of features, aimed at supporting the most common scenarios, between the two implementations. So, I'm not arguing for complete feature parity, Elastic will always offer more, but have e.g. what's in our docs as the lowest common denominator (even if that means introducing breaking changes to the Lucene module).
Perhaps it would be feasible to have something like a "compatibility mode" per Query in the Elastic module that uses our query parser, and which is thus compatible with Lucene module queries? And "native mode", where you just use what's built into Elastic directly.
Our current query parsers should work with Elasticsearch. Right now the issue is more about the fact that we index differently in Lucene than in Elastic. We don't have an option to "Store" only in Elastic because it would be redundant.
So basically, when we store only in Lucene, we need to use a TermQuery. In Elastic, this TermQuery becomes a MatchQuery because the same indexed field is now analyzed. To be able to do a TermQuery in Elastic, you need to add the "_source" option to your query. But that would need to be tested to make sure everything works. I believe that this is where we are now.
https://www.elastic.co/guide/en/elasticsearch/reference/7.12/search-fields.html
If this is the only difference, then it seems to me this can easily be handled, if with nothing else, then by changing the query type (i.e. the query type string) before parsing for Elastic?
Ok, I think I understand why we added the "Stored" option for Lucene. Normally, you should never need to "Store" a ContentField. The "Stored" option was added so that we can make any ContentField a technical value like a GUID. You don't want to analyze a GUID, because you want to be able to simply use a Term Query with it; analyzing a GUID doesn't make sense. The option was added so that we can have "text enumerations" and filter on them using the TermQuery.
Here, you can read the documentation of the TermQuery, which specifies that you should avoid it on "text" fields because they are all analyzed.
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-term-query.html
> By default, Elasticsearch changes the values of text fields as part of [analysis](https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis.html).
But the difference here with Elasticsearch is that it doesn't allow you to "Store" a ContentField. It is by design that, for example, an Integer or a Float is stored only. So Elasticsearch decides whether a ContentField will be analyzed or stored based on the CLR type passed.
Here, the issue is that we allowed overriding a ContentField's natural indexing by CLR type when we added the "Stored" option on the ContentIndexSettings for Lucene. So, generally, Elastic and Lucene should behave the same when indexing per CLR type, as long as a ContentField is not overridden with the "Stored" option.
Now, we can't really remove the "Stored" option that we added for Lucene, because people are using it and it has been useful for some use cases; we need to replace it. What we would need to do is what Elasticsearch does: always store the original value in the same index or a second one (we need to understand their design first), make sure the "_source" option on the Lucene query parser works (implement it), and provide documentation on the breaking change if needed. So, basically, we would need to refactor the Lucene implementation to mimic what Elasticsearch does. That would be the best approach.
But for now, yes, we can document that the "Stored" option is not available with Elasticsearch, and that anyone who used it on a TextField will need to migrate their queries to Elasticsearch by using the MatchQuery instead. Also, anyone who used a TextField as a technical value might be better off using the "_source" option on the query.
https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-source-field.html
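For completeness, disabling `_source` at index creation time looks like the following (this is rarely advisable, since it breaks reindexing and the update APIs; the index name is hypothetical):

```json
PUT /my-index
{
  "mappings": {
    "_source": { "enabled": false }
  }
}
```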
So, yes, we could eventually refactor Lucene so that we can be able to move from one to the other more easily.
And here is another one, about the "keyword" data type in Elasticsearch.
https://codecurated.com/blog/elasticsearch-text-vs-keyword/
So basically, that would be a second option to take into consideration instead of always storing the original value as Elasticsearch does. In fact, always storing the original value is an option in Elasticsearch.
Here is what Drupal does:
https://www.drupal.org/node/655724
| Drupal | Orchard Core Lucene | ElasticSearch |
|---|---|---|
| UnStored | Analyzed | Analyzed naturally by CLR Type passed |
| Keyword | Stored | https://www.elastic.co/guide/en/elasticsearch/reference/current/keyword.html |
| UnIndexed | Not indexed | Not indexed |
| Text | Analyzed as a TextField and not a StringField, stored as _source ??? Our text indexation uses a TextField. | Analyzed but stored in _source |
| Binary | Stored as _binary ??? Not implemented | https://www.elastic.co/guide/en/elasticsearch/reference/current/binary.html |
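For the Keyword/Stored row above, the Elasticsearch equivalent is an explicit `keyword` mapping, which skips analysis entirely (the index and field names are hypothetical):

```json
PUT /my-index
{
  "mappings": {
    "properties": {
      "ContentItemId": { "type": "keyword" }
    }
  }
}
```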
So perhaps venture into refactoring Lucene once this PR is merged?
Elasticsearch allows manual mapping of CLR types:
https://www.elastic.co/guide/en/elasticsearch/client/net-api/current/fluent-mapping.html#auto-map-with-overrides
Ok, more on all of this. Right now, with Elasticsearch, we do not use explicit mapping.
But here is a detail I did not know of: since Elasticsearch 5.0, the string type has been replaced by the text and keyword types.
Since then, when you do not specify an explicit mapping, for a simple document with a string:

```json
{
  "some_field": "string value"
}
```

the following dynamic mapping will be created:

```json
{
  "some_field": {
    "type": "text",
    "fields": {
      "keyword": {
        "type": "keyword",
        "ignore_above": 256
      }
    }
  }
}
```
As a consequence, it will both be possible to perform full-text search on some_field, and keyword search and aggregations using the some_field.keyword field.
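For example, the automatically created sub-field supports exact filters and bucket aggregations that the analyzed parent field does not (the index name is hypothetical):

```json
POST /my-index/_search
{
  "size": 0,
  "aggs": {
    "values": {
      "terms": { "field": "some_field.keyword" }
    }
  }
}
```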
So, that means that if we want to be able to switch from Lucene to Elasticsearch seamlessly, we would just need to remove the "Stored" option on the ContentIndexSettings and always store the values in a nested ContentField.keyword field.
So, as a reaction to this information, I removed the IContentIndexSettings from the PR implementation and moved the drivers to the OrchardCore.Indexing module so that they are common to every search provider.
Now, I believe we can enforce type mapping in Elasticsearch, but that can also be done directly on the server if needed. And we do not store every field in Lucene because we wanted to keep the indexes smaller, but I think it would be better to have a nested "keyword" field with a maximum of 256 chars like ES does, since that is its default behavior.
I've just seen your edits to your last comment. Sounds awesome!
Just tested this with the ES cloud service using a CloudId, and it works.
Fantastic!
Also noticed from testing with a database that has around 30k content items that Elasticsearch is really fast at rebuilding the indices. Way faster than Lucene; maybe there is still room for improvement with the Lucene implementation.
I've been thinking about ways to make the ISearchProviders switchable seamlessly. Right now, the Lucene and Elastic queries are bound to an index at all times. We could have a method that calls a "named query" with an index name, and it could work if the indices were named the same in Elastic and Lucene; but otherwise this is really "loose" coupling that can also lead to more confusion, because index names are set by users.
That's why, for now, I kept them as two different types of queries: "Lucene" and "Elastic". Also, allowing both search providers to be enabled at the same time allows migrating queries progressively from one provider to another.
However, the Search module provides a configuration option to select a search provider, so this way it can be switched seamlessly. I think it could also default to a prioritized search provider by configuration, per module, if needed.
Could you please create a new issue for us to track the migration story, and thus address necessary improvements later?