UmbracoAzureSearch
Using TypedMedia fallback to Azure Search
Is there a solution available yet for the fallback to Azure Search when using TypedMedia?
I use TypedMedia quite a lot on my website. I can see that someone has started writing some code in the "DummyUmbracoExamineSearcher.cs" file, but it has been commented out. Was there a problem finding a solution that meant it was left, or has nobody had time to finish this part yet?
From what I can tell, the first problem would be converting the Examine "searchParams" into something you can use to query Azure Search (it looks like someone has started to attempt this), and the second would be converting whatever result you get from Azure Search into "ISearchResults", the interface Examine uses?
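The two conversions described above could be prototyped roughly as follows. This is a minimal Python sketch for illustration only (a real adapter would be C# code implementing Examine's ISearchResults). It handles exact-match criteria only, and the field names nodeTypeAlias, parentID and id, along with every helper name, are assumptions rather than the package's actual schema or API. Azure Search filters use OData syntax, and each returned document carries its relevance score in an "@search.score" property.

```python
from dataclasses import dataclass, field
from typing import Dict, List


def odata_literal(value: object) -> str:
    """Render a value as an OData literal; single quotes in strings are doubled."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"


def criteria_to_filter(criteria: Dict[str, object]) -> str:
    """Problem 1: turn simple exact-match criteria into one $filter expression."""
    return " and ".join(
        f"{name} eq {odata_literal(value)}" for name, value in criteria.items()
    )


@dataclass
class ExamineLikeResult:
    """Problem 2: loosely mirrors Examine's SearchResult (Id, Score, Fields)."""
    id: int
    score: float
    fields: Dict[str, str] = field(default_factory=dict)


def to_examine_like_results(
    documents: List[dict], key_field: str = "id"
) -> List[ExamineLikeResult]:
    """Map raw Azure Search documents onto Examine-style results."""
    results = []
    for doc in documents:
        # Drop service metadata such as "@search.score" from the field bag.
        fields = {k: str(v) for k, v in doc.items() if not k.startswith("@search.")}
        results.append(
            ExamineLikeResult(
                id=int(doc[key_field]),
                score=float(doc.get("@search.score", 0.0)),
                fields=fields,
            )
        )
    return results


if __name__ == "__main__":
    print(criteria_to_filter({"nodeTypeAlias": "Image", "parentID": 1234}))
    # nodeTypeAlias eq 'Image' and parentID eq 1234
```

Real Examine criteria also include Lucene-specific features (wildcards, boosting, grouped clauses) that have no one-to-one OData equivalent, which is likely part of why a full drop-in replacement is hard.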
This is a problem for my site because the first time a page with lots of images loads, it is slow, as it has to query the database for each image to load it into memory. Part of the reason for using Azure Search, along with other changes I am making, is to cut down the number of database calls and improve the speed of the website in general.
We did at one point experiment with a drop-in replacement for Examine, but it was quite a task and we never got there. If your Examine indexes are built, you shouldn't see a database query for TypedMedia; it should use the internal Examine index by default.
Ordinarily, we keep only the Internal index in our solution, as it is used by the back office, and remove the External index and any other custom indexes.
If you need the internal Examine index built as well for this to work, then what is the point of using Azure Search in the first place? I thought the idea was to remove the need for Examine indexes that have to be built when the website starts.
I would imagine that adding the Azure Search index will, if anything, slow down the website, because on "Save and publish" it now has to write to two indexes (Examine and Azure). I would also expect queries to the Azure Search index to be slower than queries to Examine indexes, due to network latency. There is also the additional cost of using Azure: we have a very large site, so it would not be covered by the free tier and would possibly need multiple replicas to achieve the SLA.
The plan was to use the Azure Search index and also implement an archiving solution whereby unpublished articles are still visible on the website (but not in the XML cache). Now I am wondering whether there is any point using Azure Search, because I am not sure what benefit it would give over Examine, especially if the Examine index needs to be built as well.
The current implementation can replace back office search, so you can have a dummy internal indexer if you want; but, as you point out, you need a strategy for TypedMedia so it doesn't query the database on each request.
Assuming you use Azure Search in the back office and replace the internal indexer with a dummy (because Umbraco needs one), it shouldn't be that difficult to write an alternative to TypedMedia, but we didn't get that far.
You can use the package to search media, but in your TypedMedia replacement you'd probably want some kind of pass-through cache, as querying the search index multiple times per page isn't going to be much more efficient than querying the database. Ideally there would be a way of querying all the content and media required for a page in one hit, but that is quite complicated unless you are doing something clever, such as a custom controller that can determine all of the content and media a page requires.
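A pass-through cache of the kind described above might look like the following minimal Python sketch. It assumes a hypothetical run_query callable that sends one filter query to the index, and that media documents are keyed by a string field named id (an assumption about the index schema; Azure Search key fields must be strings). The idea is to prefetch every media id a page needs in one round trip using Azure Search's search.in() filter function, then serve individual TypedMedia-style lookups from memory for the rest of the request. PageMediaCache and build_id_filter are hypothetical names.

```python
from typing import Callable, Dict, Iterable, List, Optional

KEY_FIELD = "id"  # assumption: name of the string key field in the media index


def build_id_filter(ids: Iterable[int]) -> str:
    """One OData filter matching every id, using Azure Search's search.in()."""
    values = ",".join(str(i) for i in sorted(set(ids)))
    return f"search.in({KEY_FIELD}, '{values}', ',')"


class PageMediaCache:
    """Request-scoped pass-through cache: misses are fetched in one batch query."""

    def __init__(self, run_query: Callable[[str], List[dict]]) -> None:
        self._run_query = run_query  # hypothetical: sends one $filter query to the index
        self._cache: Dict[int, Optional[dict]] = {}
        self.query_count = 0

    def prefetch(self, ids: Iterable[int]) -> None:
        """Fetch every id not already cached with a single query."""
        missing = sorted({i for i in ids if i not in self._cache})
        if not missing:
            return
        self.query_count += 1
        docs = self._run_query(build_id_filter(missing))
        found = {int(doc[KEY_FIELD]): doc for doc in docs}
        for media_id in missing:
            # Remember misses as None so an unknown id is not queried again.
            self._cache[media_id] = found.get(media_id)

    def get(self, media_id: int) -> Optional[dict]:
        """TypedMedia-style lookup: serve from memory, else fall back to one query."""
        if media_id not in self._cache:
            self.prefetch([media_id])
        return self._cache[media_id]


if __name__ == "__main__":
    # In-memory stand-in for the search index.
    index = {
        "1051": {"id": "1051", "url": "/media/1051/hero.jpg"},
        "1052": {"id": "1052", "url": "/media/1052/logo.png"},
    }

    def fake_query(odata_filter: str) -> List[dict]:
        wanted = odata_filter.split("'")[1].split(",")
        return [index[i] for i in wanted if i in index]

    cache = PageMediaCache(fake_query)
    cache.prefetch([1051, 1052, 9999])  # everything the page needs, one round trip
    print(cache.get(1051)["url"], cache.get(9999), cache.query_count)
    # /media/1051/hero.jpg None 1
```

Scoping the cache to a single request, rather than the whole application, sidesteps invalidation when media is saved, at the cost of at least one query per request; a longer-lived cache would need to be cleared from a publish or save event.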
I'm not sure whether you have written the part that stops archived content being cached yet, but we documented that here; it may be of interest: https://moriyama.co.uk/about-us/news/blog-the-need-for-archived-content-in-umbraco-and-how-to-do-it/
Thank you for the link. It was actually reading that article that led me down this path of using Azure Search and setting up an archiving solution (the archiving solution hasn't been started yet).
However, without a solution for the TypedMedia problem, I am wondering whether Azure Search is the right approach, because the archiving solution could work with Examine, especially if I am potentially going to need both the Examine index and the Azure Search index to be built.
Suppose I do come up with a solution for the TypedMedia problem. We do not have any load balancing set up (the website runs on a single Azure virtual server and the database on another), so in our scenario is there actually any benefit to using Azure Search over Examine? Although the Azure index would be kept outside the virtual server, the server still has to process the data when initially building the index or when updating it (i.e. saving a node in Umbraco), so I can't see that this would reduce the load on the server or the database. Is it faster to build the Azure Search index than the Examine index? Is it faster to query? Or is there something else that makes it more beneficial (for example, that it indexes collections such as tags, although this can technically be achieved with Examine event handlers)?
If there are benefits to using Azure over Examine in my scenario, then one workaround for the TypedMedia problem could be an Examine index that only indexes media (I'm not sure how this would work yet). This would result in a much smaller Examine index: content would be handled in the Azure index, but media would still be in Examine so that TypedMedia would work as normal. Again, though, this would only be worthwhile if there are actual reasons why Azure Search would be the preferred solution for me.
Thanks again for the advice; hopefully you are able to answer some of these questions.
Our main reasons for developing this were:
- Azure Search is more feature-rich, e.g. facets and other things that you don't get out of the box with Lucene.
- In a load-balanced environment, when Umbraco starts on a new instance there is an overhead in building Examine indexes, resulting in slow startup for sites with lots of content (with Azure Search there is no startup overhead).
- We did suffer occasional data loss or corruption in Examine indexes; Azure Search seems more durable.
It would be easy enough to have an Examine index that only contains media; you can filter what is indexed by "Document Type".
Obviously, this wasn't designed as a one-size-fits-all solution for every scenario, and you'll know best whether the above benefits fit your project!
Thanks Darren. I am going to use Azure Search because it gives us flexibility should we use load balancing in the future. We have also had occasional data loss and corruption with Examine indexes, and indexing collections without any event handlers is also useful.
I will see if I can find a solution for the TypedMedia problem. If not, I will have to fall back to an Examine index that contains just media, or potentially find a way of replacing the "TypedMedia" calls in my website.