Solr support in AtoM
Work-in-progress branch that adds Solr support for searching within AtoM.
Completed:
- Docker configuration that starts Solr and ZooKeeper containers (Solr uses ZooKeeper to coordinate and sync multiple Solr nodes when run in cloud mode).
- A Solr plugin (arSolrPlugin), which serves as the Solr equivalent of arElasticSearchPlugin. It talks to Solr and provides functions for indexing and searching.
- A solr:populate task (arSolrPopulateTask), which indexes AtoM data into Solr. The indexed data can be viewed in the Solr dashboard at http://localhost:8983/solr. The Solr dashboard also allows searching the indexed data.
- A set of classes that act as the equivalent of Elastica within AtoM. These are located in the `arSolrPlugin/lib/client` folder. The query classes essentially set up query parameters for API requests to Solr; `arSolrClient` accepts the configuration it needs to communicate with Solr and has methods for sending different API requests to Solr.
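To make the split between query classes and the client concrete, here is a minimal sketch in Python (used only for illustration; the plugin itself is PHP). The class name `TermQuery`, the `slug` field, and the `atom` collection name are assumptions and do not come from the plugin. Only `q`, `start`, `rows`, and the `/select` handler are standard Solr.

```python
from urllib.parse import urlencode


class TermQuery:
    """Hypothetical stand-in for a query class; not the plugin's actual API."""

    def __init__(self, field, value, start=0, rows=10):
        self.field = field
        self.value = value
        self.start = start
        self.rows = rows

    def to_params(self):
        # q, start and rows are standard Solr request parameters.
        return {
            "q": f'{self.field}:"{self.value}"',
            "start": self.start,
            "rows": self.rows,
        }


def select_url(base_url, collection, query):
    """Build the URL a client would request from Solr's /select handler."""
    return f"{base_url}/{collection}/select?{urlencode(query.to_params())}"


# "atom" and "slug" are assumed names, used only for this example.
url = select_url("http://localhost:8983/solr", "atom", TermQuery("slug", "example-fonds"))
print(url)
```

The query object only knows how to describe itself as parameters, and the client-side function only knows where to send them, which roughly mirrors the division of labour described above.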
Work in progress:
- `arSolrSearchTask` is a CLI task that allows searching the Solr index for a few query types. Since queries can get fairly complicated, especially Boolean queries, it was meant for quick CLI testing until Solr is officially supported by the AtoM interface, and so it isn't very customizable. However, it could potentially be useful for writing tests in the future.
- Unit tests for several Solr query classes have been added. Solr's Boolean Query, Result and Result Set, and the Solr Client currently do not have any tests written for them.
TODO
Within arSolrPlugin
High priority (essential for browse or search actions):
- [ ] Add a class for handling nested search: Currently there is no class for handling nested search in the query classes we have for Solr. Solr doesn't have a built-in nested query like ElasticSearch does, since it doesn't treat nested fields in a special way. This means that while it could be possible to perform those searches using a simple boolean query that targets those nested fields, we would need to ensure we're matching results within the same nested unit (for instance, when searching for date ranges, we would need to ensure that we don't mix a start date from one event with an end date from a different event for the same information object).
- [ ] Add authentication to Solr Client (`arSolrPlugin`): Currently the username and password are ignored, as the current Solr setup doesn't configure them either.
- [ ] Change getDateRangeQuery's Nested Query call (`arSolrPluginQuery`): Since there is no nested query class for Solr yet, this will need to be updated once that functionality is in place.
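The nested-unit concern in the first item can be illustrated with a small, self-contained Python sketch. It does not call Solr; the field names (`dates.startDate`, `dates.endDate`) and the sample events are assumptions made up for the example.

```python
# Illustrative data only: one information object with two events.
events = [
    {"startDate": "1900-01-01", "endDate": "1910-12-31"},
    {"startDate": "1950-01-01", "endDate": "1960-12-31"},
]

# Flattening the events loses the pairing between start and end dates.
flattened = {
    "dates.startDate": [e["startDate"] for e in events],
    "dates.endDate": [e["endDate"] for e in events],
}


def flat_match(doc, start_before, end_after):
    """Boolean AND over flattened fields: each clause may hit a different event."""
    return (any(s <= start_before for s in doc["dates.startDate"])
            and any(e >= end_after for e in doc["dates.endDate"]))


def nested_match(units, start_before, end_after):
    """Both conditions must hold within the same event."""
    return any(u["startDate"] <= start_before and u["endDate"] >= end_after
               for u in units)


# Query: an event that had started by 1920 and was still running in 1940.
flat_result = flat_match(flattened, "1920-01-01", "1940-01-01")
nested_result = nested_match(events, "1920-01-01", "1940-01-01")
print(flat_result, nested_result)  # prints "True False"
```

The flattened check reports a match because it pairs the first event's start date with the second event's end date, which is exactly the false positive a nested search class would need to prevent.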
Medium priority (not essential for basic search but still important):
- [ ] updateByQuery method/function (`arSolrPlugin.class`): This class will need a method to handle updating specific documents by query.
- [ ] Create Diacritics analyzer (`arSolrPlugin.class`)
- [ ] Create Brazilian Portuguese analyzer (`arSolrPlugin.class`): Solr doesn't have a default pt_BR analyzer but has specific filter classes we can use.
- [ ] Ensure PDFs are also indexed by Solr (`arSolrPlugin.class`): Will need to use Apache Tika to work with external docs.
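As a rough idea of what the two analyzer items might involve, the sketch below builds `add-field-type` commands for Solr's Schema API. It is written in Python for illustration only. The field type names (`text_folded`, `text_pt_br`) and the choice of tokenizer are assumptions; `solr.ASCIIFoldingFilterFactory` and `solr.BrazilianStemFilterFactory` are existing Solr filter factories, but whether they are the right filters for AtoM is untested here.

```python
import json


def text_field_type(name, extra_filters):
    """Build an add-field-type command for Solr's Schema API."""
    filters = [{"class": "solr.LowerCaseFilterFactory"}]
    filters += [{"class": cls} for cls in extra_filters]
    return {
        "add-field-type": {
            "name": name,  # hypothetical field type name
            "class": "solr.TextField",
            "analyzer": {
                "tokenizer": {"class": "solr.StandardTokenizerFactory"},
                "filters": filters,
            },
        }
    }


# Diacritics: fold accented characters to their ASCII equivalents.
diacritics = text_field_type("text_folded", ["solr.ASCIIFoldingFilterFactory"])

# Brazilian Portuguese: stem tokens with the Brazilian stemmer.
pt_br = text_field_type("text_pt_br", ["solr.BrazilianStemFilterFactory"])

# Each payload would be POSTed as JSON to /solr/<collection>/schema.
print(json.dumps(pt_br, indent=2))
```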
Low priority (used by CLI tasks or other non search specific actions within AtoM):
- [ ] getScrolledSearchResultIdentifiers uses `Elastica\Scroll` (`arSolrPluginUtil`): This doesn't have a Solr equivalent and will need to be handled.
- [ ] Search, MultiSearch (see `apps/qubit/modules/search/actions/autocompleteAction.class.php`)
- [ ] Bulk (for bulk document updates)
- [ ] AbstractScript (see `lib/job/arUpdatePublicationStatusJob.class.php`, https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-scripting.html)
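`Elastica\Scroll` has no drop-in Solr counterpart, but Solr's cursor-based deep paging (the `cursorMark` parameter) may be able to fill a similar role for getScrolledSearchResultIdentifiers. The Python sketch below shows the paging loop only. `fetch_page` is a hypothetical callable standing in for the client's request method, and `id` is assumed to be the uniqueKey field.

```python
def iter_ids(fetch_page, query="*:*", rows=100):
    """Yield every matching id using Solr's cursorMark paging.

    fetch_page is a hypothetical callable that sends the params to Solr's
    /select handler and returns the decoded JSON response.
    """
    cursor = "*"  # "*" asks Solr to start a new cursor
    while True:
        params = {
            "q": query,
            "fl": "id",
            "rows": rows,
            "sort": "id asc",  # cursor sorts must include the uniqueKey field
            "cursorMark": cursor,
        }
        response = fetch_page(params)
        for doc in response["response"]["docs"]:
            yield doc["id"]
        next_cursor = response["nextCursorMark"]
        if next_cursor == cursor:  # an unchanged cursor means no more results
            break
        cursor = next_cursor


def fake_fetch(params):
    """Stand-in for a real HTTP call, returning canned pages."""
    pages = {"*": (["a", "b"], "c1"), "c1": (["c"], "c2"), "c2": ([], "c2")}
    ids, next_cursor = pages[params["cursorMark"]]
    return {
        "response": {"docs": [{"id": i} for i in ids]},
        "nextCursorMark": next_cursor,
    }


ids = list(iter_ids(fake_fetch, rows=2))
print(ids)  # ['a', 'b', 'c']
```

Unlike a scroll, a cursor holds no server-side state, so nothing needs to be cleaned up if the loop is abandoned early.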
Lowest priority (good to have features):
- [ ] Add support for Solr server mode: Currently the Docker config, as well as a couple of collection-based things, assume that Solr will only be run in cloud mode. (Cloud mode uses multiple Solr nodes, which is most similar to how ElasticSearch is usually configured with AtoM. Server mode has a single node, uses slightly different API endpoints for a few requests, and doesn't need ZooKeeper.)
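One example of the endpoint difference is index creation: cloud mode creates collections through the Collections API, while a single-node server creates cores through the CoreAdmin API. The Python sketch below only builds the two URLs. The function name and the `atom` index name are assumptions, and a real request would likely need further parameters (a configset, for instance).

```python
from urllib.parse import urlencode


def create_index_url(base_url, name, cloud_mode=True):
    """Return the admin URL that creates a collection (cloud) or a core (server)."""
    if cloud_mode:
        # SolrCloud: Collections API
        path = "admin/collections"
        params = {"action": "CREATE", "name": name, "numShards": 1}
    else:
        # Single-node server mode: CoreAdmin API
        path = "admin/cores"
        params = {"action": "CREATE", "name": name}
    return f"{base_url}/{path}?{urlencode(params)}"


cloud_url = create_index_url("http://localhost:8983/solr", "atom")
server_url = create_index_url("http://localhost:8983/solr", "atom", cloud_mode=False)
print(cloud_url)
print(server_url)
```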
Outside arSolrPlugin
AtoM extensively references Elastica, and the arElasticSearchPlugin is also deeply integrated into it. As of now, this is a list of all of the places outside the plugin itself that would need updates:
- [ ] `apps/qubit/modules/digitalobject/actions/imageflowComponent.class.php` uses arElasticSearchPluginQuery, QubitSearch.
- [ ] `apps/qubit/modules/clipboard/actions/viewAction.class.php` uses Elastica ResultSet, Response, Query, QueryTerms, QubitSearchPager, arElasticSearchPluginConfiguration.
- [ ] `apps/qubit/modules/default/actions/moveAction.class.php` uses Elastica Query, BoolQuery, QueryTerm, QubitSearchPager, arElasticSearchPluginUtil, arElasticSearchPluginConfiguration.
- [ ] `apps/qubit/modules/default/actions/fullTreeViewAction.class.php` uses Elastica QueryTerm, Elastica ResultSet (as arguments to methods), has several method names which reference ElasticSearch, arElasticSearchPluginQuery.
- [ ] `apps/qubit/modules/default/actions/browseAction.class.php` uses arElasticSearchPluginQuery, arElasticSearchPluginConfiguration, QubitSearch.
  - 👆🏼 NOTE: replace L#134-L#147 (the section that essentially removes must clauses for i18n.languages queries) with a call to the `removeMustWithTermField` method in `arSolrBoolQuery`.
- [ ] `apps/qubit/modules/repository/actions/holdingsAction.class.php` uses Elastica QueryBool, QueryMatchAll, QueryTerm, Query, QubitSearch, arElasticSearchPluginConfiguration.
- [ ] `apps/qubit/modules/repository/actions/browseAction.class.php` uses Elastica QueryMatchAll, Query, QueryTerm, arElasticSearchPluginUtil, QubitSearch.
- [ ] `apps/qubit/modules/repository/actions/maintainedActorsAction.class.php` uses Elastica Query, QueryTerm, QubitSearch, QubitSearchPager, arElasticSearchPluginConfiguration.
- [ ] `apps/qubit/modules/taxonomy/actions/indexAction.class.php` uses Elastica Query, BoolQuery, QueryTerm, arElasticSearchPluginUtil, arElasticSearchPluginConfiguration, QubitSearch, QubitSearchPager.
- [ ] `apps/qubit/modules/actor/actions/browseAction.class.php` uses Elastica BoolQuery, QueryTerm, QueryExists, NestedQuery, arElasticSearchPluginUtil, QubitSearch, QubitSearchPager.
- [ ] `apps/qubit/modules/actor/actions/relatedInformationObjectsAction.class.php` uses Elastica Query, BoolQuery, QueryTerm, NestedQuery, QubitSearchPager, QubitSearch, arElasticSearchPluginConfiguration.
- [ ] `apps/qubit/modules/search/actions/errorAction.class.php` uses Elastica Exception, references ElasticSearch in error message.
- [ ] `apps/qubit/modules/search/actions/indexAction.class.php` uses Elastica QueryTerm, QubitSearch, arElasticSearchPluginUtil.
- [ ] `apps/qubit/modules/search/actions/autocompleteAction.class.php` uses Elastica Search, MultiSearch, Query, BoolQuery, Match, Term, QubitSearch.
- [ ] `apps/qubit/modules/search/actions/descriptionUpdatesAction.class.php` uses Elastica Query, BoolQuery, QueryTerm, QueryRange, QubitSearch, QubitSearchPager, arElasticSearchPluginConfiguration.
- [ ] `apps/qubit/modules/term/actions/navigateRelatedComponent.class.php` uses Elastica QueryTerm, QubitSearch, arElasticSearchPluginQuery.
- [ ] `apps/qubit/modules/term/actions/indexAction.class.php` uses Elastica QueryTerms, Query, BoolQuery, QueryTerm, QubitSearch, QubitSearchPager.
- [ ] `apps/qubit/modules/informationobject/actions/inventoryAction.class.php` uses Elastica BoolQuery, Query, QueryTerm, QueryTerms, QubitSearch, QubitSearchPager, arElasticSearchPluginConfiguration.
- [ ] `apps/qubit/modules/informationobject/actions/autocompleteAction.class.php` uses Elastica Query, BoolQuery, MatchAll, QueryTerm, arElasticSearchPluginUtil, QubitSearch, QubitSearchPager.
- [ ] `lib/filter/QubitMeta.class.php` references Elastica Exception.
- [ ] `lib/QubitLftSyncer.class.php` uses Elastica Bulk, QueryTerm, Document, QubitSearch, arElasticSearchPluginQuery.
- [ ] `lib/search/QubitSearchPager.class.php` uses Elastica ResultSet.
- [ ] `lib/helper/QubitHelper.php` references Elastica Result.
- [ ] `lib/job/arUpdateEsActorRelationsJob.class.php` references Elastica exception, QubitSearch, arElasticSearchActorPdo.
- [ ] `lib/job/arActorExportJob.class.php` uses Elastica QueryTerms, arElasticSearchPluginUtil, QubitSearch.
- [ ] `lib/job/arRepositoryCsvExportJob.class.php` uses Elastica QueryTerms, arElasticSearchPluginQuery, arElasticSearchPluginUtil, QubitSearch.
- [ ] `lib/job/arUpdatePublicationStatusJob.class.php` uses Elastica AbstractScript, QueryTerm, QubitSearch.
- [ ] `lib/job/arInformationObjectExportJob.class.php` uses Elastica QueryTerm, QueryTerms, arElasticSearchPluginUtil, arElasticSearchPluginQuery, QubitSearch.
- [ ] `lib/task/tools/updatePublicationStatusTask.class.php` uses Elastica AbstractScript, QueryTerm, QubitSearch.
- [ ] `lib/task/propel/propelGenerateSlugsTask.class.php` uses Elastica Query, BoolQuery, QueryTerm, QubitSearch.
- [ ] `lib/model/QubitInformationObject.php` uses Elastica BoolQuery, Query, QueryMatch, QubitSearch.
- [ ] `lib/model/QubitTerm.php` uses Elastica BoolQuery, QueryTerm, QubitSearch.
- [ ] `lib/task/search/arSearchStatusTask.class.php` uses arElasticSearchPluginConfiguration, looks for class names starting with arElasticSearch in objectsAvailableToIndex.
- [ ] `lib/task/tools/installTask.class.php` uses arElasticSearchPluginConfiguration.
- [ ] `lib/job/arUpdateEsIoDocumentsJob.class.php` uses arElasticSearchInformationObject.
- [ ] `lib/job/arUpdateEsActorRelationsJob.class.php` uses arElasticSearchActorPdo.
- [ ] `lib/job/arActorExportJob.class.php` uses arElasticSearchPluginUtil, arElasticSearchPluginQuery.
- [ ] `lib/arInstall.class.php` references arElasticSearchPlugin's search.yml and uses arElasticSearchConfigHandler.
- [ ] `lib/task/import/csvImportTask.class.php` uses arElasticSearchInformationObjectPdo, QubitSearch.
- [ ] `lib/QubitMetsParser.class.php` uses arElasticSearchPluginUtil.
- [ ] `lib/search/QubitSearch.class.php` uses arElasticSearchPlugin.
- [ ] `lib/search/QubitSearchEngine.class.php` references ElasticSearch.
- [ ] `lib/QubitFlatfileImport.class.php` references ElasticSearch.
- [ ] `lib/task/propel/propelGenerateSlugsTask.class.php` references ElasticSearch.
- [ ] `config/ProjectConfiguration.class.php` sets up arElasticSearchPlugin.
- [ ] `plugins/qbAclPlugin/lib/QubitAclSearch.class.php` uses Elastica Query, BoolQuery, QueryTerm.
- [ ] `plugins/sfSkosPlugin/test/unit/importTest.php` uses Elastica Exception, QubitSearch.
- [ ] `plugins/arRestApiPlugin/lib/QubitApiAction.class.php` uses Elastica Query.
- [ ] `plugins/arRestApiPlugin/modules/api/actions/informationobjectsBrowseAction.class.php` uses arElasticSearchPluginConfiguration, arElasticSearchPluginQuery.
- [ ] `plugins/qtAccessionPlugin/modules/accession/actions/browseAction.class.php` uses Elastica Query, BoolQuery, QueryMatchAll, QubitSearch, QubitSearchPager, arElasticSearchPluginUtil, arElasticSearchPluginConfiguration.
- [ ] `test/unit/escapeTermTest.php` tests arElasticSearchPluginUtil::escapeTerm.
In addition to the list above, other tasks that would need to be completed in order to switch to Solr:
- [ ] Make Solr a default plugin that is enabled out of the box
- [ ] Update installTask to set up a config file for Solr in the root config folder (similar to ES), and change arSolrPluginConfiguration to point to this file
- [ ] Create a new Vagrant setup for development with Solr
- [ ] Update AtoM Docs: New documentation would need to be added that details installation and configuration. ElasticSearch advanced queries would also no longer work, but they could be replaced with documentation for Solr's query syntax, which would allow performing complex custom queries.