alfresco-indexer

Text Extracting

Open maoo opened this issue 11 years ago • 7 comments

Hi,

ManifoldCF uses Solr's extract update handler to handle binary content: the binary content is sent to Solr, and Tika tries to extract the text content and some metadata (such as the MIME type).
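For illustration, here is a minimal Python sketch of the kind of request this implies. The raw bytes are POSTed to Solr's extracting request handler, and Tika on the Solr side derives the text and metadata. The Solr base URL, core name and document id are hypothetical. The `/update/extract` path and the `literal.id` parameter follow the usual Solr Cell setup, but check them against your own `solrconfig.xml`. The request is only built here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical Solr core URL; adjust to your deployment.
SOLR_CORE_URL = "http://localhost:8983/solr/collection1"


def build_extract_request(doc_id: str, content: bytes, mime_type: str) -> Request:
    """Build (but do not send) a POST of raw binary content to Solr Cell.

    Solr's ExtractingRequestHandler runs Tika over the request body to pull
    out text and metadata; literal.id sets the unique key of the document.
    """
    params = urlencode({"literal.id": doc_id, "commit": "true"})
    return Request(
        f"{SOLR_CORE_URL}/update/extract?{params}",
        data=content,
        headers={"Content-Type": mime_type},
        method="POST",
    )


req = build_extract_request("doc-1", b"%PDF-1.4 ...", "application/pdf")
print(req.get_method(), req.full_url)
```

In this model, Solr (through Tika) does the text extraction, which is exactly the step the issue proposes to move to Alfresco.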

For the Alfresco connector, Alfresco itself should be used to convert binary content to text, as Alfresco's own Solr integration does (by calling NodeContentGet), because Alfresco already knows how to convert documents to text.

However, the NodeContentGet webscript is protected by certificate authentication, so you would have to clone it.
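As a rough sketch of what calling this webscript looks like, the Python below builds the GET request for a node's extracted text. The `/alfresco/service/api/solr/textContent` path and the `nodeId` / `propertyQName` parameters are assumptions based on the Alfresco 4.x Solr API; verify them against the webscript descriptor shipped with your Alfresco version. The node id is a hypothetical example. With SSL enabled, this request would also need a client certificate that Alfresco accepts, which is the obstacle described above.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed Alfresco base URL and Solr API path; verify both against your install.
ALFRESCO_SERVICE_URL = "http://localhost:8080/alfresco/service"
TEXT_CONTENT_PATH = "/api/solr/textContent"

# cm:content in its fully qualified QName form.
CONTENT_QNAME = "{http://www.alfresco.org/model/content/1.0}content"


def build_text_content_request(node_id: int, property_qname: str = CONTENT_QNAME) -> Request:
    """Build (but do not send) a GET for the text Alfresco extracted from a node."""
    params = urlencode({"nodeId": node_id, "propertyQName": property_qname})
    return Request(f"{ALFRESCO_SERVICE_URL}{TEXT_CONTENT_PATH}?{params}", method="GET")


req = build_text_content_request(42)
print(req.full_url)
```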

(Original issue: https://github.com/maoo/alfresco-webscript-manifold-connector/issues/21, by @alexist.)

maoo, Oct 02 '14 09:10

The ManifoldCF Alfresco connector could invoke NodeContentGet (over HTTP or HTTPS; both are available) during ManifoldCF's processDocument step. This would imply:

  • Adding the right logic to alfresco-indexer-client
  • Invoking alfresco-indexer-client from the Alfresco ManifoldCF connector
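A minimal sketch of that split, with hypothetical names throughout: `AlfrescoIndexerClient.fetch_text` stands in for the logic that would be added to alfresco-indexer-client, and `process_document` mimics the connector's processDocument step. The HTTP call is injected as a plain function so the flow can be shown without a live Alfresco, and the response is assumed to be UTF-8 text. The point is that the connector forwards Alfresco's extracted text to Solr instead of the binary.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class AlfrescoIndexerClient:
    """Hypothetical client-side wrapper around the NodeContentGet call."""

    # Injected transport: takes a node id, returns the response body as bytes.
    http_get: Callable[[str], bytes]

    def fetch_text(self, node_id: str) -> str:
        # Alfresco has already transformed the content; assume a UTF-8 body.
        return self.http_get(node_id).decode("utf-8")


def process_document(
    client: AlfrescoIndexerClient, node_id: str, metadata: Dict[str, str]
) -> Dict[str, str]:
    """Hypothetical connector step: build the Solr document from Alfresco's text."""
    doc = dict(metadata)
    doc["id"] = node_id
    # This text field replaces the raw binary that would otherwise go to Tika.
    doc["content"] = client.fetch_text(node_id)
    return doc


fake_alfresco = {"node-1": "Quarterly report".encode("utf-8")}
client = AlfrescoIndexerClient(http_get=fake_alfresco.__getitem__)
print(process_document(client, "node-1", {"mimetype": "application/pdf"}))
```

Injecting the transport keeps the connector logic independent of whether the call goes over HTTP or HTTPS.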

maoo, Oct 02 '14 09:10

But NodeContentGet is protected by a Solr-specific authentication mechanism (certificates). Is there another way to call this webscript over HTTP, without a certificate?

alexist, Oct 02 '14 09:10

You can run without SSL: https://wiki.alfresco.com/wiki/Alfresco_And_SOLR#Running_Without_SSL

maoo, Oct 02 '14 09:10

When SSL is disabled, the Solr webscripts are accessible without any authentication. I'm not sure that's a good idea, since you then have to protect these webscripts some other way. Furthermore, you have to patch web.xml to disable SSL, which is also not a good idea.

I think exposing this webscript through the standard authentication mechanism would solve these problems.
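One way to picture this proposal: a cloned copy of the webscript, declared with ordinary user authentication (`<authentication>user</authentication>` in its descriptor) instead of relying on the Solr certificate check, could be called with standard credentials. The Python below sketches such a call using HTTP Basic authentication. The `/alfresco/service/custom/textContent` path, the `nodeId` parameter and the service account are all hypothetical, since the clone does not exist yet.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical URL of a cloned NodeContentGet webscript that uses
# user authentication instead of the Solr certificate check.
CLONED_WEBSCRIPT_URL = "http://localhost:8080/alfresco/service/custom/textContent"


def build_authenticated_request(node_id: str, username: str, password: str) -> Request:
    """Build (but do not send) a GET carrying HTTP Basic credentials."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    url = f"{CLONED_WEBSCRIPT_URL}?{urlencode({'nodeId': node_id})}"
    return Request(url, headers={"Authorization": f"Basic {token}"}, method="GET")


req = build_authenticated_request("42", "indexer", "secret")
print(req.full_url)
```

Basic authentication sends credentials only base64-encoded, not encrypted, so in practice such a call would still go over HTTPS; the difference is that it needs an Alfresco user rather than a Solr client certificate.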

alexist, Oct 02 '14 10:10

The all-in-one archetype is configured to use HTTP (nossl) for Alfresco-Solr communication in both directions:

https://artifacts.alfresco.com/nexus/content/repositories/alfresco-docs/alfresco-lifecycle-aggregator/latest/archetypes/alfresco-allinone-archetype/usage.html

maoo, Oct 02 '14 10:10

The Maven SDK disables SSL during the development phase, not in a production environment...

alexist, Oct 02 '14 11:10

True, but it shows how to patch the Alfresco web.xml in order to disable SSL.

maoo, Oct 02 '14 12:10