Indexing custom fields from a sitemap

Open coprisanu opened this issue 6 years ago • 3 comments

Hi,

We have created a sitemap with custom fields. We would like to understand how to have those custom fields added to the indexed documents.

Thank you

coprisanu • Feb 08 '19 15:02

I cannot think of an out-of-the-box way to do so with custom fields, but I can think of a workaround if you know your Java.

You could crawl your sitemap as a start URL and write your own ILinkExtractor. The link extractor produces links with some predefined metadata fields that are stored with the document. You could hijack one of those fields to store your own metadata in it (e.g., the "text" attribute of the produced Link objects). Then you would use one of the manipulation options in the Importer module to split your custom metadata values into their own fields.
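A minimal, library-independent sketch of the packing and splitting steps described above, assuming Java 10+ and the standard JAXP DOM parser. It does not use the Norconex `ILinkExtractor` or `Link` API. The class `SitemapFieldPacking`, its methods, the `my:` namespace, and the URL-encoded `name=value&...` packing format are all hypothetical choices for illustration. Any child of `<url>` other than the standard `loc`, `lastmod`, `changefreq`, and `priority` elements is treated as a custom field.

```java
import java.io.ByteArrayInputStream;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper illustrating the workaround; not part of the Norconex API.
public class SitemapFieldPacking {

    // Standard sitemap <url> children; everything else is treated as a custom field.
    private static final Set<String> STANDARD =
            Set.of("loc", "lastmod", "changefreq", "priority");

    /** Returns loc -> (custom field name -> value) for every <url> entry. */
    public static Map<String, Map<String, String>> parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // needed so getLocalName() drops prefixes like "my:"
        Document doc = factory.newDocumentBuilder().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        Map<String, Map<String, String>> result = new LinkedHashMap<>();
        NodeList urls = doc.getElementsByTagNameNS("*", "url");
        for (int i = 0; i < urls.getLength(); i++) {
            Element url = (Element) urls.item(i);
            String loc = null;
            Map<String, String> custom = new LinkedHashMap<>();
            NodeList children = url.getChildNodes();
            for (int j = 0; j < children.getLength(); j++) {
                Node child = children.item(j);
                if (child.getNodeType() != Node.ELEMENT_NODE) continue;
                String name = child.getLocalName();
                String value = child.getTextContent().trim();
                if ("loc".equals(name)) {
                    loc = value;
                } else if (!STANDARD.contains(name)) {
                    custom.put(name, value);
                }
            }
            if (loc != null) result.put(loc, custom);
        }
        return result;
    }

    /** Packs fields into one string, e.g. for a link's hijacked "text" attribute. */
    public static String pack(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (sb.length() > 0) sb.append('&');
            sb.append(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8))
              .append('=')
              .append(URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    /** Reverses pack(); this is the split an Importer-side step would perform. */
    public static Map<String, String> unpack(String packed) {
        Map<String, String> fields = new LinkedHashMap<>();
        if (packed == null || packed.isEmpty()) return fields;
        for (String pair : packed.split("&")) {
            int eq = pair.indexOf('=');
            fields.put(URLDecoder.decode(pair.substring(0, eq), StandardCharsets.UTF_8),
                       URLDecoder.decode(pair.substring(eq + 1), StandardCharsets.UTF_8));
        }
        return fields;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\""
                + " xmlns:my=\"https://example.com/custom\">"
                + "<url><loc>https://example.com/page1</loc>"
                + "<lastmod>2019-02-08</lastmod>"
                + "<my:category>news &amp; events</my:category>"
                + "<my:author>Jane Doe</my:author></url>"
                + "</urlset>";
        for (Map.Entry<String, Map<String, String>> e : parse(xml).entrySet()) {
            String packed = pack(e.getValue());
            System.out.println(e.getKey() + " -> " + packed);
            System.out.println("  unpacked: " + unpack(packed));
        }
    }
}
```

URL-encoding each name and value keeps the `&` and `=` delimiters unambiguous even when a custom value contains them. In a real setup, the packing would happen inside the custom link extractor, and the splitting would be done by whichever Importer manipulation option you configure.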

Not the most straightforward for sure. We can make this a feature request too.

essiembre • Feb 09 '19 03:02

Thank you, Pascal, for your response. A new feature would be a better solution. Could you say how long it would take to implement?

coprisanu • Feb 11 '19 15:02

+1 to this feature request.

masterbee • Feb 11 '19 16:02