Indexing custom fields from a sitemap
Hi,
We have created a sitemap with custom fields, and we would like to understand how to get those custom fields added to the indexed documents.
Thank you
I cannot think of an out-of-the-box way to do so with custom fields, but I can think of a workaround if you know your Java.
You could crawl your sitemap as a start URL and write your own ILinkExtractor. The link extractor produces links with some predefined metadata fields that are stored with the document. You could hijack one of those fields to hold your own metadata (e.g., the "text" attribute of the produced Link objects). Then you would use one of the manipulation options in the Importer module to split your custom metadata values into their own fields.
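To make the pack-then-split idea concrete, here is a minimal sketch in plain JDK Java. It deliberately does not use the collector or Importer APIs: the class SitemapFieldPacker, its methods, and the "name=value|name=value" format are all hypothetical choices for illustration. In a real setup, the packed string is what a custom ILinkExtractor would place in the hijacked link field, and the split would be done by an Importer manipulation step rather than by unpackFields.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of any crawler API.
final class SitemapFieldPacker {

    // For each <url> entry, packs every child element other than <loc>
    // into one "name=value|name=value" string, keyed by the URL.
    // Assumption: field values contain neither '|' nor '='.
    static Map<String, String> packFields(String sitemapXml) throws Exception {
        NodeList urls = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(
                        sitemapXml.getBytes(StandardCharsets.UTF_8)))
                .getElementsByTagName("url");

        Map<String, String> packedByUrl = new LinkedHashMap<>();
        for (int i = 0; i < urls.getLength(); i++) {
            String loc = null;
            StringBuilder packed = new StringBuilder();
            NodeList children = urls.item(i).getChildNodes();
            for (int j = 0; j < children.getLength(); j++) {
                Node child = children.item(j);
                if (child.getNodeType() != Node.ELEMENT_NODE) {
                    continue; // skip whitespace and comments
                }
                String name = child.getNodeName();
                String value = child.getTextContent().trim();
                if ("loc".equals(name)) {
                    loc = value;
                } else {
                    if (packed.length() > 0) {
                        packed.append('|');
                    }
                    packed.append(name).append('=').append(value);
                }
            }
            if (loc != null) {
                packedByUrl.put(loc, packed.toString());
            }
        }
        return packedByUrl;
    }

    // Reverses the packing: splits the single string back into fields.
    // This stands in for the split you would configure in the Importer.
    static Map<String, String> unpackFields(String packed) {
        Map<String, String> fields = new LinkedHashMap<>();
        if (packed == null || packed.isEmpty()) {
            return fields;
        }
        for (String pair : packed.split("\\|")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                fields.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return fields;
    }
}
```

If your custom values can contain the delimiter characters, you would need to escape them or pick a different separator before packing.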
Not the most straightforward for sure. We can make this a feature request too.
Thank you, Pascal, for your response. A new feature request would be a better solution. Could you say how long it would take to implement the new feature?
+1 to this feature request.