Indexing value of a new metatag

Open sudeshna-majumder opened this issue 2 years ago • 3 comments

Hello,

I have added a few new meta tags to my page. For example:
<meta name="content-region" content="Global" />

I want to extract the value of the 'content-region' field and index it as 'region'. I am committing to Google Cloud Search. With the pre-parse handler below, I expected the value 'Global' to be stored in the 'region' field, but DebugTagger shows <null> every time.

<tagger class="com.norconex.importer.handler.tagger.impl.CopyTagger">
  <copy fromField="content-region" toField="region" overwrite="true" />
</tagger>

Am I missing anything basic here?

sudeshna-majumder avatar Jun 21 '22 20:06 sudeshna-majumder

I suspect it is related to having it defined as part of the pre-parse handlers. Fields extracted from a file's content are created when the document is parsed, so before parsing the meta field does not exist yet. Try moving your logic to a post-parse handler.
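
As a rough sketch of that suggestion, the tagger could be placed under the post-parse section of the importer configuration. This assumes the Importer 2.x syntax used in the question; the DebugTagger line and its logFields/logLevel attributes are included only as an assumed way to verify the result and should be checked against the documentation for your version.

```xml
<importer>
  <!-- Post-parse handlers run after the document is parsed,
       so meta tags such as content-region already exist as fields. -->
  <postParseHandlers>
    <!-- Copy the parsed meta tag value into the target field. -->
    <tagger class="com.norconex.importer.handler.tagger.impl.CopyTagger">
      <copy fromField="content-region" toField="region" overwrite="true" />
    </tagger>
    <!-- Assumed verification step: log both fields to confirm the copy. -->
    <tagger class="com.norconex.importer.handler.tagger.impl.DebugTagger"
            logFields="content-region,region" logLevel="INFO" />
  </postParseHandlers>
</importer>
```

If 'region' still shows as empty here, the field name produced by the parser may differ from the meta tag name, which a DebugTagger without field restrictions should reveal.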

If that does not work for you, please share the version you are using and your full config in order for us to reproduce it.

essiembre avatar Jul 03 '22 01:07 essiembre

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Sep 01 '22 05:09 stale[bot]

Thanks Pascal. After moving the logic to a post-parse handler, I am able to extract values from the new meta tags. But they are not being committed by my google-cloud-search committer. In the support document below, I don't see any Norconex configuration option for committing additional meta tags through the Cloud Search committer: https://developers.google.com/cloud-search/docs/guides/norconex-http-connector#configure-gcs Do you know of any way to commit them to Google Cloud Search?

sudeshna-majumder avatar Sep 02 '22 08:09 sudeshna-majumder

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Nov 01 '22 09:11 stale[bot]