importer
Support for timezone conversions in DateFormatTagger?
I have to crawl an intranet site that provides the last modified timestamps of articles in a meta tag like this: <meta name="LASTMODIFIED" content="19.02.2018 12:40">
This is easily handled by DateFormatTagger. However, there is a problem with timezones: the intranet provides the time in local time, while Solr expects it in UTC.
Can you please add support for timezone conversions in DateFormatTagger? In the meantime, is there a workaround for my problem, other than using ScriptTagger to manipulate the date after DateFormatTagger?
Good suggestion. I am making this a feature request.
In the meantime, here are a few workarounds you can try:
- Modify the launch script to add the following argument to the java command it executes (replace GMT with another timezone if needed):
java -Duser.timezone=GMT
- Modify the launch script to set the timezone environment variable. On Linux it could look like this:
export TZ=UTC
# or export TZ=UTC+4:00 (or whatever difference)
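For anyone wondering what the first workaround changes: a timestamp without a zone, such as 19.02.2018 12:40, maps to a different instant depending on the zone used to parse it. The sketch below only illustrates that with plain JDK classes; it is not the DateFormatTagger code. The class name DefaultZoneDemo, the helper parseWithDefaultZone, and the Europe/Berlin zone are made up for the example, and it assumes the date is parsed with the JVM default zone, which is what -Duser.timezone sets.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.TimeZone;

// Hypothetical demo class, not part of Norconex Importer.
final class DefaultZoneDemo {

    // Parses a zone-less timestamp with a formatter that has no explicit
    // zone, so the JVM default zone decides which instant it represents.
    // Returns milliseconds since the epoch.
    static long parseWithDefaultZone(String text, String defaultZoneId) {
        TimeZone previous = TimeZone.getDefault();
        try {
            // Same effect as starting the JVM with -Duser.timezone=<id>.
            TimeZone.setDefault(TimeZone.getTimeZone(defaultZoneId));
            SimpleDateFormat fmt = new SimpleDateFormat("dd.MM.yyyy HH:mm");
            return fmt.parse(text).getTime();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable date: " + text, e);
        } finally {
            // Restore the original default so the demo has no side effects.
            TimeZone.setDefault(previous);
        }
    }

    public static void main(String[] args) {
        long asGmt = parseWithDefaultZone("19.02.2018 12:40", "GMT");
        // Europe/Berlin is an assumed example zone (UTC+1 in February).
        long asBerlin = parseWithDefaultZone("19.02.2018 12:40", "Europe/Berlin");
        System.out.println((asGmt - asBerlin) / 60000 + " minutes apart");
    }
}
```

Whether the flag changes the value that ends up in Solr also depends on how the output side is formatted, so it is worth checking a crawled document after changing it.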
I did this:
<!-- meta-lastmod -->
<tagger class="com.norconex.importer.handler.tagger.impl.DateFormatTagger"
    fromField="LASTMODIFIED"
    toField="meta-lastmod"
    toFormat="yyyy-MM-dd'T'HH:mm:ss" >
  <fromFormat>dd.MM.yyyy HH:mm</fromFormat>
</tagger>
<!-- meta-published -->
<tagger class="com.norconex.importer.handler.tagger.impl.DateFormatTagger"
    fromField="PUBLISHED"
    toField="meta-published"
    toFormat="yyyy-MM-dd'T'HH:mm:ss" >
  <fromFormat>dd.MM.yyyy HH:mm</fromFormat>
</tagger>
<tagger class="com.norconex.importer.handler.tagger.impl.ScriptTagger">
  <script><![CDATA[
    var date_fields = ['meta-lastmod', 'meta-published'];
    date_fields.forEach(function(df) {
      if (metadata[df]) {
        var d = new Date(metadata[df][0]);
        // Date.toISOString() always returns UTC time
        metadata.setString(df, d.toISOString());
      }
    });
  ]]></script>
</tagger>
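For comparison, here is roughly what an explicit conversion looks like in Java with java.time, which is the kind of thing the requested feature would need to do. This is a minimal sketch, not the tagger's implementation: the class LocalToUtc, the method toUtc, and the Europe/Berlin source zone are assumptions made for the example, since the thread does not say which zone the intranet uses.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of Norconex Importer.
final class LocalToUtc {

    // Same pattern as the <fromFormat> used in the config above.
    private static final DateTimeFormatter SOURCE_FORMAT =
            DateTimeFormatter.ofPattern("dd.MM.yyyy HH:mm");

    // Interprets a zone-less timestamp as wall-clock time in sourceZoneId
    // and returns the same instant as an ISO-8601 string in UTC.
    static String toUtc(String localText, String sourceZoneId) {
        LocalDateTime local = LocalDateTime.parse(localText, SOURCE_FORMAT);
        // atZone applies the zone's rules (including daylight saving time);
        // Instant.toString() always renders in UTC with a trailing "Z".
        return local.atZone(ZoneId.of(sourceZoneId)).toInstant().toString();
    }

    public static void main(String[] args) {
        // Assumed zone: Europe/Berlin is UTC+1 in February.
        System.out.println(toUtc("19.02.2018 12:40", "Europe/Berlin"));
        // prints 2018-02-19T11:40:00Z
    }
}
```

Naming the source zone explicitly keeps the result independent of the timezone of the machine running the crawl, and daylight saving time is handled by the zone rules rather than by a fixed offset.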