Support for timezone conversions in DateFormatTagger?

Open ronjakoi opened this issue 7 years ago • 2 comments

I have to crawl an intranet site that provides the last modified timestamps of articles in a meta tag like this: <meta name="LASTMODIFIED" content="19.02.2018 12:40">

This is easily handled by DateFormatTagger. However, there is a problem with timezones: the intranet provides the time in local time, while Solr expects it in UTC.

Can you please add support for timezone conversions in DateFormatTagger? In the meantime, is there a workaround for my problem, other than using ScriptTagger to manipulate the date after DateFormatTagger?

ronjakoi avatar Feb 19 '18 12:02 ronjakoi

Good suggestion. I am making this a feature request.

In the meantime, here are a few workarounds you can try:

  • Modify the launch script to add the following argument to the java command it runs (replace GMT with another timezone if needed):
java -Duser.timezone=GMT
  • Modify the launch script to set the timezone environment variable. On Linux it could look like this:
export TZ=UTC
# or export TZ=UTC+4:00 (or whatever difference)
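
To illustrate why the JVM default timezone matters for a timestamp that carries no zone, here is a minimal Java sketch. It assumes the parsing behaves like java.text.SimpleDateFormat, which falls back to the default timezone when none is set on the formatter (this is not confirmed here for DateFormatTagger). Europe/Helsinki is only a stand-in for the intranet's local zone, and the class and method names are made up for the example.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.TimeZone;

public class DefaultZoneDemo {

    // Interprets a zone-less timestamp as wall-clock time in the given zone
    // and returns the resulting instant as epoch milliseconds.
    static long parseEpochMillis(String text, String zoneId) {
        SimpleDateFormat fmt = new SimpleDateFormat("dd.MM.yyyy HH:mm");
        // Without this call the formatter uses TimeZone.getDefault(),
        // which is what -Duser.timezone changes.
        fmt.setTimeZone(TimeZone.getTimeZone(zoneId));
        try {
            return fmt.parse(text).getTime();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable date: " + text, e);
        }
    }

    public static void main(String[] args) {
        String raw = "19.02.2018 12:40";
        long asUtc = parseEpochMillis(raw, "UTC");
        // Assumed local zone, chosen only for illustration.
        long asHelsinki = parseEpochMillis(raw, "Europe/Helsinki");
        // The same text maps to two different instants.
        System.out.println((asUtc - asHelsinki) / 3_600_000L + " hour(s) apart");
    }
}
```

With these inputs the two readings differ by 2 hours, so the zone used for parsing decides which instant ends up in the index.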

essiembre avatar Feb 20 '18 07:02 essiembre

I ended up doing this, with a ScriptTagger after the DateFormatTagger entries:

<!-- meta-lastmod -->
<tagger class="com.norconex.importer.handler.tagger.impl.DateFormatTagger"
    fromField="LASTMODIFIED"
    toField="meta-lastmod"
    toFormat="yyyy-MM-dd'T'HH:mm:ss" >
    <fromFormat>dd.MM.yyyy HH:mm</fromFormat>
</tagger>

<!-- meta-published -->
<tagger class="com.norconex.importer.handler.tagger.impl.DateFormatTagger"
    fromField="PUBLISHED"
    toField="meta-published"
    toFormat="yyyy-MM-dd'T'HH:mm:ss" >
    <fromFormat>dd.MM.yyyy HH:mm</fromFormat>
</tagger>

<tagger class="com.norconex.importer.handler.tagger.impl.ScriptTagger">
    <script><![CDATA[
        var date_fields = ['meta-lastmod', 'meta-published'];
        date_fields.forEach(function(df) {
            if(metadata[df]) {
                var d = new Date(metadata[df][0]);
                // Date.toISOString() always returns UTC time
                metadata.setString(df, d.toISOString());
            }
        });
    ]]></script>
</tagger>
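
For comparison, the same local-to-UTC conversion can be sketched in plain Java with java.time, naming the source zone explicitly instead of relying on the default timezone. This is only an illustration and not part of Norconex: the class name, the method name, and the Europe/Helsinki zone are assumptions made for the example.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class LocalToUtc {

    // Format of the LASTMODIFIED / PUBLISHED meta tags.
    private static final DateTimeFormatter IN =
            DateTimeFormatter.ofPattern("dd.MM.yyyy HH:mm");

    // ISO-8601 UTC output with a literal trailing 'Z'.
    private static final DateTimeFormatter OUT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss'Z'");

    // Reads the text as wall-clock time in zoneId and renders it in UTC.
    static String toUtcIso(String localText, String zoneId) {
        return LocalDateTime.parse(localText, IN)
                .atZone(ZoneId.of(zoneId))
                .withZoneSameInstant(ZoneOffset.UTC)
                .format(OUT);
    }

    public static void main(String[] args) {
        // Assumed source zone, chosen only for illustration.
        System.out.println(toUtcIso("19.02.2018 12:40", "Europe/Helsinki"));
        // prints 2018-02-19T10:40:00Z
    }
}
```

Because the zone is passed in by name, daylight saving time is handled by the zone rules rather than by a fixed offset.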

ronjakoi avatar Feb 20 '18 12:02 ronjakoi