
Example files are using legacy timezone names (US/Pacific)

bdice opened this issue.

The example ORC files use the time zone name US/Pacific, which is no longer installed by default on all Linux distributions. Ubuntu 24.04, for example, has moved it to a separate tzdata-legacy package. This can cause failures in ORC file readers on systems that lack the legacy time zone data.

Should the example ORC files be updated to use a more current time zone name, like America/Los_Angeles?

Verifying the time zone in the stripe footers:

wget https://github.com/apache/orc/raw/refs/heads/main/examples/TestOrcFile.testDate1900.orc
orc-metadata -v TestOrcFile.testDate1900.orc
# Shows stripe footers with "timezone": "US/Pacific"
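To check many files at once, the stripe time zones can be pulled out of the metadata programmatically. A minimal sketch, assuming `orc-metadata -v` prints valid JSON with a "stripes" array whose entries carry a "timezone" key (as in the 2038 output later in this thread); the function name `stripe_timezones` is hypothetical.

```python
import json
import subprocess
import sys


def stripe_timezones(metadata_json: str) -> set:
    """Return the distinct writer time zones listed under "stripes"."""
    meta = json.loads(metadata_json)
    # Stripes that carry no "timezone" key are skipped rather than reported.
    return {s["timezone"] for s in meta.get("stripes", []) if "timezone" in s}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Assumes the ORC C++ tool orc-metadata is on PATH.
    result = subprocess.run(
        ["orc-metadata", "-v", sys.argv[1]],
        check=True, capture_output=True, text=True,
    )
    print(sorted(stripe_timezones(result.stdout)))
```

Run it as, for example, `python stripe_tz.py TestOrcFile.testDate1900.orc`; a set rather than a list is returned because every stripe usually repeats the same zone.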

Additional context

  • https://bugs.launchpad.net/ubuntu/+source/tzdata/+bug/2058249
  • https://github.com/apache/arrow/issues/40633
  • https://github.com/pandas-dev/pandas/issues/56292
  • https://github.com/rapidsai/cudf/pull/16998#issuecomment-2400980607

bdice, Oct 16 '24 14:10

Thank you for reporting, @bdice.

cc @williamhyun , @wgtmac , too.

dongjoon-hyun, Oct 23 '24 15:10

To @bdice, according to our official Java tool, the time column has type timestamp (without time zone), isn't it?

$ orc-tools version
ORC 2.0.2

$ orc-tools meta ./examples/TestOrcFile.testDate1900.orc | grep Type
Processing data file examples/TestOrcFile.testDate1900.orc [length: 30941]
Type: struct<time:timestamp,date:date>

Please see the documentation below. Given that the column has no time zone, I'm not sure the file is the root cause.

  • https://orc.apache.org/docs/types.html#timestamps

ORC includes two different forms of timestamps from the SQL world:

  • Timestamp is a date and time without a time zone, which does not change based on the time zone of the reader.
  • Timestamp with local time zone is a fixed instant in time, which does change based on the time zone of the reader.
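The two semantics quoted above can be illustrated with Python's `datetime`, as an analogy only (this is not how ORC encodes values): a naive datetime stands in for a timestamp, and a UTC-anchored datetime stands in for a timestamp with local time zone. Fixed UTC offsets are used as hypothetical reader zones so the sketch needs no tzdata at all.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets stand in for two reader time zones (no tzdata lookup needed).
UTC_MINUS_8 = timezone(timedelta(hours=-8))
UTC_PLUS_9 = timezone(timedelta(hours=9))

# "Timestamp": a wall-clock value with no zone attached.
wall_clock = datetime(2024, 1, 1, 12, 0)

# "Timestamp with local time zone": a fixed instant in time.
instant = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)


def render(reader_tz):
    """Return (timestamp, timestamp with local time zone) as seen by a reader."""
    # The wall-clock value ignores reader_tz; only the instant is converted.
    return wall_clock.isoformat(), instant.astimezone(reader_tz).isoformat()


for label, tz in (("UTC-8 reader", UTC_MINUS_8), ("UTC+9 reader", UTC_PLUS_9)):
    print(label, *render(tz))
```

One caveat relevant to this thread: the stripe footers still record a writer time zone ("timezone": "US/Pacific" in the metadata output), so a reader may need to resolve that zone name even when decoding a plain timestamp column.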

Instead, it looks like an issue on the C++ library side, because orc-metadata is built on the C++ library. BTW, ORC-1481 was already fixed in Apache ORC 2.0.0. Do you mean that you hit this issue with Apache ORC 2.0+?

  • https://github.com/apache/orc/pull/1587

dongjoon-hyun, Oct 23 '24 15:10

It looks like a breaking change in time zone names from the TZDB. I will take a look. cc @ffacs

wgtmac, Oct 24 '24 01:10

Thank you so much, @wgtmac.

dongjoon-hyun, Oct 24 '24 02:10

https://bugs.launchpad.net/ubuntu/+source/tzdata/+bug/2058249 explains the root cause: tzdata intentionally moved time zone files such as US/Pacific into a separate tzdata-legacy package without providing symlinks, so this is a breaking change for legacy ORC files. At the same time, some downstream projects that depend on the Apache ORC C++ library use ORC files from https://github.com/apache/orc/tree/main/examples for CI validation. These CI jobs start to fail once they upgrade to Ubuntu 24.04, which ships the new tzdata without tzdata-legacy installed.

IMO, we should not change TestOrcFile.testDate1900.orc, as it is a good example for checking whether tzdata-legacy is required. One thing I don't understand is that our CI jobs running on Ubuntu 24.04 do not fail.
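Whether tzdata-legacy is needed on a given machine comes down to whether the zone file exists. A minimal sketch, assuming (as we believe the ORC C++ reader does, but please verify for your reader) that zone names are resolved as files under $TZDIR, defaulting to /usr/share/zoneinfo; `zone_file_present` is a hypothetical helper.

```python
import os
from typing import Optional


def zone_file_present(name: str, zoneinfo_dir: Optional[str] = None) -> bool:
    """Check whether a zone file for `name` exists in the zoneinfo directory."""
    if zoneinfo_dir is None:
        # Assumption: the reader honors TZDIR, else uses the usual Linux path.
        zoneinfo_dir = os.environ.get("TZDIR", "/usr/share/zoneinfo")
    # Reject absolute paths and ".." components so lookups stay inside the directory.
    if os.path.isabs(name) or ".." in name.split("/"):
        return False
    return os.path.isfile(os.path.join(zoneinfo_dir, name))


if __name__ == "__main__":
    for zone in ("US/Pacific", "America/Los_Angeles"):
        print(f"{zone}: {'present' if zone_file_present(zone) else 'missing'}")
```

The check looks at the filesystem rather than using Python's `zoneinfo` module, because `zoneinfo` can fall back to the `tzdata` PyPI package when it is installed, which could hide a missing system file.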

wgtmac, Oct 26 '24 13:10

IMO, we should not change TestOrcFile.testDate1900.orc as it is a good example to check if tzdata-legacy is required.

That is fine with me! I have worked around this by installing tzdata-legacy on Ubuntu 24.04. I can see the potential value here. I am okay with closing this issue with no action, if that is acceptable to others.

Another possible course of action would be to leave TestOrcFile.testDate1900.orc as-is, and update the timezone names in TestOrcFile.testDate2038.orc (currently also using US/Pacific).
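For files that keep legacy names, one hypothetical reader-side workaround is to fall back to the canonical IANA name when the legacy alias is missing. A minimal sketch, assuming a file-based zoneinfo directory; `LEGACY_ALIASES`, `canonical_zone`, and `resolve_zone` are illustrative and not part of any ORC API, and the mapping is only a small subset of tzdata's "backward" links.

```python
import os
from typing import Optional

# Illustrative subset of tzdata "backward" links: legacy alias -> canonical name.
LEGACY_ALIASES = {
    "US/Pacific": "America/Los_Angeles",
    "US/Mountain": "America/Denver",
    "US/Central": "America/Chicago",
    "US/Eastern": "America/New_York",
    "US/Alaska": "America/Anchorage",
    "US/Hawaii": "Pacific/Honolulu",
}


def canonical_zone(name: str) -> str:
    """Map a legacy alias to its canonical name; other names pass through."""
    return LEGACY_ALIASES.get(name, name)


def resolve_zone(name: str, zoneinfo_dir: str = "/usr/share/zoneinfo") -> Optional[str]:
    """Return the first of (name, canonical name) whose zone file exists, else None."""
    for candidate in (name, canonical_zone(name)):
        if os.path.isfile(os.path.join(zoneinfo_dir, candidate)):
            return candidate
    return None
```

The original name is tried first, so systems that still ship the aliases behave exactly as before; the canonical name is only used when the alias file is absent.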

2038 test file output

Using orc 2.0.2:

$ orc-metadata -v TestOrcFile.testDate2038.orc
{ "name": "TestOrcFile.testDate2038.orc",
  "type": "struct<time:timestamp,date:date>",
  "attributes": {},
  "rows": 212000,
  "stripe count": 28,
  "format": "0.12", "writer version": "HIVE-8732", "software version": "ORC Java",
  "compression": "zlib", "compression block": 10000,
  "file length": 95787,
  "content": 94762, "stripe stats": 686, "footer": 314, "postscript": 24,
  "row index stride": 10000,
  "user metadata": {
  },
  "stripes": [
    { "stripe": 0, "rows": 15000,
      "offset": 3, "length": 6410,
      "index": 153, "data": 6194, "footer": 63,
      "encodings": [
         { "column": 0, "encoding": "direct" },
         { "column": 1, "encoding": "direct rle2" },
         { "column": 2, "encoding": "direct rle2" }
      ],
      "streams": [
        { "id": 0, "column": 0, "kind": "index", "offset": 3, "length": 21 },
        { "id": 1, "column": 1, "kind": "index", "offset": 24, "length": 78 },
        { "id": 2, "column": 2, "kind": "index", "offset": 102, "length": 54 },
        { "id": 3, "column": 1, "kind": "data", "offset": 156, "length": 507 },
        { "id": 4, "column": 1, "kind": "secondary", "offset": 663, "length": 5416 },
        { "id": 5, "column": 2, "kind": "data", "offset": 6079, "length": 271 }
      ],
      "timezone": "US/Pacific"
    },
    { "stripe": 1, "rows": 5000,
      "offset": 6413, "length": 2214,
      "index": 76, "data": 2075, "footer": 63,
      "encodings": [
         { "column": 0, "encoding": "direct" },
         { "column": 1, "encoding": "direct rle2" },
         { "column": 2, "encoding": "direct rle2" }
      ],
      "streams": [
        { "id": 0, "column": 0, "kind": "index", "offset": 6413, "length": 12 },
        { "id": 1, "column": 1, "kind": "index", "offset": 6425, "length": 37 },
        { "id": 2, "column": 2, "kind": "index", "offset": 6462, "length": 27 },
        { "id": 3, "column": 1, "kind": "data", "offset": 6489, "length": 171 },
        { "id": 4, "column": 1, "kind": "secondary", "offset": 6660, "length": 1803 },
        { "id": 5, "column": 2, "kind": "data", "offset": 8463, "length": 101 }
      ],
      "timezone": "US/Pacific"
    },
    { "stripe": 2, "rows": 10000,
      "offset": 8627, "length": 4321,
      "index": 76, "data": 4182, "footer": 63,
      "encodings": [
         { "column": 0, "encoding": "direct" },
         { "column": 1, "encoding": "direct rle2" },
         { "column": 2, "encoding": "direct rle2" }
      ],
      "streams": [
        { "id": 0, "column": 0, "kind": "index", "offset": 8627, "length": 12 },
        { "id": 1, "column": 1, "kind": "index", "offset": 8639, "length": 37 },
        { "id": 2, "column": 2, "kind": "index", "offset": 8676, "length": 27 },
        { "id": 3, "column": 1, "kind": "data", "offset": 8703, "length": 340 },
        { "id": 4, "column": 1, "kind": "secondary", "offset": 9043, "length": 3608 },
        { "id": 5, "column": 2, "kind": "data", "offset": 12651, "length": 234 }
      ],
      "timezone": "US/Pacific"
    },
    { "stripe": 3, "rows": 10000,
      "offset": 12948, "length": 4326,
      "index": 77, "data": 4186, "footer": 63,
      "encodings": [
         { "column": 0, "encoding": "direct" },
         { "column": 1, "encoding": "direct rle2" },
         { "column": 2, "encoding": "direct rle2" }
      ],
      "streams": [
        { "id": 0, "column": 0, "kind": "index", "offset": 12948, "length": 12 },
        { "id": 1, "column": 1, "kind": "index", "offset": 12960, "length": 38 },
        { "id": 2, "column": 2, "kind": "index", "offset": 12998, "length": 27 },
        { "id": 3, "column": 1, "kind": "data", "offset": 13025, "length": 341 },
        { "id": 4, "column": 1, "kind": "secondary", "offset": 13366, "length": 3608 },
        { "id": 5, "column": 2, "kind": "data", "offset": 16974, "length": 237 }
      ],
      "timezone": "US/Pacific"
    },
    { "stripe": 4, "rows": 5000,
      "offset": 17274, "length": 2229,
      "index": 76, "data": 2090, "footer": 63,
      "encodings": [
         { "column": 0, "encoding": "direct" },
         { "column": 1, "encoding": "direct rle2" },
         { "column": 2, "encoding": "direct rle2" }
      ],
      "streams": [
        { "id": 0, "column": 0, "kind": "index", "offset": 17274, "length": 12 },
        { "id": 1, "column": 1, "kind": "index", "offset": 17286, "length": 37 },
        { "id": 2, "column": 2, "kind": "index", "offset": 17323, "length": 27 },
        { "id": 3, "column": 1, "kind": "data", "offset": 17350, "length": 174 },
        { "id": 4, "column": 1, "kind": "secondary", "offset": 17524, "length": 1803 },
        { "id": 5, "column": 2, "kind": "data", "offset": 19327, "length": 113 }
      ],
      "timezone": "US/Pacific"
    },
    { "stripe": 5, "rows": 10000,
      "offset": 19503, "length": 4401,
      "index": 77, "data": 4261, "footer": 63,
      "encodings": [
         { "column": 0, "encoding": "direct" },
         { "column": 1, "encoding": "direct rle2" },
         { "column": 2, "encoding": "direct rle2" }
      ],
      "streams": [
        { "id": 0, "column": 0, "kind": "index", "offset": 19503, "length": 12 },
        { "id": 1, "column": 1, "kind": "index", "offset": 19515, "length": 38 },
        { "id": 2, "column": 2, "kind": "index", "offset": 19553, "length": 27 },
        { "id": 3, "column": 1, "kind": "data", "offset": 19580, "length": 416 },
        { "id": 4, "column": 1, "kind": "secondary", "offset": 19996, "length": 3608 },
        { "id": 5, "column": 2, "kind": "data", "offset": 23604, "length": 237 }
      ],
      "timezone": "US/Pacific"
    },
    { "stripe": 6, "rows": 5000,
      "offset": 23904, "length": 2268,
      "index": 76, "data": 2129, "footer": 63,
      "encodings": [
         { "column": 0, "encoding": "direct" },
         { "column": 1, "encoding": "direct rle2" },
         { "column": 2, "encoding": "direct rle2" }
      ],
      "streams": [
        { "id": 0, "column": 0, "kind": "index", "offset": 23904, "length": 12 },
        { "id": 1, "column": 1, "kind": "index", "offset": 23916, "length": 37 },
        { "id": 2, "column": 2, "kind": "index", "offset": 23953, "length": 27 },
        { "id": 3, "column": 1, "kind": "data", "offset": 23980, "length": 210 },
        { "id": 4, "column": 1, "kind": "secondary", "offset": 24190, "length": 1803 },
        { "id": 5, "column": 2, "kind": "data", "offset": 25993, "length": 116 }
      ],
      "timezone": "US/Pacific"
    },
    { "stripe": 7, "rows": 10000,
      "offset": 26172, "length": 4397,
      "index": 77, "data": 4257, "footer": 63,
      "encodings": [
         { "column": 0, "encoding": "direct" },
         { "column": 1, "encoding": "direct rle2" },
         { "column": 2, "encoding": "direct rle2" }
      ],
      "streams": [
        { "id": 0, "column": 0, "kind": "index", "offset": 26172, "length": 12 },
        { "id": 1, "column": 1, "kind": "index", "offset": 26184, "length": 38 },
        { "id": 2, "column": 2, "kind": "index", "offset": 26222, "length": 27 },
        { "id": 3, "column": 1, "kind": "data", "offset": 26249, "length": 419 },
        { "id": 4, "column": 1, "kind": "secondary", "offset": 26668, "length": 3608 },
        { "id": 5, "column": 2, "kind": "data", "offset": 30276, "length": 230 }
      ],
      "timezone": "US/Pacific"
    },
    { "stripe": 8, "rows": 5000,
      "offset": 30569, "length": 2269,
      "index": 76, "data": 2130, "footer": 63,
      "encodings": [
         { "column": 0, "encoding": "direct" },
         { "column": 1, "encoding": "direct rle2" },
         { "column": 2, "encoding": "direct rle2" }
      ],
      "streams": [
        { "id": 0, "column": 0, "kind": "index", "offset": 30569, "length": 12 },
        { "id": 1, "column": 1, "kind": "index", "offset": 30581, "length": 37 },
        { "id": 2, "column": 2, "kind": "index", "offset": 30618, "length": 27 },
        { "id": 3, "column": 1, "kind": "data", "offset": 30645, "length": 213 },
        { "id": 4, "column": 1, "kind": "secondary", "offset": 30858, "length": 1803 },
        { "id": 5, "column": 2, "kind": "data", "offset": 32661, "length": 114 }
      ],
      "timezone": "US/Pacific"
    },
    { "stripe": 9, "rows": 10000,
      "offset": 32838, "length": 4390,
      "index": 77, "data": 4250, "footer": 63,
      "encodings": [
         { "column": 0, "encoding": "direct" },
         { "column": 1, "encoding": "direct rle2" },
         { "column": 2, "encoding": "direct rle2" }
      ],
      "streams": [
        { "id": 0, "column": 0, "kind": "index", "offset": 32838, "length": 12 },
        { "id": 1, "column": 1, "kind": "index", "offset": 32850, "length": 38 },
        { "id": 2, "column": 2, "kind": "index", "offset": 32888, "length": 27 },
        { "id": 3, "column": 1, "kind": "data", "offset": 32915, "length": 411 },
        { "id": 4, "column": 1, "kind": "secondary", "offset": 33326, "length": 3608 },
        { "id": 5, "column": 2, "kind": "data", "offset": 36934, "length": 231 }
      ],
      "timezone": "US/Pacific"
    },
    { "stripe": 10, "rows": 5000,
      "offset": 37228, "length": 2268,
      "index": 76, "data": 2129, "footer": 63,
      "encodings": [
         { "column": 0, "encoding": "direct" },
         { "column": 1, "encoding": "direct rle2" },
         { "column": 2, "encoding": "direct rle2" }
      ],
      "streams": [
        { "id": 0, "column": 0, "kind": "index", "offset": 37228, "length": 12 },
        { "id": 1, "column": 1, "kind": "index", "offset": 37240, "length": 37 },
        { "id": 2, "column": 2, "kind": "index", "offset": 37277, "length": 27 },
        { "id": 3, "column": 1, "kind": "data", "offset": 37304, "length": 211 },
        { "id": 4, "column": 1, "kind": "secondary", "offset": 37515, "length": 1803 },
        { "id": 5, "column": 2, "kind": "data", "offset": 39318, "length": 115 }
      ],
      "timezone": "US/Pacific"
    },
    { "stripe": 11, "rows": 10000,
      "offset": 39496, "length": 4399,
      "index": 77, "data": 4259, "footer": 63,
      "encodings": [
         { "column": 0, "encoding": "direct" },
         { "column": 1, "encoding": "direct rle2" },
         { "column": 2, "encoding": "direct rle2" }
      ],
      "streams": [
        { "id": 0, "column": 0, "kind": "index", "offset": 39496, "length": 12 },
        { "id": 1, "column": 1, "kind": "index", "offset": 39508, "length": 38 },
        { "id": 2, "column": 2, "kind": "index", "offset": 39546, "length": 27 },
        { "id": 3, "column": 1, "kind": "data", "offset": 39573, "length": 414 },
        { "id": 4, "column": 1, "kind": "secondary", "offset": 39987, "length": 3608 },
        { "id": 5, "column": 2, "kind": "data", "offset": 43595, "length": 237 }
      ],
      "timezone": "US/Pacific"
    },
    { "stripe": 12, "rows": 5000,
      "offset": 43895, "length": 2266,
      "index": 76, "data": 2127, "footer": 63,
      "encodings": [
         { "column": 0, "encoding": "direct" },
         { "column": 1, "encoding": "direct rle2" },
         { "column": 2, "encoding": "direct rle2" }
      ],
      "streams": [
        { "id": 0, "column": 0, "kind": "index", "offset": 43895, "length": 12 },
        { "id": 1, "column": 1, "kind": "index", "offset": 43907, "length": 37 },
        { "id": 2, "column": 2, "kind": "index", "offset": 43944, "length": 27 },
        { "id": 3, "column": 1, "kind": "data", "offset": 43971, "length": 211 },
        { "id": 4, "column": 1, "kind": "secondary", "offset": 44182, "length": 1803 },
        { "id": 5, "column": 2, "kind": "data", "offset": 45985, "length": 113 }
      ],
      "timezone": "US/Pacific"
    },
    { "stripe": 13, "rows": 10000,
      "offset": 46161, "length": 4395,
      "index": 77, "data": 4255, "footer": 63,
      "encodings": [
         { "column": 0, "encoding": "direct" },
         { "column": 1, "encoding": "direct rle2" },
         { "column": 2, "encoding": "direct rle2" }
      ],
      "streams": [
        { "id": 0, "column": 0, "kind": "index", "offset": 46161, "length": 12 },
        { "id": 1, "column": 1, "kind": "index", "offset": 46173, "length": 38 },
        { "id": 2, "column": 2, "kind": "index", "offset": 46211, "length": 27 },
        { "id": 3, "column": 1, "kind": "data", "offset": 46238, "length": 412 },
        { "id": 4, "column": 1, "kind": "secondary", "offset": 46650, "length": 3608 },
        { "id": 5, "column": 2, "kind": "data", "offset": 50258, "length": 235 }
      ],
      "timezone": "US/Pacific"
    },
    { "stripe": 14, "rows": 5000,
      "offset": 50556, "length": 2267,
      "index": 76, "data": 2128, "footer": 63,
      "encodings": [
         { "column": 0, "encoding": "direct" },
         { "column": 1, "encoding": "direct rle2" },
         { "column": 2, "encoding": "direct rle2" }
      ],
      "streams": [
        { "id": 0, "column": 0, "kind": "index", "offset": 50556, "length": 12 },
        { "id": 1, "column": 1, "kind": "index", "offset": 50568, "length": 37 },
        { "id": 2, "column": 2, "kind": "index", "offset": 50605, "length": 27 },
        { "id": 3, "column": 1, "kind": "data", "offset": 50632, "length": 211 },
        { "id": 4, "column": 1, "kind": "secondary", "offset": 50843, "length": 1803 },
        { "id": 5, "column": 2, "kind": "data", "offset": 52646, "length": 114 }
      ],
      "timezone": "US/Pacific"
    },
    { "stripe": 15, "rows": 10000,
      "offset": 52823, "length": 4401,
      "index": 77, "data": 4261, "footer": 63,
      "encodings": [
         { "column": 0, "encoding": "direct" },
         { "column": 1, "encoding": "direct rle2" },
         { "column": 2, "encoding": "direct rle2" }
      ],
      "streams": [
        { "id": 0, "column": 0, "kind": "index", "offset": 52823, "length": 12 },
        { "id": 1, "column": 1, "kind": "index", "offset": 52835, "length": 38 },
        { "id": 2, "column": 2, "kind": "index", "offset": 52873, "length": 27 },
        { "id": 3, "column": 1, "kind": "data", "offset": 52900, "length": 414 },
        { "id": 4, "column": 1, "kind": "secondary", "offset": 53314, "length": 3608 },
        { "id": 5, "column": 2, "kind": "data", "offset": 56922, "length": 239 }
      ],
      "timezone": "US/Pacific"
    },
    { "stripe": 16, "rows": 5000,
      "offset": 57224, "length": 2272,
      "index": 76, "data": 2133, "footer": 63,
      "encodings": [
         { "column": 0, "encoding": "direct" },
         { "column": 1, "encoding": "direct rle2" },
         { "column": 2, "encoding": "direct rle2" }
      ],
      "streams": [
        { "id": 0, "column": 0, "kind": "index", "offset": 57224, "length": 12 },
        { "id": 1, "column": 1, "kind": "index", "offset": 57236, "length": 37 },
        { "id": 2, "column": 2, "kind": "index", "offset": 57273, "length": 27 },
        { "id": 3, "column": 1, "kind": "data", "offset": 57300, "length": 211 },
        { "id": 4, "column": 1, "kind": "secondary", "offset": 57511, "length": 1803 },
        { "id": 5, "column": 2, "kind": "data", "offset": 59314, "length": 119 }
      ],
      "timezone": "US/Pacific"
    },
    { "stripe": 17, "rows": 10000,
      "offset": 59496, "length": 4396,
      "index": 76, "data": 4257, "footer": 63,
      "encodings": [
         { "column": 0, "encoding": "direct" },
         { "column": 1, "encoding": "direct rle2" },
         { "column": 2, "encoding": "direct rle2" }
      ],
      "streams": [
        { "id": 0, "column": 0, "kind": "index", "offset": 59496, "length": 12 },
        { "id": 1, "column": 1, "kind": "index", "offset": 59508, "length": 37 },
        { "id": 2, "column": 2, "kind": "index", "offset": 59545, "length": 27 },
        { "id": 3, "column": 1, "kind": "data", "offset": 59572, "length": 414 },
        { "id": 4, "column": 1, "kind": "secondary", "offset": 59986, "length": 3608 },
        { "id": 5, "column": 2, "kind": "data", "offset": 63594, "length": 235 }
      ],
      "timezone": "US/Pacific"
    },
    { "stripe": 18, "rows": 10000,
      "offset": 63892, "length": 4399,
      "index": 77, "data": 4259, "footer": 63,
      "encodings": [
         { "column": 0, "encoding": "direct" },
         { "column": 1, "encoding": "direct rle2" },
         { "column": 2, "encoding": "direct rle2" }
      ],
      "streams": [
        { "id": 0, "column": 0, "kind": "index", "offset": 63892, "length": 12 },
        { "id": 1, "column": 1, "kind": "index", "offset": 63904, "length": 38 },
        { "id": 2, "column": 2, "kind": "index", "offset": 63942, "length": 27 },
        { "id": 3, "column": 1, "kind": "data", "offset": 63969, "length": 416 },
        { "id": 4, "column": 1, "kind": "secondary", "offset": 64385, "length": 3608 },
        { "id": 5, "column": 2, "kind": "data", "offset": 67993, "length": 235 }
      ],
      "timezone": "US/Pacific"
    },
    { "stripe": 19, "rows": 5000,
      "offset": 68291, "length": 2265,
      "index": 76, "data": 2126, "footer": 63,
      "encodings": [
         { "column": 0, "encoding": "direct" },
         { "column": 1, "encoding": "direct rle2" },
         { "column": 2, "encoding": "direct rle2" }
      ],
      "streams": [
        { "id": 0, "column": 0, "kind": "index", "offset": 68291, "length": 12 },
        { "id": 1, "column": 1, "kind": "index", "offset": 68303, "length": 37 },
        { "id": 2, "column": 2, "kind": "index", "offset": 68340, "length": 27 },
        { "id": 3, "column": 1, "kind": "data", "offset": 68367, "length": 210 },
        { "id": 4, "column": 1, "kind": "secondary", "offset": 68577, "length": 1803 },
        { "id": 5, "column": 2, "kind": "data", "offset": 70380, "length": 113 }
      ],
      "timezone": "US/Pacific"
    },
    { "stripe": 20, "rows": 10000,
      "offset": 70556, "length": 4398,
      "index": 77, "data": 4258, "footer": 63,
      "encodings": [
         { "column": 0, "encoding": "direct" },
         { "column": 1, "encoding": "direct rle2" },
         { "column": 2, "encoding": "direct rle2" }
      ],
      "streams": [
        { "id": 0, "column": 0, "kind": "index", "offset": 70556, "length": 12 },
        { "id": 1, "column": 1, "kind": "index", "offset": 70568, "length": 38 },
        { "id": 2, "column": 2, "kind": "index", "offset": 70606, "length": 27 },
        { "id": 3, "column": 1, "kind": "data", "offset": 70633, "length": 413 },
        { "id": 4, "column": 1, "kind": "secondary", "offset": 71046, "length": 3608 },
        { "id": 5, "column": 2, "kind": "data", "offset": 74654, "length": 237 }
      ],
      "timezone": "US/Pacific"
    },
    { "stripe": 21, "rows": 5000,
      "offset": 74954, "length": 2263,
      "index": 76, "data": 2124, "footer": 63,
      "encodings": [
         { "column": 0, "encoding": "direct" },
         { "column": 1, "encoding": "direct rle2" },
         { "column": 2, "encoding": "direct rle2" }
      ],
      "streams": [
        { "id": 0, "column": 0, "kind": "index", "offset": 74954, "length": 12 },
        { "id": 1, "column": 1, "kind": "index", "offset": 74966, "length": 37 },
        { "id": 2, "column": 2, "kind": "index", "offset": 75003, "length": 27 },
        { "id": 3, "column": 1, "kind": "data", "offset": 75030, "length": 206 },
        { "id": 4, "column": 1, "kind": "secondary", "offset": 75236, "length": 1803 },
        { "id": 5, "column": 2, "kind": "data", "offset": 77039, "length": 115 }
      ],
      "timezone": "US/Pacific"
    },
    { "stripe": 22, "rows": 10000,
      "offset": 77217, "length": 4403,
      "index": 77, "data": 4263, "footer": 63,
      "encodings": [
         { "column": 0, "encoding": "direct" },
         { "column": 1, "encoding": "direct rle2" },
         { "column": 2, "encoding": "direct rle2" }
      ],
      "streams": [
        { "id": 0, "column": 0, "kind": "index", "offset": 77217, "length": 12 },
        { "id": 1, "column": 1, "kind": "index", "offset": 77229, "length": 38 },
        { "id": 2, "column": 2, "kind": "index", "offset": 77267, "length": 27 },
        { "id": 3, "column": 1, "kind": "data", "offset": 77294, "length": 417 },
        { "id": 4, "column": 1, "kind": "secondary", "offset": 77711, "length": 3608 },
        { "id": 5, "column": 2, "kind": "data", "offset": 81319, "length": 238 }
      ],
      "timezone": "US/Pacific"
    },
    { "stripe": 23, "rows": 5000,
      "offset": 81620, "length": 2266,
      "index": 77, "data": 2126, "footer": 63,
      "encodings": [
         { "column": 0, "encoding": "direct" },
         { "column": 1, "encoding": "direct rle2" },
         { "column": 2, "encoding": "direct rle2" }
      ],
      "streams": [
        { "id": 0, "column": 0, "kind": "index", "offset": 81620, "length": 12 },
        { "id": 1, "column": 1, "kind": "index", "offset": 81632, "length": 38 },
        { "id": 2, "column": 2, "kind": "index", "offset": 81670, "length": 27 },
        { "id": 3, "column": 1, "kind": "data", "offset": 81697, "length": 207 },
        { "id": 4, "column": 1, "kind": "secondary", "offset": 81904, "length": 1803 },
        { "id": 5, "column": 2, "kind": "data", "offset": 83707, "length": 116 }
      ],
      "timezone": "US/Pacific"
    },
    { "stripe": 24, "rows": 5000,
      "offset": 83886, "length": 2267,
      "index": 77, "data": 2127, "footer": 63,
      "encodings": [
         { "column": 0, "encoding": "direct" },
         { "column": 1, "encoding": "direct rle2" },
         { "column": 2, "encoding": "direct rle2" }
      ],
      "streams": [
        { "id": 0, "column": 0, "kind": "index", "offset": 83886, "length": 12 },
        { "id": 1, "column": 1, "kind": "index", "offset": 83898, "length": 38 },
        { "id": 2, "column": 2, "kind": "index", "offset": 83936, "length": 27 },
        { "id": 3, "column": 1, "kind": "data", "offset": 83963, "length": 213 },
        { "id": 4, "column": 1, "kind": "secondary", "offset": 84176, "length": 1803 },
        { "id": 5, "column": 2, "kind": "data", "offset": 85979, "length": 111 }
      ],
      "timezone": "US/Pacific"
    },
    { "stripe": 25, "rows": 5000,
      "offset": 86153, "length": 2265,
      "index": 76, "data": 2126, "footer": 63,
      "encodings": [
         { "column": 0, "encoding": "direct" },
         { "column": 1, "encoding": "direct rle2" },
         { "column": 2, "encoding": "direct rle2" }
      ],
      "streams": [
        { "id": 0, "column": 0, "kind": "index", "offset": 86153, "length": 12 },
        { "id": 1, "column": 1, "kind": "index", "offset": 86165, "length": 37 },
        { "id": 2, "column": 2, "kind": "index", "offset": 86202, "length": 27 },
        { "id": 3, "column": 1, "kind": "data", "offset": 86229, "length": 211 },
        { "id": 4, "column": 1, "kind": "secondary", "offset": 86440, "length": 1803 },
        { "id": 5, "column": 2, "kind": "data", "offset": 88243, "length": 112 }
      ],
      "timezone": "US/Pacific"
    },
    { "stripe": 26, "rows": 10000,
      "offset": 88418, "length": 4399,
      "index": 77, "data": 4259, "footer": 63,
      "encodings": [
         { "column": 0, "encoding": "direct" },
         { "column": 1, "encoding": "direct rle2" },
         { "column": 2, "encoding": "direct rle2" }
      ],
      "streams": [
        { "id": 0, "column": 0, "kind": "index", "offset": 88418, "length": 12 },
        { "id": 1, "column": 1, "kind": "index", "offset": 88430, "length": 38 },
        { "id": 2, "column": 2, "kind": "index", "offset": 88468, "length": 27 },
        { "id": 3, "column": 1, "kind": "data", "offset": 88495, "length": 414 },
        { "id": 4, "column": 1, "kind": "secondary", "offset": 88909, "length": 3608 },
        { "id": 5, "column": 2, "kind": "data", "offset": 92517, "length": 237 }
      ],
      "timezone": "US/Pacific"
    },
    { "stripe": 27, "rows": 2000,
      "offset": 92817, "length": 1945,
      "index": 76, "data": 1808, "footer": 61,
      "encodings": [
         { "column": 0, "encoding": "direct" },
         { "column": 1, "encoding": "direct rle2" },
         { "column": 2, "encoding": "direct rle2" }
      ],
      "streams": [
        { "id": 0, "column": 0, "kind": "index", "offset": 92817, "length": 12 },
        { "id": 1, "column": 1, "kind": "index", "offset": 92829, "length": 37 },
        { "id": 2, "column": 2, "kind": "index", "offset": 92866, "length": 27 },
        { "id": 3, "column": 1, "kind": "data", "offset": 92893, "length": 89 },
        { "id": 4, "column": 1, "kind": "secondary", "offset": 92982, "length": 1661 },
        { "id": 5, "column": 2, "kind": "data", "offset": 94643, "length": 58 }
      ],
      "timezone": "US/Pacific"
    }
  ]
}

bdice, Oct 28 '24 20:10

@bdice I think we can keep those files as they are, since they were created by legacy writers ("format": "0.12", "writer version": "HIVE-8732", "software version": "ORC Java"). We can use the latest writer to generate a new file with equivalent data but with current time zone names.

wgtmac, Oct 30 '24 01:10

Thank you all. Let me close this issue, since it seems we agree that the old files should be kept as-is. Feel free to make a PR for the newly proposed file.

dongjoon-hyun, Dec 25 '24 20:12