Inconsistent Timestamp Formats in CVE JSON Records

Open jgamblin opened this issue 2 months ago • 4 comments

Hello CVE Project Team,

I am consuming the cvelistV5 JSON data and have noticed that multiple different timestamp formats are being used across the records. This inconsistency makes reliable, programmatic parsing difficult and requires consumers to build complex parsers to handle all variations.

Analysis of Timestamp Formats

I ran an analysis script across a full clone of the repository. The script scanned all 315,569 JSON files in the cves/ directory and cataloged the string formats used for timestamp-related keys defined in the CVE 5.x schema.
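A minimal sketch of how such a format catalog could be built, assuming plain regular-expression matching on the raw strings. This is not the actual analysis script; `classify_timestamp`, `tally_formats`, and the pattern table are hypothetical names introduced for illustration.

```python
import re
from collections import Counter

_BASE = r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}"
_FRACTION = r"\.\d+"
_OFFSET = r"[+-]\d{2}:\d{2}"

# Labels follow the format names used in this report (hypothetical table).
FORMAT_PATTERNS = [
    ("YYYY-MM-DDTHH:MM:SS.ffffffZ", re.compile(_BASE + _FRACTION + "Z")),
    ("YYYY-MM-DDTHH:MM:SSZ", re.compile(_BASE + "Z")),
    ("YYYY-MM-DDTHH:MM:SS.ffffff+HH:MM", re.compile(_BASE + _FRACTION + _OFFSET)),
    ("YYYY-MM-DDTHH:MM:SS+HH:MM", re.compile(_BASE + _OFFSET)),
    ("YYYY-MM-DDTHH:MM:SS.ffffff", re.compile(_BASE + _FRACTION)),
    ("YYYY-MM-DDTHH:MM:SS", re.compile(_BASE)),
]


def classify_timestamp(value: str) -> str:
    """Return the label of the first pattern that matches the whole string."""
    for label, pattern in FORMAT_PATTERNS:
        if pattern.fullmatch(value):
            return label
    return "unrecognized"


def tally_formats(values) -> Counter:
    """Count how often each format label occurs in an iterable of strings."""
    return Counter(classify_timestamp(v) for v in values)
```

Because each pattern must match the whole string, the order of the table does not affect the result. Note that the fraction pattern accepts any number of digits, so millisecond and microsecond values land under the same label.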

Summary Statistics

  • Total files scanned: 315,569
  • Total timestamps found: 1,792,599
  • Unique formats detected: 6 different ISO 8601 variations

Detailed Findings by Field

dateUpdated (1,011,798 timestamps):

| Format | Example | Count | Percentage |
| --- | --- | --- | --- |
| YYYY-MM-DDTHH:MM:SS.ffffffZ | 2023-01-16T15:00:00.123456Z | 800,823 | 79.1% |
| YYYY-MM-DDTHH:MM:SS | 2017-09-18T12:57:01 | 192,572 | 19.0% |
| YYYY-MM-DDTHH:MM:SSZ | 2023-01-16T15:00:00Z | 10,355 | 1.0% |
| YYYY-MM-DDTHH:MM:SS.ffffff | 2023-01-16T15:00:00.123 | 8,048 | 0.8% |

dateReserved (315,567 timestamps):

| Format | Example | Count | Percentage |
| --- | --- | --- | --- |
| YYYY-MM-DDTHH:MM:SS | 2017-09-18T12:57:01 | 192,506 | 61.0% |
| YYYY-MM-DDTHH:MM:SS.ffffffZ | 2023-01-16T15:00:00.123456Z | 112,642 | 35.7% |
| YYYY-MM-DDTHH:MM:SSZ | 2023-01-16T15:00:00Z | 10,419 | 3.3% |

datePublished (311,263 timestamps):

| Format | Example | Count | Percentage |
| --- | --- | --- | --- |
| YYYY-MM-DDTHH:MM:SS | 2017-09-18T12:57:01 | 175,519 | 56.4% |
| YYYY-MM-DDTHH:MM:SS.ffffffZ | 2023-01-16T15:00:00.123456Z | 118,726 | 38.1% |
| YYYY-MM-DDTHH:MM:SSZ | 2023-01-16T15:00:00Z | 17,018 | 5.5% |

dateRejected (16,076 timestamps):

| Format | Example | Count | Percentage |
| --- | --- | --- | --- |
| YYYY-MM-DDTHH:MM:SS | 2017-09-18T12:57:01 | 12,918 | 80.4% |
| YYYY-MM-DDTHH:MM:SS.ffffffZ | 2023-01-16T15:00:00.123456Z | 3,158 | 19.6% |

dateAssigned (2,820 timestamps):

| Format | Example | Count | Percentage |
| --- | --- | --- | --- |
| YYYY-MM-DDTHH:MM:SS.ffffff+HH:MM | 2023-01-16T15:00:00.123+00:00 | 1,270 | 45.0% |
| YYYY-MM-DDTHH:MM:SS | 2017-09-18T12:57:01 | 1,087 | 38.5% |
| YYYY-MM-DDTHH:MM:SS.ffffffZ | 2023-01-16T15:00:00.123456Z | 463 | 16.4% |

datePublic (135,075 timestamps):

| Format | Example | Count | Percentage |
| --- | --- | --- | --- |
| YYYY-MM-DDTHH:MM:SS | 2017-09-18T12:57:01 | 114,073 | 84.4% |
| YYYY-MM-DDTHH:MM:SS.ffffffZ | 2023-01-16T15:00:00.123456Z | 17,084 | 12.6% |
| YYYY-MM-DDTHH:MM:SS+HH:MM | 2023-01-16T15:00:00+00:00 | 2,634 | 2.0% |
| YYYY-MM-DDTHH:MM:SS.ffffff+HH:MM | 2023-01-16T15:00:00.123+00:00 | 1,270 | 0.9% |
| YYYY-MM-DDTHH:MM:SSZ | 2023-01-16T15:00:00Z | 14 | <0.1% |

Key Issues

  1. Timezone Inconsistency: A significant portion of timestamps (20-84% depending on field) lack timezone designators, making it ambiguous whether they represent UTC, local time, or an unspecified timezone. The CVE schema description states "If timezone offset is not given, GMT (+00:00) is assumed", but this assumption must be explicitly documented for consumers.

  2. Multiple Valid Formats: 6 different valid ISO 8601 formats are in use, requiring parsers to implement fallback logic and format detection.

  3. Fractional Seconds Inconsistency: Some timestamps include microsecond precision (.ffffff) while others don't, even within the same field.

  4. Mixed Timezone Representations: Some records use Z (Zulu/UTC), others use explicit offsets like +00:00, and many omit timezone information entirely.
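To illustrate the fallback logic these issues force on consumers, here is a minimal sketch assuming Python's standard `datetime.strptime` and the schema's stated rule that a missing offset means GMT. The function name `parse_cve_timestamp` and the format list are hypothetical, not part of any CVE tooling.

```python
from datetime import datetime, timezone

# Candidate strptime formats covering the six variations listed above.
# In Python 3.7+, %z accepts "Z" as well as offsets such as "+00:00".
CANDIDATE_FORMATS = (
    "%Y-%m-%dT%H:%M:%S.%f%z",  # fractional seconds with Z or an offset
    "%Y-%m-%dT%H:%M:%S%z",     # whole seconds with Z or an offset
    "%Y-%m-%dT%H:%M:%S.%f",    # fractional seconds, no timezone
    "%Y-%m-%dT%H:%M:%S",       # whole seconds, no timezone
)


def parse_cve_timestamp(value: str) -> datetime:
    """Try each known format in turn and return a timezone-aware UTC datetime."""
    for fmt in CANDIDATE_FORMATS:
        try:
            parsed = datetime.strptime(value, fmt)
        except ValueError:
            continue
        if parsed.tzinfo is None:
            # Assumption from the schema description: no offset means GMT.
            parsed = parsed.replace(tzinfo=timezone.utc)
        return parsed.astimezone(timezone.utc)
    raise ValueError(f"unrecognized timestamp format: {value!r}")
```

With a single standardized format, the loop and the timezone assumption would both disappear and one `strptime` call would be enough.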

Recommendation

For consistency and to improve machine-readability, I recommend standardizing on a single timestamp format as defined by RFC 3339 / ISO 8601.

Recommended format: YYYY-MM-DDTHH:MM:SS.sssZ

Example: 2025-10-26T14:30:00.123Z

This format:

  • ✅ Includes explicit UTC timezone designator (Z)
  • ✅ Provides sub-second precision (milliseconds, which can be extended to microseconds if needed)
  • ✅ Is unambiguous and widely supported
  • ✅ Complies with RFC 3339 and ISO 8601
  • ✅ Matches the shape (fractional seconds plus Z) of the most common format already in use for dateUpdated (79%)

Alternative (if sub-second precision is not needed): YYYY-MM-DDTHH:MM:SSZ

Example: 2025-10-26T14:30:00Z
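As a rough illustration of how a producer could emit either recommended form, here is a sketch using only the Python standard library. The helper name `to_recommended` and its `with_millis` flag are hypothetical, and the sketch assumes the input is already a timezone-aware datetime.

```python
from datetime import datetime, timezone


def to_recommended(dt: datetime, with_millis: bool = True) -> str:
    """Render an aware datetime as UTC with a trailing Z."""
    if dt.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    utc = dt.astimezone(timezone.utc)
    # isoformat() truncates (does not round) to the requested precision.
    timespec = "milliseconds" if with_millis else "seconds"
    return utc.isoformat(timespec=timespec).replace("+00:00", "Z")


stamp = datetime(2025, 10, 26, 14, 30, 0, 123456, tzinfo=timezone.utc)
print(to_recommended(stamp))                     # 2025-10-26T14:30:00.123Z
print(to_recommended(stamp, with_millis=False))  # 2025-10-26T14:30:00Z
```

Rejecting naive datetimes here is a design choice: it keeps the timezone assumption out of the output step, so it has to be made explicitly wherever the value is parsed.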

Impact on Consumers

Standardizing timestamp formats would:

  • Eliminate the need for complex multi-format parsing logic
  • Reduce parsing errors and timezone ambiguity
  • Improve data quality and interoperability
  • Make CVE data easier to consume programmatically
  • Align with industry best practices for timestamp representation

Additional Context

  • Analysis Tool: I created a Python script that scans the entire cvelistV5 repository and identifies timestamp format patterns. The script is available if you would like to review the methodology.
  • Schema Reference: Based on CVE Schema 5.x documentation
  • Fields Analyzed: dateUpdated, dateReserved, datePublished, dateRejected, dateAssigned, datePublic

Thank you for considering this request. Standardizing timestamp formats would greatly benefit all consumers of the cvelistV5 data.

jgamblin, Oct 26 '25 14:10

I vote for (and use) YYYY-MM-DDTHH:MM:SSZ. We don't need milliseconds.

I agree with UTC/Z but might be convinced to support local TZ as long as we specify the format.

amanion-cisa, Oct 29 '25 15:10

For reference, I had opened AWG#135 on Sep 17, 2024 to fix the data, and QWG#353 to fix the JSON schema.

jayjacobs, Nov 03 '25 19:11

And in the Working Group discussions we talked about having the system accept any date/time format and timezone on input, but standardize the data stored in the final record so that everything is in the same format and in UTC.

jayjacobs, Nov 03 '25 19:11

Timestamp Format Inconsistencies in Recent CVE Records

Hi CVE AWG team (@rbrittonMitre)! 👋

I ran this analysis after the CVE AWG mentioned that anything published in the last year should use only one timestamp format. I wanted to verify this and provide some data to help identify where inconsistencies still exist.

What I Found

I analyzed 51,773 CVE JSON files published in the last 365 days (from November 12, 2024 to November 11, 2025) to check timestamp format consistency. The analysis examined all six timestamp fields defined in the CVE 5.x schema: dateUpdated, dateReserved, datePublished, dateRejected, dateAssigned, and datePublic.

The good news: Most recent CVEs (95-98%) are using the standard RFC 3339 format! 🎉

The challenge: There are still 6 different timestamp format variations being used across these fields. While the predominant format is correct, the variations mean parsers still need fallback logic to handle all cases.

Summary Statistics

  • Total CVE Files Analyzed: 51,773 (published between Nov 12, 2024 and Nov 11, 2025)
  • Total Timestamp Instances: 286,185
  • Unique Format Variations Found: 6
  • Unrecognized Formats: 0

Note: The number of timestamp instances is higher than the number of CVE files because each CVE JSON can contain multiple timestamp fields (like dateUpdated in both the main cveMetadata section and within each container object such as cna and adp). A single CVE file typically has 3-6 timestamp instances across all fields.
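To show why one file yields several instances, here is a minimal sketch that walks a parsed record and collects every timestamp field by key name. It is an assumption about how such counting could work, not the actual script; `iter_timestamps` is a hypothetical helper, and the sample record is a made-up, heavily trimmed stand-in with a placeholder ID.

```python
import json
from collections import Counter

TIMESTAMP_KEYS = {
    "dateUpdated", "dateReserved", "datePublished",
    "dateRejected", "dateAssigned", "datePublic",
}


def iter_timestamps(node):
    """Yield (key, value) for every timestamp field anywhere in the record."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key in TIMESTAMP_KEYS and isinstance(value, str):
                yield key, value
            else:
                yield from iter_timestamps(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_timestamps(item)


# Hypothetical, trimmed record used only for illustration.
SAMPLE = json.loads("""
{
  "cveMetadata": {
    "cveId": "CVE-0000-0000",
    "dateReserved": "2024-11-01T12:34:56.789Z",
    "datePublished": "2024-11-02T12:34:56.789Z",
    "dateUpdated": "2024-11-03T12:34:56.789Z"
  },
  "containers": {
    "cna": {"providerMetadata": {"dateUpdated": "2024-11-03T12:34:56.789Z"}},
    "adp": [{"providerMetadata": {"dateUpdated": "2024-11-04T12:34:56"}}]
  }
}
""")

found = list(iter_timestamps(SAMPLE))
per_field = Counter(key for key, _ in found)
print(len(found), per_field["dateUpdated"])  # 5 3
```

In this sample, dateUpdated alone appears three times (metadata, cna, and one adp container), which is how a field can have more instances than there are files.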

Detailed Findings by Field

dateUpdated (174,820 instances)

| Format | Example | Count | Percentage | Example CVEs |
| --- | --- | --- | --- | --- |
| %Y-%m-%dT%H:%M:%S.%fZ | 2024-11-01T12:34:56.789012Z | 170,560 | 97.56% | CVE-2013-3307, CVE-2025-2762 |
| %Y-%m-%dT%H:%M:%S | 2024-11-01T12:34:56 | 3,352 | 1.92% | CVE-2013-6488, CVE-2022-45186 |
| %Y-%m-%dT%H:%M:%S.%f | 2024-11-01T12:34:56.789012 | 908 | 0.52% | CVE-2022-35604 |

dateReserved (51,771 instances)

| Format | Example | Count | Percentage | Example CVEs |
| --- | --- | --- | --- | --- |
| %Y-%m-%dT%H:%M:%S.%fZ | 2024-11-01T12:34:56.789012Z | 49,222 | 95.08% | CVE-2013-3307, CVE-2013-10054 |
| %Y-%m-%dT%H:%M:%S | 2024-11-01T12:34:56 | 2,549 | 4.92% | CVE-2013-6488, CVE-2013-6406 |

datePublished (47,436 instances)

| Format | Example | Count | Percentage | Example CVEs |
| --- | --- | --- | --- | --- |
| %Y-%m-%dT%H:%M:%S.%fZ | 2024-11-01T12:34:56.789012Z | 46,676 | 98.40% | CVE-2013-3307, CVE-2013-10054 |
| %Y-%m-%dT%H:%M:%S | 2024-11-01T12:34:56 | 760 | 1.60% | CVE-2022-45186, CVE-2022-45185 |

dateRejected (4,589 instances)

| Format | Example | Count | Percentage | Example CVEs |
| --- | --- | --- | --- | --- |
| %Y-%m-%dT%H:%M:%S.%fZ | 2024-11-01T12:34:56.789012Z | 2,788 | 60.76% | CVE-2013-0972, CVE-2013-0965 |
| %Y-%m-%dT%H:%M:%S | 2024-11-01T12:34:56 | 1,801 | 39.24% | CVE-2013-6488, CVE-2013-6406 |

dateAssigned (753 instances)

| Format | Example | Count | Percentage | Example CVEs |
| --- | --- | --- | --- | --- |
| %Y-%m-%dT%H:%M:%S.%fZ | 2025-03-24T19:42:35.555Z | 413 | 54.85% | CVE-2025-2762, CVE-2025-2021 |
| %Y-%m-%dT%H:%M:%S.%f%z | 2025-01-12T21:19:37.367-06:00 | 340 | 45.15% | CVE-2025-0412, CVE-2024-6818 |

datePublic (6,816 instances)

| Format | Example | Count | Percentage | Example CVEs |
| --- | --- | --- | --- | --- |
| %Y-%m-%dT%H:%M:%S.%fZ | 2025-02-25T11:44:00.000Z | 6,452 | 94.66% | CVE-2022-25773, CVE-2022-31668 |
| %Y-%m-%dT%H:%M:%S.%f%z | 2023-11-16T14:20:02.768-06:00 | 340 | 4.99% | CVE-2025-0412, CVE-2024-6818 |
| %Y-%m-%dT%H:%M:%S%z | 2023-11-29T00:00:00+00:00 | 22 | 0.32% | CVE-2022-2232, CVE-2024-7730 |
| %Y-%m-%dT%H:%M:%SZ | 2024-11-12T12:00:00Z | 2 | 0.03% | CVE-2024-45819, CVE-2024-45818 |

Comparison with Historical Data

When comparing recent CVEs (last 365 days) with the full historical dataset (315,569 CVEs total):

Improvement in Standardization:

  • The RFC 3339/ISO 8601 format with microseconds and Z timezone (%Y-%m-%dT%H:%M:%S.%fZ) is now more dominant in recent CVEs:
    • dateUpdated: 97.56% (recent) vs 79.1% (historical)
    • dateReserved: 95.08% (recent) vs 35.7% (historical)
    • datePublished: 98.40% (recent) vs 38.1% (historical)

Persisting Issues:

  • Even in recent CVEs, timestamps without timezone indicators still appear: 2.44% in dateUpdated (1.92% with whole seconds plus 0.52% with fractional seconds), 4.92% in dateReserved, and 1.60% in datePublished
  • dateRejected still shows significant inconsistency with 39.24% missing timezone indicators
  • Multiple format variations persist in datePublic (4 different formats)

Key Observations

  1. Missing Timezone Indicators: While most recent timestamps include the Z timezone indicator, 1.60% to 4.92% of dateUpdated, dateReserved, and datePublished values still lack timezone information entirely. Parsers then have to assume UTC rather than read it from the value. Examples include CVE-2013-6488 and CVE-2022-45186.

  2. Precision Variations: Some timestamps include microseconds while others don't (like CVE-2022-35604 vs CVE-2013-6488), which can cause issues when strict format validation is required.

  3. The dateRejected Field Stands Out: This field shows the highest inconsistency at 39.24% missing timezone indicators, suggesting different tooling or workflows may be used for CVE rejections.

  4. Multiple Formats in datePublic: This field has 4 different format variations, including some with +00:00 timezone notation (like CVE-2022-2232) instead of the standard Z.

Suggestions for Moving Forward

Here are some ideas that might help achieve 100% consistency:

  1. Standardize on one format: It would be great if all timestamp fields used YYYY-MM-DDTHH:MM:SS.ffffffZ format (RFC 3339 with microseconds and the Z timezone indicator). This is already what 95-98% of records use!

  2. Add validation to submission tools: Stricter timestamp validation could catch non-compliant formats before they're committed. The dateRejected field in particular might benefit from this.

  3. Review the rejection workflow: Since dateRejected has the highest inconsistency rate (39.24%), there might be a different tool or process being used for rejections that doesn't enforce the same format standards.

  4. Example CVEs to investigate: I've included example CVE IDs in the tables above for each format variation. These might be helpful for tracking down where the inconsistencies originate.
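As a sketch of what the stricter validation in suggestion 2 might look like, the check below accepts only the single suggested format and rejects everything else. The function name `is_strict_timestamp` and the `fraction_digits` parameter are hypothetical; the default of six digits is an assumption taken from the `.ffffff` in the suggested format, and it would reject the three-digit fractions that appear in some of the examples above.

```python
import re
from datetime import datetime


def is_strict_timestamp(value: str, fraction_digits: int = 6) -> bool:
    """Accept only YYYY-MM-DDTHH:MM:SS.<fraction>Z with a fixed fraction width."""
    pattern = r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{%d}Z" % fraction_digits
    if not re.fullmatch(pattern, value, flags=re.ASCII):
        return False
    try:
        # The regex checks the shape; strptime checks that the date is real.
        datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    except ValueError:
        return False
    return True
```

A submission tool could run this on each timestamp field and either reject the record or rewrite the value before it is committed; which of the two is appropriate is a policy decision this sketch does not make.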

How This Analysis Was Done

I ran this analysis in response to the CVE AWG's statement that CVEs published in the last year should use only one timestamp format. I wanted to verify this and see where we stand!

The analysis covered 51,773 CVE files published between November 12, 2024 and November 11, 2025, examining all six timestamp fields defined in the CVE 5.x schema (dateUpdated, dateReserved, datePublished, dateRejected, dateAssigned, and datePublic).

The progress is impressive! Compared to the full historical dataset (315,569 CVEs total, where the dominant format covered only 35.7% to 79.1% of dateUpdated, dateReserved, and datePublished values), recent CVEs show 95-98% standardization in those same fields. The improvements are clearly working; there's just a bit more to go to reach 100%.

I've included example CVE IDs in the tables above for each format variation to make it easier to investigate specific cases. Hope this data is helpful! Let me know if you need any additional analysis or have questions about the methodology. 😊

jgamblin, Nov 11 '25 22:11