ocsf-schema What is the value of Account Type?

In order to promote interoperability, OCSF must define a "schema", not just a "schema framework". The data that goes into logging information must be defined across vendors, not just "captioned".

Consider dictionary.json:

    "account_type": {
      "caption": "Account Type ID",
      "description": "The user account type (e.g. AWS, LDAP, Windows account, etc.).",
      "type": "string_t"
    },
    "account_type_id": {
      "caption": "Account Type ID",
      "description": "The user account type identifier (e.g. AWS, LDAP, Windows account, etc.).",
      "enum": {
        "-1": {
          "caption": "Other",
          "description": "The user account type is not mapped."
        },
        "0": {
          "caption": "Unknown",
          "description": "The user account type is unknown."
        },
        "1": {
          "caption": "LDAP Account"
        },
        "2": {
          "caption": "Windows Account"
        },
        "3": {
          "caption": "AWS IAM Account"
        },
        "4": {
          "caption": "GCP Account"
        },
        "5": {
          "caption": "Azure AD Account"
        }
      },
      "type": "integer_t"
    },

This is a framework for an enumeration, but OCSF defines no value for the "account_type" "string_t". An information model (abstract schema) does define enumerations:

ID	Name	Description
-1	?	Other: The user account type is not mapped.
0	?	Unknown: The user account type is unknown.
1	?	LDAP Account:
2	?	Windows Account:
3	?	AWS IAM Account:
4	?	GCP Account:
5	?	Azure AD Account:

The Name column (the string_t account_type) is undefined, which means that when looking at, for example, Splunk logs, OCSF provides no guidance on what value to expect:

<TS> phonenumber=333-444-4444, app=angrybirds, installdate=xx/xx/xx, acct=Windows Account
<TS> phonenumber=333-444-4444, app=facebook, installdate=yy/yy/yy, acct=Azure AD Account

Using captions might work for comma-separated data fields (assuming captions prohibit commas), but it definitely will not work for space-separated data:

<TS>
USER ACCT PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
Root Windows Account 41 21.9 1.7 3233968 143624 ?? Rs 7Jul11 48:09.67 /System/Library/foo
Rdas Azure AD Account 790 4.5 0.4 4924432 32324 ?? S 8Jul11 9:00.57 /System/Library/baz

Enumeration names enable interchangeable logging data:

ID	Name	Description
-1	other	Other: The user account type is not mapped.
0	unknown	Unknown: The user account type is unknown.
1	ldap	LDAP Account:
2	windows	Windows Account:
3	aws_iam	AWS IAM Account:
4	gcp	GCP Account:
5	azure_ad	Azure AD Account:

enables

<TS>
USER ACCT PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
Root windows 41 21.9 1.7 3233968 143624 ?? Rs 7Jul11 48:09.67 /System/Library/foo
Rdas azure_ad 790 4.5 0.4 4924432 32324 ?? S 8Jul11 9:00.57 /System/Library/baz
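
A minimal sketch of the difference, assuming a consumer that splits each row on whitespace and expects exactly the twelve columns of the header above. `parse_row` is a hypothetical helper written for this illustration, not part of OCSF or Splunk.

```python
# Column names taken from the header line of the example above.
HEADER = "USER ACCT PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND".split()

def parse_row(line):
    """Split a row on whitespace and pair each field with its column name.

    Returns None when the number of fields does not match the header,
    which is what happens when a value contains embedded spaces.
    """
    fields = line.split()
    if len(fields) != len(HEADER):
        return None
    return dict(zip(HEADER, fields))

# Row using an enumeration name (a single token).
token_row = "Root windows 41 21.9 1.7 3233968 143624 ?? Rs 7Jul11 48:09.67 /System/Library/foo"
# The same row using the caption, which contains a space.
caption_row = "Root Windows Account 41 21.9 1.7 3233968 143624 ?? Rs 7Jul11 48:09.67 /System/Library/foo"

parsed = parse_row(token_row)
print(parsed["ACCT"])          # the token survives as one field
print(parse_row(caption_row))  # the caption shifts every later column
```

With the token the ACCT column parses cleanly; with the caption the row yields thirteen fields and no longer lines up with the header.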

Defining enumerated strings is the rationale for formatting "enum" entries with both a property name and an integer id, as proposed in Issue #214:

    "account_type_id": {
      "caption": "Account Type ID",
      "name": "AccountType",
      "description": "The user account type identifier (e.g. AWS, LDAP, Windows account, etc.).",
      "enum": {
        "other": {
          "caption": "Other",
          "description": "The user account type is not mapped.",
          "id": -1
        },
      ...

An example schema containing just Enumerated data types defined in the OCSF enums folder is available here. The OCSF files could easily be updated to define both datatype names and property names.
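
As a rough sketch of what the proposed format would allow, the snippet below builds lookups in both directions from a fragment shaped like the proposal. Only the "other" entry appears in the proposal text; the "windows" and "azure_ad" entries and their ids are taken from the name table above, and `build_lookups` is a hypothetical helper, not an OCSF API.

```python
import json

# Fragment shaped like the proposed format. Only "other" is quoted from the
# proposal; "windows" and "azure_ad" are assumed from the name table above.
PROPOSED = json.loads("""
{
  "account_type_id": {
    "caption": "Account Type ID",
    "name": "AccountType",
    "enum": {
      "other":    {"caption": "Other", "id": -1},
      "windows":  {"caption": "Windows Account", "id": 2},
      "azure_ad": {"caption": "Azure AD Account", "id": 5}
    }
  }
}
""")

def build_lookups(attribute):
    """Return (name -> id, id -> name) dictionaries for one enum attribute.

    Raises ValueError if two names share an id, since the mapping is
    meant to be 1:1.
    """
    name_to_id = {name: item["id"] for name, item in attribute["enum"].items()}
    id_to_name = {value: name for name, value in name_to_id.items()}
    if len(id_to_name) != len(name_to_id):
        raise ValueError("enum ids are not unique")
    return name_to_id, id_to_name

name_to_id, id_to_name = build_lookups(PROPOSED["account_type_id"])
print(name_to_id["azure_ad"])
print(id_to_name[-1])
```

Captions stay in the data as documentation only; neither lookup depends on them.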

Sep 15 '22 15:09 davaya

The text-based enum values must be translated to the integer values; otherwise it will be very confusing to have two sets of values that represent the same thing. The caption is just the user-friendly name of the integer value.

Sep 16 '22 17:09 rroupski

An enumeration is a 1:1 equivalence between a text string and an integer. The two go both ways, just as in the constants the C language defines:

#define  O_RDONLY    00000000    /* Read Only */
#define  O_WRONLY    00000001    /* Write Only */
#define  O_RDWR      00000002    /* Read and Write */

The problem with a caption is that it is not an identifier, which is why it doesn't work in the Splunk log example shown above, or as the identifier in a #define. Captions exist in the natural-language space; text identifiers are in the human-readable computer-language space.

"Read and Write" and "Azure AD Account" are natural-language captions in unrestricted strings; O_RDWR and azure_ad are text identifiers with a defined lexical form.
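
One way to make "defined lexical form" concrete is a pattern check. The sketch below assumes the C identifier rule (a letter or underscore followed by letters, digits, or underscores); OCSF does not specify this pattern, and `is_identifier` is a hypothetical helper.

```python
import re

# Assumed lexical form: the C identifier rule. This is an illustration,
# not a pattern defined by OCSF.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def is_identifier(text):
    """Return True when the whole string matches the assumed identifier form."""
    return IDENTIFIER.fullmatch(text) is not None

for value in ("O_RDWR", "azure_ad", "Read and Write", "Azure AD Account"):
    print(value, is_identifier(value))
```

The two identifiers pass and the two captions fail, because the captions contain spaces.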

Sep 17 '22 03:09 davaya

Correct: the enum value is the identifier, and the caption is a user-friendly name for the integer value.

Regarding the example above, the raw values found in the logs must be translated to the OCSF enum values. Otherwise, depending on who logged the data, different values may represent the same data.
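
A minimal sketch of such a translation step, assuming a hand-maintained table of raw spellings. The raw strings in `RAW_TO_ID` are illustrative, and the choice to map a missing value to Unknown (0) and an unrecognized value to Other (-1) is an assumption based on the enum descriptions in dictionary.json. `to_account_type_id` is a hypothetical helper.

```python
OTHER_ID = -1    # "The user account type is not mapped."
UNKNOWN_ID = 0   # "The user account type is unknown."

# Illustrative raw spellings only; a real mapping depends on each log source.
RAW_TO_ID = {
    "windows account": 2,
    "windows": 2,
    "azure ad account": 5,
    "azure_ad": 5,
}

def to_account_type_id(raw):
    """Translate a raw log value to an account_type_id integer.

    Missing or blank input maps to Unknown; unrecognized input maps to Other.
    """
    if raw is None or not raw.strip():
        return UNKNOWN_ID
    return RAW_TO_ID.get(raw.strip().lower(), OTHER_ID)

print(to_account_type_id("Windows Account"))
print(to_account_type_id("azure_ad"))
print(to_account_type_id("Okta"))
print(to_account_type_id(""))
```

Different spellings of the same account type converge on one integer, which is the consistency this comment asks for.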

Sep 19 '22 17:09 rroupski

The caption represents the enum string value, so I don't see an issue with the current enum definition.

Sep 19 '22 22:09 irakledibm

I believe the issue is that Caption is not considered a discrete value, or token, for example in a switch statement. If we want to have dual mode enums (integers <-> string token) we would need to add the token to the enum definitions and the caption would never be used to populate an event, it would only be for documentation. If there is a desire for the dual mode enum, e.g. because a token might be easier to remember for a consumer doing an ad hoc query, we would need to go through every enum and assign a (memorable, consistent) token.

Oct 15 '22 21:10 pagbabian-splunk

String enum siblings have been addressed in #450.

Jan 31 '23 21:01 paveljos