[ Schema Extension ] FileObject: Extend File Type with “Executable”

Open dfirence opened this issue 6 months ago • 15 comments

Needs

The ability to query data stores encoded with the OCSF schema for file types that are executable, specifically executable images on Windows, Linux, and macOS, i.e., PE32, PE32+, ELF, MACHO.

The new enum value will allow both machines and humans to search with a single predicate against a column, for example:

-- Example query with desired extension of FileType
select file_name, file_sha2
from some_table
where ocsf_file_type_id = 8 or ocsf_file_type = 'Executable'

Current FileObject type_id enum values

"type_id": {
  "description": "The file type ID.",
  "enum": {
    "0": {
      "caption": "Unknown"
    },
    "1": {
      "caption": "Regular File"
    },
    "2": {
      "caption": "Folder"
    },
    "3": {
      "caption": "Character Device"
    },
    "4": {
      "caption": "Block Device"
    },
    "5": {
      "caption": "Local Socket"
    },
    "6": {
      "caption": "Named Pipe"
    },
    "7": {
      "caption": "Symbolic Link"
    },
    "99": {
      "caption": "Other"
    }
  },
  "requirement": "required"
}
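
For illustration only, the enum above plus the requested value could be modeled on the client side roughly as follows. This is a minimal Python sketch, not part of the OCSF schema tooling: the names FileTypeId, CAPTIONS, and is_executable_type are hypothetical, and the value 8 / "Executable" is the extension proposed in this issue, not something in the enum listed above.

```python
from enum import IntEnum


class FileTypeId(IntEnum):
    """Client-side mirror of the file type_id enum listed above (hypothetical name)."""

    UNKNOWN = 0
    REGULAR_FILE = 1
    FOLDER = 2
    CHARACTER_DEVICE = 3
    BLOCK_DEVICE = 4
    LOCAL_SOCKET = 5
    NAMED_PIPE = 6
    SYMBOLIC_LINK = 7
    EXECUTABLE = 8  # proposed in this issue; assumption, not in the enum above
    OTHER = 99


# Captions for the sibling string field; "Executable" is the proposed caption.
CAPTIONS = {
    FileTypeId.UNKNOWN: "Unknown",
    FileTypeId.REGULAR_FILE: "Regular File",
    FileTypeId.FOLDER: "Folder",
    FileTypeId.CHARACTER_DEVICE: "Character Device",
    FileTypeId.BLOCK_DEVICE: "Block Device",
    FileTypeId.LOCAL_SOCKET: "Local Socket",
    FileTypeId.NAMED_PIPE: "Named Pipe",
    FileTypeId.SYMBOLIC_LINK: "Symbolic Link",
    FileTypeId.EXECUTABLE: "Executable",
    FileTypeId.OTHER: "Other",
}


def is_executable_type(file_obj: dict) -> bool:
    """Single-predicate check, analogous to `ocsf_file_type_id = 8` in the query."""
    return file_obj.get("type_id") == FileTypeId.EXECUTABLE
```

With this shape, existing code that already switches on type_id only needs to learn one more member.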

dfirence avatar May 12 '25 00:05 dfirence

Good suggestion.

A thorn with it is that file.type_id seems to have been used to indicate the "filesystem file type" where executable would fall into "Regular File".

mime_type also exists, but that doesn't meet the need where you want to ask about all executable files.

How do you feel about a new field like regular_file_type_id that is set if file.type_id == 1? Another option could be a new is_executable field.

How do you think library files (static or dynamic linking) should be handled?

mlmitch avatar May 12 '25 21:05 mlmitch

Would checking mime_type against common executable types (e.g., based on file/magic) be satisfactory?

E.g.,

  • application/x-mach-binary (Apple Mach-O)
  • application/x-executable, application/x-pie-executable, application/x-sharedlib (ELF)
  • application/x-dosexec (DOS PE)
  • application/vnd.microsoft.portable-executable (Windows PE)

I suggest mime_type because (1) it is already there, and (2) whether a file is executable is sometimes ambiguous. For example, the DMG format mentioned is not actually an executable format. It's a disk image container file that the macOS Disk Image Mounter opens. However, it's an installer format and is often of security interest, like executable files. It may be better for the query to specify exactly which file types it is interested in.
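
As a rough sketch of what that check could look like on the consumer side (Python, hypothetical helper names; the set contains only the MIME types listed above and is not an exhaustive or authoritative list):

```python
# MIME types taken from the list above; which ones count as "executable"
# is a judgement call, so a real query may want a different set.
EXECUTABLE_MIME_TYPES = frozenset(
    {
        "application/x-mach-binary",  # Apple Mach-O
        "application/x-executable",  # ELF
        "application/x-pie-executable",  # ELF (position-independent)
        "application/x-sharedlib",  # ELF shared library
        "application/x-dosexec",  # DOS PE
        "application/vnd.microsoft.portable-executable",  # Windows PE
    }
)


def has_executable_mime_type(file_obj: dict) -> bool:
    """Return True if the file object's mime_type is in the set above.

    Assumes file_obj is a dict-like OCSF file object with an optional
    "mime_type" string; a missing or non-string value yields False.
    """
    mime = file_obj.get("mime_type")
    if not isinstance(mime, str):
        return False
    return mime.strip().lower() in EXECUTABLE_MIME_TYPES
```

The caller chooses the set, which keeps ambiguous cases such as DMG an explicit decision of the query rather than of the producer.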

antchan2 avatar May 14 '25 22:05 antchan2

@antchan2 - Just changed to MACHO, thanks for catching that. @mlmitch

I want to focus the ask on extending the Enum to type_id = 8 and corresponding type_name = Executable.

This is better for the existing DX: the enum can be passed around, my local OCSF clients need minimal code changes, and the type system enforces the constraint from the type_id.

Additionally, I don’t want to extend existing parquet files or columns to accommodate a new single field when we have an enum that is serving efficiently. For example, if I cap my parquet file at 300 MB, adding a new column for mime_type can consume that budget unnecessarily.

@mlmitch - regarding your good question on linking (static/dynamic), I think this ask is more about what the file is, whereas linking feels to me like how it can be loaded to later execute. By that reasoning, activity types like memory activity feel closer to that discussion. Happy to chime in on a separate GH discussion, and maybe that could lead to a new ask via issues here - wdyt?

dfirence avatar May 14 '25 23:05 dfirence

On the linking stuff, I mean how do we treat library files that aren't directly executable but still contain executable code?

mlmitch avatar May 15 '25 14:05 mlmitch

@mlmitch

It is an executable, whether it be an .so, dll, rlib, or object file, etc.

dfirence avatar May 16 '25 23:05 dfirence

@antchan2 on the use of mime_type, I agree that it is a feasible solution.

I have two high-level thoughts on the matter.

First, "Regular File" is a ridiculous file.type_id value (no offence to whoever made it - I get what you're going for) and I would like to fix it.

Second, I think mime_type will end up with a poor experience when it comes to dataset querying.

Considering the ask in this issue, a dataset user wants to ask for executable files. Doing so with mime_type requires querying with several string equality conditions and it relies on the datasource correctly setting / providing the type information. When we consider mapping existing endpoint data, that information isn't often reported, but certain events will imply that the file is executable (e.g. library load, file execution). So I think it will be easier for the community at large to set file.type_id.

We don't need to boil the ocean on this either. We can leave "Regular File" as a sort of fallback/default and add in additional file types as the need arises, starting with "Executable".

mlmitch avatar May 20 '25 14:05 mlmitch

@mlmitch:

"Regular File" is a ridiculous file.type_id value

OCSF's current file.type_id definition appears to be based on the Unix definition of a file. Through that lens, "regular file" and the other values may make more sense as they map directly to the File Type (S_IFMT) bit definitions in POSIX file stat:

S_IFMT
  Type of file.
  S_IFBLK
    Block special.
  S_IFCHR
    Character special.
  S_IFIFO
    FIFO special.
  S_IFREG
    Regular.
  S_IFDIR
    Directory.
  S_IFLNK
    Symbolic link.
  S_IFSOCK
    Socket.
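
To make the correspondence concrete, here is a small Python sketch using the standard stat module. The mapping from S_IFMT values to OCSF type_id numbers is inferred from the captions in the enum quoted earlier in this thread and is an assumption, as is the hypothetical function name and the choice to return 0 ("Unknown") for anything unrecognized.

```python
import stat

# Assumed correspondence between POSIX file type bits and the OCSF
# file.type_id values, inferred from the enum captions in this thread.
_S_IFMT_TO_TYPE_ID = {
    stat.S_IFREG: 1,  # Regular File
    stat.S_IFDIR: 2,  # Folder
    stat.S_IFCHR: 3,  # Character Device
    stat.S_IFBLK: 4,  # Block Device
    stat.S_IFSOCK: 5,  # Local Socket
    stat.S_IFIFO: 6,  # Named Pipe
    stat.S_IFLNK: 7,  # Symbolic Link
}


def type_id_from_mode(st_mode: int) -> int:
    """Map a stat st_mode to a type_id; 0 (Unknown) if the type bits are unrecognized."""
    return _S_IFMT_TO_TYPE_ID.get(stat.S_IFMT(st_mode), 0)
```

For a path on disk this could be called as type_id_from_mode(os.lstat(path).st_mode), using lstat so that symbolic links are reported as links rather than followed.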

That said, I completely understand when we talk about file type in a cybersecurity context we are typically referring to the file content type, so I can relate to the surprise.

mime_type will end up with a poor experience when it comes to dataset querying... add in additional file types as the need arises, starting with "Executable".

If mime_type is too heavy for this case, does adding an optional boolean is_executable provide an equally good dataset querying experience as a new type_id value?

If so, that may be more attractive, as there is precedent for this approach (e.g., is_encrypted, is_system) and it would not disrupt an existing required field where all defined values are mutually exclusive and do not require judgement.

antchan2 avatar May 20 '25 15:05 antchan2

Well I've certainly got some egg on my face. TIL (or today I relearned) regarding the POSIX file stat.

mlmitch avatar May 20 '25 16:05 mlmitch

POSIX file stat and Unix's interest in regular files are definitely under the hood. Glad pointing out the possible origin was helpful.

antchan2 avatar May 20 '25 17:05 antchan2

@antchan2 @mlmitch

POSIX or not, I don’t believe OCSF aims at describing OS internal structures (😀), I thought OCSF is a schema to express the cyber vocabulary. There’s no such thing as a Regular File.

Do we see this ask as possible to be applied in OCSF?

To be crystal clear, my ask is to extend the existing enum to TypeId == 8 and corresponding TypeName == “Executable”

dfirence avatar May 21 '25 00:05 dfirence

Regular file aside (we can't change the label without making a breaking change, due to the convention of populating the sibling with the label), we do not want to add an executable file type to the enum list. I would suggest it's not strictly a file type but an attribute of the file, and a file can have a combination of attributes, as evidenced by its bitmask. For example, a file can also be read-only, and we don't want to have a file type for each combination of attributes of this sort.

I suggest we add an is_executable boolean field that can be applied along with the file type, and likely we should also add an is_readonly boolean as well. Today we have is_encrypted and is_system as well as is_deleted and so it will be consistent with that approach. Note there is also an attributes integer attribute for flags but admittedly that is too cumbersome for normal query predicates and it isn't standardized (by us anyway).
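
A minimal Python sketch of that idea, under the assumption that the flags are read from POSIX permission bits (the thread does not specify where they would come from). The function name is hypothetical, and is_executable and is_readonly are only proposed field names at this point.

```python
import stat

# Any-execute and any-write masks across user, group, and other.
_ANY_EXEC = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
_ANY_WRITE = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


def file_flags_from_mode(st_mode: int) -> dict:
    """Derive independent boolean flags from a stat st_mode.

    The flags combine freely, which is why they are modeled as separate
    booleans rather than as one enum value per combination.
    """
    return {
        "is_executable": bool(st_mode & _ANY_EXEC),
        "is_readonly": not (st_mode & _ANY_WRITE),
    }
```

A mode of 0o100555, for instance, is a regular file that is both executable and read-only, a combination a single type_id value cannot express.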

pagbabian-splunk avatar May 22 '25 01:05 pagbabian-splunk

@pagbabian-splunk - I am not asking for executables based on bitmask values like chmod +x foo having that attribute on.

I also don’t want the is_executable field, since that forces me to have a new column with bloated truthy/falsy values in my parquet tables.

I was hoping, given that I am using type_id today, to have a new number and type_name. My logic stays the same, my parquet tables don’t expand with new columns, and my data retrieval logic stays the same.

I am already using this locally with my own extension. I thought this ask could result in an upstream change so that I won’t deviate from the schema too much. Worst case, I can continue to use my local implementation of the enum with type_id == 8.

All the suggestions and thoughts are noble, they can serve other use cases, for me the efficiencies desired are in that enum value extension.

I don’t want to complicate this or ask for too much effort; we can close this issue as won’t fix.

dfirence avatar May 22 '25 02:05 dfirence

Thank you @dfirence - but you have brought up an important gap in this object, and we will still need to address it. I now understand your desire, given your existing schema.

pagbabian-splunk avatar May 22 '25 15:05 pagbabian-splunk

@dfirence I have been thinking about this some more, along with your desire to keep this to a single field or column. I think there is a good rationale to distinguish a Regular File from an Executable File, since the construction of those files is different (taking advantage of the maybe odd-sounding "Regular" adjective, as mentioned).

I'm going to re-open your issue for now, create a PR with an additional type_id, and add you as a reviewer.

pagbabian-splunk avatar May 23 '25 15:05 pagbabian-splunk

@pagbabian-splunk , thank you - will look for the PR

Ref: PR 1438

dfirence avatar May 23 '25 23:05 dfirence

@dfirence Could this be closed out since the PR #1438 was merged?

mikeradka avatar Aug 12 '25 16:08 mikeradka