Proper field name hygiene

Open SMAPPER opened this issue 8 years ago • 3 comments

How do you feel about this proposal for field names?

Field name standards (always follow):

  1. Only use lower case characters (“first_name” instead of “FirstName”)
  2. Avoid special characters except underscores (“first_name” instead of “first name”)
  3. Use underscores to separate words in a field name (“destination_port” instead of “destinationport”)
  4. Whenever possible, rename field names with the same purpose to one field name (IPAddress, IP, ip, should be consolidated to ip)
  5. Because people abbreviate differently, do not use abbreviations (“source_port” instead of “src_port”)
  6. Always use singular forms not plural (“message” instead of “messages”)
  7. Use proper spelling of words
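
Some of the standards above can be checked mechanically. The following is a minimal sketch, not part of the proposal itself: the function name `check_field_name`, the regular expression, the decision to allow digits, and the small abbreviation denylist are all assumptions made for illustration. Rules that need human judgement (missing word separators, plurals, spelling) are not covered.

```python
import re

# Lowercase letters and digits, in words separated by single underscores.
# Allowing digits is an assumption; the proposal does not mention them.
NAME_PATTERN = re.compile(r"[a-z0-9]+(?:_[a-z0-9]+)*")

# Illustrative denylist only; a real one would have to be agreed on.
ABBREVIATIONS = {"src", "dst", "addr"}


def check_field_name(name):
    """Return a list of rule violations for a proposed field name."""
    problems = []
    # Rule 1: lower case only.
    if name != name.lower():
        problems.append("rule 1: use lower case only")
    # Rule 2: no special characters except underscores between words.
    if not NAME_PATTERN.fullmatch(name.lower()):
        problems.append("rule 2: only letters, digits, and underscores between words")
    # Rule 5: no abbreviations (limited to the denylist above).
    for word in name.lower().split("_"):
        if word in ABBREVIATIONS:
            problems.append(f"rule 5: spell out '{word}'")
    return problems
```

For example, `check_field_name("first_name")` returns an empty list, while `check_field_name("src_port")` reports the abbreviation.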

Field name guidelines (try to follow):

  1. Use present tense unless field describes historical information (Example: end of connection recording “bytes_received”)
  2. Always use singular forms not plural (“message” instead of “messages”)
     a. Exception: When describing something that is past tense and the expectation is for multiple values (“bytes_received” instead of “byte_received”)
  3. Where possible rename fields to match consistent names so long as renaming the field does not cause the event to lose context (Example: “unauthorized_user” may be able to be renamed to “user” if the only event that contains the field “unauthorized_user” has another field that provides the context of a failed login)

SMAPPER, Oct 04 '16 17:10

Good timing - just got our last stuff turned in so I've (finally) got cycles between now and editorial review.

Only a few questions - overall I think all is solid. Field name standards No 4 and Field name guidelines No 3: I think there will be some (many?) cases where some form of important meaning is retained in the field name. For example, an SSH login entry has a "source_ip" field, the "source" part is important and IMO should be retained even if there is no other IP address present. Another (less clear) example would be an HTTP log's vhost value - do we call it something like "http_hostname" or just "hostname"? I feel the latter is too generic and loses important context. Field name standards No 4 uses "Whenever possible" terminology, but it's in an "always follow" section - should this be combined with Field name guidelines No 3 and clarified per above, since I believe they are related?

I'd also like to add the following:

  • IP address fields will end with "_ip" (this is for the dynamic mapping)
  • All IP addresses will receive GeoIP lookups for geo and ASN, which will be added to a corresponding "*_geo" field (i.e. "source_ip" will derive "source_geo") (While you still can't dynamically map a geopoint, it keeps things consistent.)
  • All IP addresses will be added to the "ips" array upon their identification in the parser
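
As a rough illustration of how the three additions fit together, here is a minimal Python sketch. It is not the project's parser: `enrich_ip_fields` is a hypothetical name, and `lookup_geo` is a placeholder standing in for a real GeoIP/ASN lookup, returning dummy data.

```python
def lookup_geo(ip):
    """Stand-in for a real GeoIP/ASN lookup; returns placeholder data."""
    return {"ip": ip, "country": None, "asn": None}


def enrich_ip_fields(event, geo_lookup=lookup_geo):
    """Add a *_geo field per *_ip field and collect all addresses in 'ips'."""
    enriched = dict(event)  # work on a copy, leave the input untouched
    ips = list(enriched.get("ips", []))
    for name, value in event.items():
        if name.endswith("_ip"):
            # "source_ip" derives "source_geo"
            enriched[name[: -len("_ip")] + "_geo"] = geo_lookup(value)
            if value not in ips:
                ips.append(value)
    enriched["ips"] = ips
    return enriched
```

Because the "_ip" suffix is the only thing the function keys on, a field such as "source_port" is left alone, which is the same property that makes suffix-based dynamic mapping workable.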

philhagen, Oct 04 '16 17:10

I agree that source_ip should definitely not become ip. Should the rule for IP address fields be that they must specify context, such as source_ip or destination_ip, but never just ip?

As far as the less clear fields, unfortunately there are going to be so many of them I'm not sure we can create rules around them. For those I think either best judgement or a community forum should be used. I'm thinking over time we may want to create a csv file with field rename mappings to try and standardize these such as your vhost example. So long as everyone agrees to follow pre-defined standards or field names in this csv file the community should prosper.

I moved the "whenever possible" terminology to the guideline section, as that makes sense and builds on guideline No 3.

Here is a revised proposal:

Field name standards (always follow):

  1. Only use lower case characters (“first_name” instead of “FirstName”)
  2. Avoid special characters except underscores (“first_name” instead of “first name”)
  3. Use underscores to separate words in a field name (“destination_port” instead of “destinationport”)
  4. Because people abbreviate differently, do not use abbreviations (“source_port” instead of “src_port”)
  5. Always use singular forms not plural (“message” instead of “messages”)
  6. Use proper spelling of words
  7. IP address fields must end with “_ip” (this is for dynamic mapping)
  8. All IP addresses will receive GeoIP lookups for geo and ASN, which will be added to a corresponding “*_geo” field (i.e. “source_ip” will derive “source_geo”)
  9. All IP addresses must be added to the ips array
  10. All user fields must be added to the users array (field data from fields such as “user”, “source_user”, “destination_user” should be added to users array)

Field name guidelines (try to follow):

  1. Use present tense unless field describes historical information (Example: end of connection recording “bytes_received”)
  2. Always use singular forms not plural (“message” instead of “messages”)
     a. Exception: When describing something that is past tense and the expectation is for multiple values (“bytes_received” instead of “byte_received”)
  3. Whenever possible rename fields to match consistent names so long as renaming the field does not cause the event to lose context (Example: “unauthorized_user” may be able to be renamed to “user” if the only event that contains the field “unauthorized_user” has another field that provides the context of a failed login)
  4. Whenever possible, rename field names with the same purpose to one field name (“SrcIP”, “SourceIP”, “src_ip”, should be consolidated to “source_ip”)

Common field name replacement standards

  Previous Field Name          New Name
  IPAddress, IP, ip_addr       ip
  SourceIP, src_ip             source_ip
  DestinationIP, dst_ip        destination_ip
  Username, User               user
  SourcePort, src_port         source_port
  DestinationPort, dst_port    destination_port
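
The CSV file of rename mappings suggested earlier in the thread could look something like the sketch below. The rows come from the replacement table above; the column names "previous" and "new", the one-row-per-old-name layout, and the function names `load_renames` and `rename_fields` are assumptions for illustration only.

```python
import csv
import io

# Rows taken from the replacement table in the thread. The header names and
# the one-row-per-old-name layout are assumptions, not an agreed format.
RENAME_CSV = """previous,new
IPAddress,ip
IP,ip
ip_addr,ip
SourceIP,source_ip
src_ip,source_ip
DestinationIP,destination_ip
dst_ip,destination_ip
Username,user
User,user
SourcePort,source_port
src_port,source_port
DestinationPort,destination_port
dst_port,destination_port
"""


def load_renames(text):
    """Parse the CSV text into a {previous_name: new_name} dictionary."""
    return {row["previous"]: row["new"] for row in csv.DictReader(io.StringIO(text))}


def rename_fields(event, renames):
    """Return a copy of the event with known field names replaced."""
    return {renames.get(name, name): value for name, value in event.items()}
```

Fields that are not listed pass through unchanged. Note that if one event carried two old names mapping to the same new name, the later one would silently win in this sketch, so a real implementation would need to decide how to handle that.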

SMAPPER, Oct 04 '16 18:10

Just wondering if anything in the Elastic Common Schema might be of use here?

https://github.com/elastic/ecs

dsplice, Dec 04 '18 20:12