
Standardize and make ID fields consistent

Open achantavy opened this issue 2 years ago • 5 comments

Description:

What issue is being seen? Describe what should be happening instead of the bug, for example: Cartography should not crash, the expected value isn't returned, the data schema is wrong, etc.

As mentioned in #910, cartography node fields are inconsistent. For example, in an AWSAccount you'd need to use id, whereas an AWSUser has a userid and arn but not an id.

Please complete the following information:

  • Cartography release version or commit hash [e.g. 0.12.0 or 95e8e11913e2a44a4d4682506d8364a638ceac69] 0.65.0

Add any other context about the problem here

This blocks #910.

achantavy avatar Nov 08 '22 23:11 achantavy

I think the first step should be to list all the different node types and identify which property to use as a unique ID for each.

danielsaporo avatar Nov 16 '22 09:11 danielsaporo

Consistency also applies to other fields and tools. hostname/short_hostname can differ in case. hostname can be short or long depending on the tool, or truncated to 15 characters on Windows (per NetBIOS limitations). MAC addresses can be written in a variety of ways too. Tags can be plain values or key-value pairs.
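To illustrate the kind of normalization this would involve, here is a minimal sketch. The function names are hypothetical and are not part of Cartography; it only assumes that lowercasing hostnames and rewriting MAC addresses to a single colon-separated form is an acceptable canonical form.

```python
import re


def normalize_hostname(hostname: str) -> str:
    """Lowercase a hostname and drop surrounding whitespace and a trailing dot."""
    return hostname.strip().lower().rstrip(".")


def short_hostname(hostname: str) -> str:
    """Return the first label of a (possibly fully qualified) hostname."""
    return normalize_hostname(hostname).split(".", 1)[0]


def normalize_mac(mac: str) -> str:
    """Rewrite a MAC address as lowercase, colon-separated hex pairs.

    Accepts common spellings such as 00-1A-2B-3C-4D-5E or 001a.2b3c.4d5e.
    """
    digits = re.sub(r"[^0-9A-Fa-f]", "", mac).lower()
    if len(digits) != 12:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))
```

Note that a name already truncated to 15 characters by NetBIOS cannot be expanded back by normalization alone, so matching those records would still need a prefix comparison or another key.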

juju4 avatar Feb 04 '23 22:02 juju4

Is it possible to give every node a unique id and add a separate field that describes the node (label/description)?

Also, where can I find all the kinds of IDs?

gokulyc avatar Feb 16 '23 11:02 gokulyc

@achantavy one solution for this problem (and especially #910) is to use a computed community ID.

With the new data model it will be easy to implement:

  • add a flag on Field to indicate that a field is part of the community ID, OR add a property to NodeSchema that lists all property fields that are part of the community ID
  • when generating the insert query, compute this value and hash the result (e.g. concatenate the field values and take the MD5 hash of the concatenated value), then insert it under the "community_id" field
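
The two steps above can be sketched roughly as follows. This is an illustration only and does not use Cartography's data model classes: `compute_community_id`, the `id_fields` argument, and the `|` separator are assumptions made for the example. The separator is an addition to plain concatenation so that ("ab", "c") and ("a", "bc") do not hash to the same value.

```python
import hashlib
from typing import Any, Mapping, Sequence


def compute_community_id(
    node: Mapping[str, Any],
    id_fields: Sequence[str],
    sep: str = "|",  # assumed separator, not part of the original proposal
) -> str:
    """Join the chosen field values in order and return their MD5 hex digest."""
    parts = []
    for name in id_fields:
        value = node.get(name)
        if value is None:
            # A missing field would silently change the ID, so fail loudly.
            raise ValueError(f"field {name!r} is required for the community ID")
        parts.append(str(value))
    joined = sep.join(parts)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


# Hypothetical node properties, for illustration only.
user = {"userid": "AIDAEXAMPLE", "arn": "arn:aws:iam::123456789012:user/alice"}
user["community_id"] = compute_community_id(user, ["userid", "arn"])
```

MD5 is used here only because the comment suggests it. The hash serves as a stable identifier rather than a security control, and any other digest from `hashlib` could be swapped in.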

I used this method in my previous job to handle partial nodes from network logs.

Today I use it to extract inventories from Cartography.

jychp avatar Mar 21 '23 07:03 jychp

As a brief update, I'd like to refactor more AWS assets to the new data model before working on ID standardization. Once things are in the new model, the standardization task will be a lot more straightforward and less error-prone.

achantavy avatar Jul 15 '23 16:07 achantavy