cartography
Standardize and make ID fields consistent
Description:
As mentioned in #910, Cartography node fields are inconsistent. For example, on an AWSAccount you need to use id, whereas an AWSUser has a userid and an arn but no id.
Please complete the following information:
- Cartography release version or commit hash [e.g. 0.12.0 or 95e8e11913e2a44a4d4682506d8364a638ceac69]: 0.65.0
This blocks #910.
I think the first step should be to list all the different node types and identify which property to use as a unique ID for each.
Consistency also applies to other fields and tools. The hostname/short_hostname fields can differ in case. A hostname can be short or long depending on the tool, or truncated to 15 characters on Windows because of NetBIOS limitations. MAC addresses can be written in a variety of ways too. Tags can be plain values or key-value pairs.
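To illustrate the kind of normalization this would need, here is a rough Python sketch. None of these helpers exist in Cartography; the function names are made up for the example, and the choice of lowercase, colon-separated MAC addresses as the canonical form is an assumption.

```python
import re


def normalize_hostname(hostname: str) -> str:
    """Lowercase a hostname and drop surrounding whitespace and a trailing dot."""
    return hostname.strip().rstrip(".").lower()


def short_hostname(hostname: str) -> str:
    """Return the first DNS label of a normalized hostname."""
    return normalize_hostname(hostname).split(".", 1)[0]


def netbios_key(hostname: str) -> str:
    """Short hostname cut to 15 characters, so that a name truncated by
    NetBIOS and its full form map to the same comparison key."""
    return short_hostname(hostname)[:15]


def normalize_mac(mac: str) -> str:
    """Rewrite a MAC address as lowercase hex pairs separated by colons.

    Accepts ':', '-', '.' or whitespace as separators, or none at all.
    """
    digits = re.sub(r"[:\-.\s]", "", mac).lower()
    if not re.fullmatch(r"[0-9a-f]{12}", digits):
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))
```

With this, "00-1A-2B-3C-4D-5E" and "001a.2b3c.4d5e" both normalize to the same string, and a long FQDN and its 15-character Windows name share the same `netbios_key`. Note that truncating to 15 characters can make two distinct long hostnames collide, so the key is only suitable for fuzzy matching, not as a unique ID.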
Is it possible to give every node a unique ID and add a separate field that describes the node (label/description)?
Also, where can I find all the kinds of IDs?
@achantavy one solution for this problem (and especially #910) is to use a computed community ID.
With the new data model it would be easy to implement:
- Add a flag on Field to indicate that a field is part of the community ID, OR add a property to NodeSchema that lists all the property fields that are part of the community ID.
- When generating the insert query, compute this field's value and hash the result (e.g. concatenate the field values and take the MD5 hash of the concatenation), then insert it under a "community_id" field.
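The two steps above could look roughly like the following Python sketch. It works on plain dicts and does not use Cartography's actual NodeSchema or query-building code; `compute_community_id`, `id_fields`, and the sample values are hypothetical. The separator between values is also an addition of this sketch (the proposal only says to concatenate), included so that ("ab", "c") and ("a", "bc") do not hash to the same ID.

```python
import hashlib
from typing import Any, Mapping, Sequence


def compute_community_id(
    node: Mapping[str, Any],
    id_fields: Sequence[str],
    sep: str = "|",
) -> str:
    """Hash the values of the fields that make up the community ID.

    `id_fields` stands in for the fields that would be flagged in the
    schema. The order of `id_fields` matters and must stay stable.
    """
    parts = []
    for name in id_fields:
        value = node.get(name)
        if value is None:
            # A partial node without all ID fields cannot get a stable ID.
            raise ValueError(f"missing community ID field: {name}")
        parts.append(str(value))
    return hashlib.md5(sep.join(parts).encode("utf-8")).hexdigest()


# Hypothetical node data, for illustration only.
user = {
    "arn": "arn:aws:iam::123456789012:user/alice",
    "userid": "AIDAEXAMPLEUSERID",
    "name": "alice",
}
user["community_id"] = compute_community_id(user, ["arn", "userid"])
```

MD5 is used here only because the proposal names it; it serves as a fingerprint, not as a security measure, and any stable hash function would work the same way.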
I used this method in my previous job to handle partial nodes from network logs.
Today I use it to extract inventories from Cartography.
As a brief update, I'd like to refactor more AWS assets to the new data model before working on ID standardization. Once things are in the new model, the standardization task will be a lot more straightforward and less error-prone.