
Improve connection count beaconing subscore

Open ethack opened this issue 4 years ago • 3 comments

The count score portion of the beacon score gets skewed by long connections. The divisor is tsMax - tsMin, but tsMin can be days earlier than the majority of the dataset because of long connections.

Instead of using the absolute minimum of the dataset, we should use the short connection minimum. I can see this being defined in two ways, depending on which implementation is more advantageous in the code. Option 1 (pseudocode):

// find the minimum timestamp for any connection less than 1 second (or similar small value as a threshold)
shortMinTs = infinity
for conn in connections:
  if conn.duration < 1 and conn.ts < shortMinTs:
    shortMinTs = conn.ts

(Note: this assumes there will be connections shorter than the threshold. Very likely in real-world datasets, but may cause issues in lab or artificial datasets.)
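A minimal Python sketch of Option 1, for illustration only. The `Conn` type, the `short_min_ts` name, and the choice to return `None` when no connection falls under the threshold are assumptions made here, not part of RITA:

```python
from typing import NamedTuple, Optional, Sequence


class Conn(NamedTuple):
    ts: float        # connection start time, in seconds
    duration: float  # connection length, in seconds


def short_min_ts(conns: Sequence[Conn], threshold: float = 1.0) -> Optional[float]:
    """Return the earliest start timestamp among connections shorter than threshold.

    Returns None when no connection is shorter than the threshold, which is
    the lab/artificial dataset case mentioned in the note above.
    """
    short = [c.ts for c in conns if c.duration < threshold]
    return min(short) if short else None


conns = [
    Conn(ts=100.0, duration=200000.0),  # long connection that started days earlier
    Conn(ts=172900.0, duration=0.2),
    Conn(ts=173500.0, duration=0.4),
]
print(short_min_ts(conns))
```

A caller could fall back to the absolute minimum timestamp when the function returns `None`.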

Option 2 (pseudocode):

// find the min timestamp for the connection with the shortest duration
minDuration = infinity
shortMinTs = infinity
for conn in connections:
  if conn.duration < minDuration and conn.ts < shortMinTs:
    shortMinTs = conn.ts
    minDuration = conn.duration

(Note: This breaks in certain cases. The result depends on the order in which connections arrive. It doesn't work if the day starts out with a long connection ending.)
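The order dependence can be reproduced with a direct Python translation of the Option 2 pseudocode. The two-pass variant next to it is only one possible way to remove the order dependence and is an assumption, not something proposed in this issue; all names are hypothetical:

```python
from typing import NamedTuple, Sequence


class Conn(NamedTuple):
    ts: float        # connection start time, in seconds
    duration: float  # connection length, in seconds


def option2_single_pass(conns: Sequence[Conn]) -> float:
    """Direct translation of the Option 2 pseudocode."""
    min_duration = float("inf")
    short_min = float("inf")
    for c in conns:
        if c.duration < min_duration and c.ts < short_min:
            short_min = c.ts
            min_duration = c.duration
    return short_min


def option2_two_pass(conns: Sequence[Conn]) -> float:
    """Assumed variant: find the shortest duration first, then the earliest
    timestamp among connections with exactly that duration.
    Raises ValueError on an empty input."""
    min_duration = min(c.duration for c in conns)
    return min(c.ts for c in conns if c.duration == min_duration)


long_conn = Conn(ts=100.0, duration=200000.0)
short_conn = Conn(ts=172900.0, duration=0.2)

# The single-pass result changes with the input order.
print(option2_single_pass([long_conn, short_conn]))
print(option2_single_pass([short_conn, long_conn]))
# The two-pass result does not.
print(option2_two_pass([long_conn, short_conn]))
print(option2_two_pass([short_conn, long_conn]))
```

When the long connection comes first, the single pass latches onto its early timestamp and the later short connection can never replace it. The two-pass variant has its own weakness: it only considers connections tied for the shortest duration, so it may pick a timestamp later than the real start of the short traffic.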

The second option doesn't have any defined threshold so should be more robust to any dataset. Either one could be replaced with a database query if we stored durations (which we don't currently).

In practice, tsMin should almost always be close to tsMax - 86400 when dealing with 24-hour datasets. I did some testing with that value hardcoded in and saw very favorable improvements in beacon scores. Actual malware beacon scores shot up by quite a bit while beacons from "normal" software (ones with very frequent beacons) increased by only a small amount.
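To illustrate why an early tsMin drags the score down, here is a simplified sketch. It assumes the count score is the observed connection count divided by the count expected if one connection occurred per beacon interval across tsMax - tsMin, capped at 1. That formula, the `count_score` name, and the numbers are illustrative assumptions, not RITA's exact calculation:

```python
def count_score(conn_count: int, ts_min: float, ts_max: float, interval: float) -> float:
    """Simplified, assumed connection count score in the range [0, 1]."""
    span = ts_max - ts_min
    if span <= 0 or interval <= 0:
        return 0.0
    expected = span / interval  # connections expected over the observed span
    return min(1.0, conn_count / expected)


ts_max = 1_000_000.0
conn_count = 1440  # one connection every 60 seconds for 24 hours

# tsMin pinned to the 24-hour window
windowed = count_score(conn_count, ts_max - 86400, ts_max, interval=60.0)

# tsMin pulled three days earlier by a single long connection
skewed = count_score(conn_count, ts_max - 4 * 86400, ts_max, interval=60.0)

print(windowed, skewed)
```

Under these assumptions, the same beacon scores 1.0 with the 24-hour window and 0.25 when tsMin sits three days earlier.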

Note that the min values will likely need to be stored / read from the database between runs.

ethack, Jan 19 '21 18:01

Has any progress been made on this? I can work on it if it's still relevant.

Hippi3Hack3r, Apr 08 '21 17:04

Just wanted to add a screenshot. This is about as perfect a beacon as I've seen in the wild. But the connections score is bringing it down.

image

ethack, Jul 14 '21 05:07

We should consider storing an observation period for each unique connection. Then, we could filter over those to find the total observation period for the dataset.

The beginning of the observation period would be the earliest (start ts + duration) seen for the unique connection and the end would be the latest (start ts + duration). Using this period while scoring beacons would let us accurately measure the ratio between how often a beacon connected and how much time passed while we observed it.
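A rough Python sketch of this idea. It assumes a "unique connection" is a source/destination pair and that the dataset-wide period runs from the earliest per-pair beginning to the latest per-pair end. Both are interpretations, and the `Record` type and function names are hypothetical:

```python
from typing import Dict, Iterable, NamedTuple, Tuple


class Record(NamedTuple):
    src: str
    dst: str
    ts: float        # connection start time, in seconds
    duration: float  # connection length, in seconds


Period = Tuple[float, float]  # (beginning, end) of an observation period


def observation_periods(records: Iterable[Record]) -> Dict[Tuple[str, str], Period]:
    """Track the earliest and latest (start ts + duration) per src/dst pair."""
    periods: Dict[Tuple[str, str], Period] = {}
    for r in records:
        end = r.ts + r.duration
        key = (r.src, r.dst)
        if key in periods:
            lo, hi = periods[key]
            periods[key] = (min(lo, end), max(hi, end))
        else:
            periods[key] = (end, end)
    return periods


def dataset_period(periods: Dict[Tuple[str, str], Period]) -> Period:
    """Combine per-pair periods into one period for the whole dataset."""
    return (
        min(lo for lo, _ in periods.values()),
        max(hi for _, hi in periods.values()),
    )


records = [
    Record("10.0.0.5", "203.0.113.9", ts=100.0, duration=0.5),
    Record("10.0.0.5", "203.0.113.9", ts=160.0, duration=0.5),
    Record("10.0.0.7", "198.51.100.2", ts=50.0, duration=1000.0),
]
periods = observation_periods(records)
print(periods)
print(dataset_period(periods))
```

Because the period is keyed on when connections end, the long connection that started at ts=50 contributes 1050.0 rather than its early start time.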

Zalgo2462, Aug 05 '21 20:08