GreedyBear

deduplicate command sequences

Open mlodic opened this issue 6 months ago • 9 comments

From the internal logs, it seems that the same command sequences are stored multiple times. To be investigated.

mlodic, Jun 24 '25

Would you attach or send me the logs?

regulartim, Jun 24 '25

Sorry for the late answer; I am AFK these days. I'll get back to this as soon as I can.

mlodic, Jun 27 '25

Tim, when I checked, I noticed that there were some identical commands here:

[Screenshot of the identical commands]

They are dated March so I am not sure whether this was fixed. In any case, I made a small change in fb9aeeadba9987dcec9b3839fa79ea7f23d7743c and I'll test whether this still happens.

I also added some logs, because I noticed that no recent command sequences had been collected in the last few months, which seemed odd to me.

mlodic, Jul 13 '25

> They are dated March so I am not sure whether this was fixed.

No, I do not think it is fixed.

The model definition of the command sequences does not allow duplicates, because the commands_hash field has a unique constraint:

```python
from datetime import datetime
from django.contrib.postgres import fields as pg_fields
from django.db import models

class CommandSequence(models.Model):
    first_seen = models.DateTimeField(blank=False, default=datetime.now)
    last_seen = models.DateTimeField(blank=False, default=datetime.now)
    commands = pg_fields.ArrayField(models.CharField(max_length=1024, blank=True), blank=False, null=False, default=list)
    commands_hash = models.CharField(max_length=64, unique=True, blank=True, null=True)
    cluster = models.IntegerField(blank=True, null=True)
```

It would be interesting to see the content of the commands_hash field for two of the duplicate sequences. Could you provide this information?

regulartim, Jul 13 '25
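The 64-character `commands_hash` column is exactly the length of a hex-encoded SHA-256 digest. Assuming the hash is computed that way (the thread does not say how), the lookup-by-hash deduplication that the unique constraint is meant to guard can be sketched in plain Python. `hash_commands` and `upsert_sequence` are hypothetical names, and a dict stands in for the database table.

```python
import hashlib
from datetime import datetime


def hash_commands(commands: list[str]) -> str:
    """Hypothetical helper: SHA-256 hex digest (64 chars) of the joined commands."""
    # Join with a newline so ["a", "bc"] and ["ab", "c"] hash differently.
    return hashlib.sha256("\n".join(commands).encode("utf-8")).hexdigest()


def upsert_sequence(store: dict, commands: list[str], seen: datetime) -> dict:
    """Insert a new sequence, or refresh last_seen on an existing one, keyed by hash."""
    key = hash_commands(commands)
    record = store.get(key)
    if record is None:
        # First time this exact sequence is seen: create the record.
        record = {
            "commands": list(commands),
            "commands_hash": key,
            "first_seen": seen,
            "last_seen": seen,
        }
        store[key] = record
    else:
        # Same sequence seen again: keep one record, only move last_seen forward.
        record["last_seen"] = max(record["last_seen"], seen)
    return record


store: dict = {}
upsert_sequence(store, ["uname -a", "whoami"], datetime(2025, 3, 1))
upsert_sequence(store, ["uname -a", "whoami"], datetime(2025, 3, 2))
print(len(store))  # 1
```

The key point of the design is that deduplication only works once the hash exists: a record stored without a hash has nothing to be matched against.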

It is empty. My speculation is that the deduplication does not happen sometimes because we don't always have the closing event from cowrie and the code did the deduplication there. I have just deployed a small change addressing that; I'll monitor it over the next few days and update here.

mlodic, Jul 13 '25
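An empty hash also explains why the unique constraint did not block the duplicates. In SQL, NULLs are treated as distinct from each other for uniqueness purposes (PostgreSQL's default behaviour), so any number of rows may have `commands_hash = NULL`. Two empty strings, by contrast, would collide, so "empty" here most likely means NULL. The sketch below uses Python's built-in `sqlite3` as a stand-in for PostgreSQL (SQLite follows the same NULL rule for UNIQUE columns); the table and helper names are illustrative only.

```python
import sqlite3


def make_table() -> sqlite3.Connection:
    """In-memory stand-in for the command-sequence table (SQLite, not PostgreSQL)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE command_sequence ("
        " id INTEGER PRIMARY KEY,"
        " commands TEXT NOT NULL,"
        " commands_hash TEXT UNIQUE)"  # mirrors unique=True, null=True
    )
    return conn


def insert_twice(commands_hash):
    """Insert the same sequence twice; return the row count, or None if rejected."""
    conn = make_table()
    try:
        conn.executemany(
            "INSERT INTO command_sequence (commands, commands_hash) VALUES (?, ?)",
            [("uname -a", commands_hash), ("uname -a", commands_hash)],
        )
    except sqlite3.IntegrityError:
        return None  # the unique constraint rejected the second row
    return conn.execute("SELECT COUNT(*) FROM command_sequence").fetchone()[0]


print(insert_twice(None))    # 2: NULL hashes never collide
print(insert_twice(""))      # None: two empty strings do collide
print(insert_twice("ab12"))  # None: identical hashes collide
```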

> My speculation is that the deduplication does not happen sometimes because we don't always have the closing event from cowrie and the code did the deduplication there.

Yes, that sounds plausible. Then the question is why we miss the closing event at all; that shouldn't happen. You can check whether a closing event was recorded for a particular session by checking whether the session has a duration property. On my private instance I do not see any session without a closing event.

regulartim, Jul 13 '25
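The duration check described above can be sketched as a simple filter. `Session` here is a hypothetical minimal record, not GreedyBear's actual model; the only assumption carried over from the thread is that `duration` is set only when the closing event arrives.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Session:
    """Hypothetical minimal session record; duration is set only by the closing event."""
    session_id: str
    commands: list[str] = field(default_factory=list)
    duration: Optional[float] = None


def sessions_without_close(sessions: list[Session]) -> list[str]:
    """Return the IDs of sessions for which no closing event (no duration) was recorded."""
    # Compare against None explicitly: a 0.0-second session was still closed.
    return [s.session_id for s in sessions if s.duration is None]


sessions = [
    Session("a1", ["ls"], 12.5),
    Session("b2", ["ls"]),
    Session("c3", [], 0.0),
]
print(sessions_without_close(sessions))  # ['b2']
```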

I guess you are right. I wanted to experiment with the honeynet servers, which sometimes seem to be faulty. I still don't have access to Kibana, and this problem needs to be definitively solved; otherwise I can't debug further.

mlodic, Jul 17 '25

Maybe we could add a cleanup routine that deletes command sequences that are older than 1 day and have no commands_hash value (which means that the closing event from cowrie never arrived).

regulartim, Oct 03 '25
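A minimal sketch of the proposed cleanup rule, using a hypothetical `SequenceRecord` in place of the `CommandSequence` model. It measures age from `last_seen` (an assumption; the proposal only says "older than 1 day"), so a sequence that is still being updated is not removed, and it treats both NULL and an empty string as "no hash".

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class SequenceRecord:
    """Hypothetical stand-in for a CommandSequence row."""
    commands: list[str]
    last_seen: datetime
    commands_hash: Optional[str] = None


def cleanup_unhashed(records, now, max_age=timedelta(days=1)):
    """Split records into (kept, deleted); delete unhashed ones older than max_age."""
    cutoff = now - max_age
    kept, deleted = [], []
    for r in records:
        # None and "" both mean the hash was never computed (no closing event).
        if not r.commands_hash and r.last_seen < cutoff:
            deleted.append(r)
        else:
            kept.append(r)
    return kept, deleted


now = datetime(2025, 10, 3, 8, 0)
records = [
    SequenceRecord(["ls"], now - timedelta(days=2)),                 # deleted
    SequenceRecord(["ls"], now - timedelta(hours=1)),                # kept: too recent
    SequenceRecord(["ls"], now - timedelta(days=5), "abc"),          # kept: has a hash
    SequenceRecord(["id"], now - timedelta(days=3), ""),             # deleted
]
kept, deleted = cleanup_unhashed(records, now)
print(len(kept), len(deleted))  # 2 2
```

In a Django deployment the same rule would presumably become a single query along the lines of `CommandSequence.objects.filter(Q(commands_hash__isnull=True) | Q(commands_hash=""), last_seen__lt=cutoff).delete()`; this is an untested illustration, not code from the repository.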

Theoretically, with the change implemented here, that should not happen.

mlodic, Oct 03 '25