
Refactor analyzers which download external dbs for local queries

Open mlodic opened this issue 1 year ago • 11 comments

Several analyzers, such as Tor, Maxmind, and JA4DB, download an external db and update it periodically. The problem is that they store it as a local file, and the analyzers parse that file to find a specific entry. It would make more sense to store the data in additional database tables and query it directly there.
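To illustrate the difference between the two approaches, here is a minimal sketch using Python's built-in `sqlite3` module rather than IntelOwl's actual database layer. The table name, column names, and sample entries are all hypothetical; the point is only that a file-style lookup scans every parsed entry, while a table lookup lets the database resolve the match through an index.

```python
import sqlite3

# Hypothetical sample entries; a real feed would be downloaded periodically.
ENTRIES = [
    {"fingerprint": "fp-aaa", "application": "Example Browser"},
    {"fingerprint": "fp-bbb", "application": "Example Bot"},
]


def lookup_in_list(entries, fingerprint):
    """File-style lookup: scan every parsed entry until one matches."""
    for entry in entries:
        if entry["fingerprint"] == fingerprint:
            return entry
    return None


def build_table(conn, entries):
    """Load the entries into a table with an index on the lookup column."""
    conn.execute("CREATE TABLE fingerprint (fingerprint TEXT, application TEXT)")
    conn.execute("CREATE INDEX idx_fingerprint ON fingerprint (fingerprint)")
    conn.executemany(
        "INSERT INTO fingerprint VALUES (:fingerprint, :application)", entries
    )
    conn.commit()


def lookup_in_table(conn, fingerprint):
    """Table lookup: the database resolves the match through the index."""
    row = conn.execute(
        "SELECT fingerprint, application FROM fingerprint WHERE fingerprint = ?",
        (fingerprint,),
    ).fetchone()
    return None if row is None else {"fingerprint": row[0], "application": row[1]}


conn = sqlite3.connect(":memory:")
build_table(conn, ENTRIES)
print(lookup_in_table(conn, "fp-bbb"))
```

In a Django project the same idea would presumably be expressed as a model plus a filtered queryset instead of raw SQL.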

mlodic avatar Jul 04 '24 14:07 mlodic

Hi @mlodic Can I take this up? It will be a nice challenge.

I plan to implement this for JA4DB first and, once that works, do the same for the other analyzers. It might take me some time, as I need to understand Django models a bit better, as well as how everything in IntelOwl is connected.

If I have any doubts, should I ask here or in the Slack channel?

spoiicy avatar Dec 25 '24 07:12 spoiicy

sure, feel free to ask where you want

mlodic avatar Dec 25 '24 09:12 mlodic

Hi @mlodic. While running the refactored JA4_DB analyzer to load the API data into the DB, I got the error "NotImplementedError": Unable to find data model for generic.

I want to understand the purpose of data_models_manager and why this error comes up, because the JSON report returned by the analyzer is successfully stored in the "analyzers_manager_analyzerreport" table.

Am I missing something? Do I need to create a generic data model and a corresponding serializer for the analyzers we are going to refactor?

I would really appreciate your input, since understanding this is crucial to solving the issue.

spoiicy avatar Jan 06 '25 14:01 spoiicy

Thanks for sharing. I'll point you to the people who created the data models, a new feature added in the last release.

I know that data models were not added for generic observables, so I would like to ask you to share additional information about that error: the full stack trace, or screenshots of the analyzed observable and of the moment you encounter the error. Ty

mlodic avatar Jan 07 '25 09:01 mlodic

@cristinaascari is reviewing the issue

mlodic avatar Jan 07 '25 10:01 mlodic

Thanks @mlodic & @cristinaascari. Here is the link to the error details.

Refactored JA4_DB.py analyzer

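# Imports needed by the two methods below, which belong to the analyzer class.
# JA4Fingerprint is the model shown further down; import it from wherever it is defined.
import logging

import requests
from django.db import connection

logger = logging.getLogger(__name__)

@classmethod
def update(cls):
    # Download the latest JA4 database
    logger.info(f"Updating table from {cls.url}")
    response = requests.get(url=cls.url)
    response.raise_for_status()
    data = response.json()

    # Empty the table before reloading it
    if JA4Fingerprint.objects.exists():
        with connection.cursor() as cursor:
            cursor.execute("TRUNCATE TABLE analyzers_manager_ja4fingerprint RESTART IDENTITY CASCADE;")

    instances = [JA4Fingerprint(**item) for item in data]
    JA4Fingerprint.objects.bulk_create(instances)
    logger.info("Table updated")

def run(self):
    reason = self.check_ja4_fingerprint(self.observable_name)
    if reason:
        return {"not_supported": reason}
    # Populate the table on first use
    if not JA4Fingerprint.objects.exists():
        logger.info("Table is empty, initialising...")
        self.update()

    # .values().first() returns a dict for the first match, or None
    application = JA4Fingerprint.objects.filter(ja4_fingerprint=self.observable_name).values().first()
    if application:
        return dict(application)
    return {"found": False}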

Corresponding JA4Fingerprint Model:

from django.db import models


class JA4Fingerprint(models.Model):
    application = models.CharField(max_length=255, null=True, blank=True)
    library = models.CharField(max_length=255, null=True, blank=True)
    device = models.CharField(max_length=255, null=True, blank=True)
    os = models.CharField(max_length=255, null=True, blank=True)
    user_agent_string = models.TextField(null=True, blank=True)
    certificate_authority = models.CharField(max_length=255, null=True, blank=True)
    observation_count = models.PositiveIntegerField(default=1)
    verified = models.BooleanField(default=False)
    notes = models.TextField(null=True, blank=True)
    ja4_fingerprint = models.CharField(max_length=255, null=True, blank=True)
    ja4_fingerprint_string = models.TextField(null=True, blank=True)
    ja4s_fingerprint = models.CharField(max_length=255, null=True, blank=True)
    ja4h_fingerprint = models.CharField(max_length=255, null=True, blank=True)
    ja4x_fingerprint = models.CharField(max_length=255, null=True, blank=True)
    ja4t_fingerprint = models.CharField(max_length=255, null=True, blank=True)
    ja4ts_fingerprint = models.CharField(max_length=255, null=True, blank=True)
    ja4tscan_fingerprint = models.CharField(max_length=255, null=True, blank=True)
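
Two things could trip up the refresh step in the update method. First, building instances with JA4Fingerprint(**item) raises a TypeError if the feed ever contains a key that is not a model field. Second, emptying the table before the insert succeeds can leave it empty on failure. The sketch below shows one way to guard against both, using the built-in `sqlite3` module as a stand-in for the Django ORM. The table name `ja4`, the two-column schema, and the helpers `clean` and `refresh` are hypothetical and are not part of IntelOwl.

```python
import sqlite3

# Hypothetical column set; in the Django version this would come from the model.
COLUMNS = ("ja4_fingerprint", "application")


def clean(record):
    """Keep only known columns so an unexpected feed key cannot break the load."""
    return {col: record.get(col) for col in COLUMNS}


def refresh(conn, records):
    """Replace the table contents atomically: all rows swap, or none do."""
    rows = [clean(r) for r in records]
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute("DELETE FROM ja4")
        conn.executemany(
            "INSERT INTO ja4 VALUES (:ja4_fingerprint, :application)", rows
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ja4 (ja4_fingerprint TEXT NOT NULL, application TEXT)")

# The extra "new_key" entry is dropped by clean() instead of causing an error.
refresh(conn, [{"ja4_fingerprint": "fp-1", "application": "App", "new_key": 1}])

# A bad batch (missing fingerprint) fails and leaves the old rows in place.
try:
    refresh(conn, [{"application": "Broken"}])
except sqlite3.IntegrityError:
    pass
print(conn.execute("SELECT * FROM ja4").fetchall())
```

In Django, the analogous tool for the all-or-nothing part would be `transaction.atomic()` wrapped around the delete and the `bulk_create`.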

spoiicy avatar Jan 07 '25 10:01 spoiicy

https://github.com/intelowlproject/IntelOwl/pull/2662 this fix has just been merged. Can you please update your fork and try again? thanks

mlodic avatar Jan 07 '25 11:01 mlodic

#2662 this fix has just been merged. Can you please update your fork and try again? thanks

I just saw the PR merged by @cristinaascari. I will pull the latest code and try again. Just out of curiosity, why are we returning an empty list for the GENERIC type rather than implementing a proper data model? :)

spoiicy avatar Jan 07 '25 11:01 spoiicy

@cristinaascari has been testing an organized solution for a while, we'll wait for her PR to fix this.

mlodic avatar Feb 20 '25 09:02 mlodic

This issue has been marked as stale because it has had no activity for 10 days. If you are still working on this, please provide some updates.

github-actions[bot] avatar Mar 03 '25 09:03 github-actions[bot]