[py-tx] Investigate dbm as a replacement for the default store

Open Dcallies opened this issue 1 year ago • 0 comments

We hand-rolled the file storage for python-threatexchange, even though the data is extremely simple key-value storage:

  • Key: the int or string returned by fetch()
  • Value: the dataclass in the value returned by fetch(). All the core ones are compatible with dacite, and are therefore JSON-serializable
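To make the key/value shape concrete, here is a minimal sketch of how one entry could be flattened to a string pair. `ExampleRecord`, `encode_key`, `encode_value`, and `decode_value` are hypothetical names for illustration, not part of python-threatexchange, and the decode step uses the plain dataclass constructor where the real code would presumably use dacite.

```python
import dataclasses
import json


@dataclasses.dataclass
class ExampleRecord:
    """Hypothetical stand-in for a dataclass returned by fetch()."""
    signal_type: str
    tags: list


def encode_key(key) -> str:
    # fetch() keys are ints or strings; normalize both to str.
    return str(key)


def encode_value(record: ExampleRecord) -> str:
    # Dataclasses with JSON-compatible fields serialize via asdict().
    return json.dumps(dataclasses.asdict(record), sort_keys=True)


def decode_value(raw: str) -> ExampleRecord:
    # Assumption: flat fields only, so the constructor is enough here.
    return ExampleRecord(**json.loads(raw))
```

Because both sides end up as strings, any string-to-string store can hold these pairs.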

The current implementation serializes everything as one massive JSON dict, so every update requires loading the whole dict and merging it in memory. At larger dataset sizes, this becomes untenable.

Because the data partitions so easily, any string-to-string key-value store should work. We've discussed SQLite in the past, but it has the downside of requiring additional libraries.

dbm seems like it might be a straight upgrade over the current dumb file. It's similarly flexible, but has the additional benefit that it may get an optimized implementation (on Unix), and even the fallback dbm.dumb implementation doesn't load the data into memory, only the key names.
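A rough sketch of what a dbm-backed store could look like, using only the standard library. `ExampleRecord`, `store_updates`, and `load_record` are hypothetical names, not existing python-threatexchange APIs, and the record type is a placeholder for whatever fetch() actually returns.

```python
import dataclasses
import dbm
import json


@dataclasses.dataclass
class ExampleRecord:
    """Hypothetical stand-in for a dataclass returned by fetch()."""
    signal_type: str
    tags: list


def store_updates(path: str, updates: dict) -> None:
    """Write each update individually; no full in-memory merge needed."""
    # "c" opens the database read-write, creating it if missing.
    with dbm.open(path, "c") as db:
        for key, record in updates.items():
            # dbm stores bytes, so encode both key and value explicitly.
            db[str(key).encode("utf-8")] = json.dumps(
                dataclasses.asdict(record)
            ).encode("utf-8")


def load_record(path: str, key):
    """Fetch one record by key without loading the rest of the data."""
    with dbm.open(path, "r") as db:
        raw = db.get(str(key).encode("utf-8"))
    if raw is None:
        return None
    return ExampleRecord(**json.loads(raw))
```

Calling `store_updates("state", {1: ExampleRecord("pdq", ["x"])})` followed by `load_record("state", 1)` should round-trip the record. Which backend `dbm.open` picks depends on the platform and Python version, so the on-disk file names may differ between machines.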

https://docs.python.org/3/library/dbm.html

Dcallies, Mar 14 '23 13:03