[py-tx] New extension interface for storage

Open Dcallies opened this issue 1 year ago • 0 comments

We want the ability to add new storage mechanisms as alternatives to the one installed by default in py-tx. We think that dbm might be a much better default storage.

Pre-read material:

  1. Readme: https://github.com/facebook/ThreatExchange/tree/main/python-threatexchange
  2. SignalExchange interface (especially storage): https://github.com/facebook/ThreatExchange/blob/main/python-threatexchange/threatexchange/exchanges/signal_exchange_api.py#L20
  3. Backwards compatibility guarantee: https://github.com/facebook/ThreatExchange/tree/main/python-threatexchange#general-expectation-for-compatibility-and-versioning
  4. dbm module: https://docs.python.org/3/library/dbm.html

There will be a series of milestones:

  1. We'll define a new Python interface specifying which methods a storage implementation must provide, likely patterned on https://github.com/facebook/ThreatExchange/blob/main/python-threatexchange/threatexchange/exchanges/helpers.py#L69
  2. Apply the interface to the existing storage at https://github.com/facebook/ThreatExchange/blob/main/python-threatexchange/threatexchange/cli/cli_state.py
  3. Create a dbm implementation of the interface
  4. Swap in the dbm implementation of the interface and show that the dataset command still produces the full dataset
  5. Add storage to the extensions interface at https://github.com/facebook/ThreatExchange/blob/main/python-threatexchange/threatexchange/extensions/manifest.py
  6. Add a configuration field to https://github.com/facebook/ThreatExchange/blob/main/python-threatexchange/threatexchange/cli/cli_config.py#L47 that holds the selected storage mechanism. When unset, it should default to the old in-memory merge file storage
  7. Add the ability to select the storage backend with a CLI command - think about edge-case behavior here
  8. End-to-end test swapping storages / large download
  9. [Stretch] work with Scott at the hackathon to spec out an AWS-based storage extension. It can live in https://github.com/facebook/ThreatExchange/tree/main/python-threatexchange/threatexchange/extensions as an "official" extension
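
To make milestones 1, 3, 4, and 6 more concrete, here is a rough sketch of what a pluggable storage interface with a dbm backend could look like. Everything named here (`SignalStore`, `InMemoryStore`, `DbmStore`, `make_store`, `dump`, and the `"dbm"` config value) is hypothetical and not part of py-tx today; the in-memory class is only a stand-in for the existing merge file storage, and the real interface would more likely follow the helpers.py pattern linked above.

```python
import dbm
import json
from abc import ABC, abstractmethod
from typing import Dict, List, Optional


class SignalStore(ABC):
    """Hypothetical storage interface (an assumption, not the py-tx API)."""

    @abstractmethod
    def put(self, key: str, record: dict) -> None: ...

    @abstractmethod
    def get(self, key: str) -> Optional[dict]: ...

    @abstractmethod
    def keys(self) -> List[str]: ...


class InMemoryStore(SignalStore):
    """Stand-in for the current default storage; keeps everything in a dict."""

    def __init__(self) -> None:
        self._data: Dict[str, dict] = {}

    def put(self, key: str, record: dict) -> None:
        self._data[key] = record

    def get(self, key: str) -> Optional[dict]:
        return self._data.get(key)

    def keys(self) -> List[str]:
        return list(self._data)


class DbmStore(SignalStore):
    """dbm-backed store; records are serialized as JSON bytes."""

    def __init__(self, path: str) -> None:
        self._path = path

    def put(self, key: str, record: dict) -> None:
        # "c" opens for read/write and creates the database if missing
        with dbm.open(self._path, "c") as db:
            db[key.encode()] = json.dumps(record).encode()

    def get(self, key: str) -> Optional[dict]:
        with dbm.open(self._path, "c") as db:
            raw = db.get(key.encode())
        return None if raw is None else json.loads(raw)

    def keys(self) -> List[str]:
        with dbm.open(self._path, "c") as db:
            return [k.decode() for k in db.keys()]


def make_store(name: Optional[str], path: str) -> SignalStore:
    """Pick a backend from a (hypothetical) config value; unset means default."""
    if name is None:
        return InMemoryStore()
    if name == "dbm":
        return DbmStore(path)
    raise ValueError(f"unknown storage backend: {name!r}")


def dump(store: SignalStore) -> Dict[str, dict]:
    """Materialize the full dataset, e.g. to compare two backends."""
    return {k: store.get(k) for k in store.keys()}
```

With something like this, the check in milestone 4 reduces to loading the same records into both backends and comparing `dump()` output. Opening the dbm file on every call keeps the sketch simple; a real implementation would probably hold the handle open for the duration of a fetch.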

Dcallies, Mar 27 '23 13:03