PatientLevelPrediction

Code base refactor - Data class

Open lhjohn opened this issue 8 months ago • 0 comments

This issue is for discussing the requirements of a data class. A preliminary list of requirements follows:

Functional Requirements:

  • Data Backend Support: Should support multiple data backends for holding and providing data.
  • Interprocess Communication (IPC) Support: Should be able to communicate with external processes (e.g. processes written in other programming languages) using different IPC mechanisms.
  • Data Persistence: Should provide functionality to load and save data using various methods.
  • Synthetic Data Handling: Should support the generation of synthetic data, currently used primarily for unit tests and prototyping.
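To make the persistence and synthetic-data requirements concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the function names (`generate_synthetic`, `save_rows`, `load_rows`), the record fields, and the use of JSON as the on-disk format are hypothetical and are not part of PatientLevelPrediction. It only shows that seeded generation gives reproducible test data and that a save/load pair should round-trip it unchanged.

```python
from __future__ import annotations

import json
import random
import tempfile
from pathlib import Path


def generate_synthetic(n_rows: int, seed: int = 42) -> list[dict]:
    """Return n_rows of fake records; seeded so unit tests are reproducible."""
    rng = random.Random(seed)
    return [
        {"row_id": i, "age": rng.randint(18, 90), "outcome": rng.randint(0, 1)}
        for i in range(n_rows)
    ]


def save_rows(rows: list[dict], file_path: Path) -> None:
    """Persist rows as JSON (a stand-in for whatever format is chosen)."""
    file_path.write_text(json.dumps(rows))


def load_rows(file_path: Path) -> list[dict]:
    """Load rows previously written by save_rows."""
    return json.loads(file_path.read_text())


# Round trip: generate, save to a temporary file, load back.
rows = generate_synthetic(5)
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "synthetic.json"
    save_rows(rows, path)
    restored = load_rows(path)
```

A unit test for any real persistence method could follow the same shape: generate with a fixed seed, save, load, and compare.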

Technical Requirements:

  • Should support Arrow, Andromeda, and DataTable as data backends.
  • Should support Andromeda, Arrow Flight, and Arrow Feather for IPC.
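One way to read the multi-backend requirement is as a shared interface with a lookup by name. The Python sketch below is a rough illustration under stated assumptions: `RowBackend` and `ColumnBackend` are hypothetical stand-ins built on plain lists and dicts, not wrappers around Arrow, Andromeda, or DataTable, and the `n_rows`/`column` methods are invented for the example. The point is only that callers can stay unaware of the storage layout.

```python
from abc import ABC, abstractmethod


class DataBackend(ABC):
    """Interface every backend is assumed to satisfy (hypothetical)."""

    @abstractmethod
    def n_rows(self) -> int: ...

    @abstractmethod
    def column(self, name: str) -> list: ...


class RowBackend(DataBackend):
    """Stores records row by row, as a list of dicts."""

    def __init__(self, rows):
        self._rows = list(rows)

    def n_rows(self) -> int:
        return len(self._rows)

    def column(self, name: str) -> list:
        return [row[name] for row in self._rows]


class ColumnBackend(DataBackend):
    """Stores records column by column, as a dict of lists."""

    def __init__(self, rows):
        rows = list(rows)
        names = rows[0].keys() if rows else []
        self._cols = {n: [row[n] for row in rows] for n in names}

    def n_rows(self) -> int:
        return len(next(iter(self._cols.values()), []))

    def column(self, name: str) -> list:
        return list(self._cols[name])


# Registry mapping a backend name to its class (names are illustrative).
BACKENDS = {"row": RowBackend, "columnar": ColumnBackend}


def make_backend(name: str, rows) -> DataBackend:
    """Build the named backend, failing loudly on unknown names."""
    if name not in BACKENDS:
        raise ValueError(f"unknown backend: {name!r}")
    return BACKENDS[name](rows)


records = [{"age": 34, "outcome": 0}, {"age": 71, "outcome": 1}]
row_store = make_backend("row", records)
col_store = make_backend("columnar", records)
```

With a registry like this, adding a backend means registering one more class rather than touching the callers.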

Possible architecture (Mermaid class diagram):

classDiagram
    class Data {
        -.data
        -.meta_data
        -.backend: DataBackend
        +data()
        +metadata()
        +setBackend(backend: DataBackend)
        +load(file_path) : uses
        +save(file_path) : uses
        +to_ipc(ipc: IPC) : uses
        +from_ipc(ipc: IPC) : uses
    }

    class DataBackend {
        <<abstract>>
    }

    class DataBackend-Arrow
    class DataBackend-Andromeda
    class DataBackend-DataTable

    Data --> DataBackend
    DataBackend <|-- DataBackend-Arrow
    DataBackend <|-- DataBackend-Andromeda
    DataBackend <|-- DataBackend-DataTable

    class DataPersistence {
      <<abstract>>
      +load()
      +save()
    }

    class DataPersistence-LocalFile {
      +load(file_path)
      +save(.data, file_path)
    }

    class DataPersistence-Database {
      +load(connection_details)
    }

    class DataPersistence-Synthetic {
      +load()
    }

    Data --> DataPersistence
    DataPersistence <|-- DataPersistence-Database
    DataPersistence <|-- DataPersistence-LocalFile
    DataPersistence <|-- DataPersistence-Synthetic

    class IPC {
        <<abstract>>
        +doPut()
        +doGet()
        +doExchange()
    }

    class IPC-ArrowFlight
    class IPC-ArrowFeather
    class IPC-Andromeda

    IPC <|-- IPC-ArrowFlight
    IPC <|-- IPC-ArrowFeather
    IPC <|-- IPC-Andromeda

    Data --> IPC

    class PyData {
        +to_ipc(ipc: IPC)
        +from_ipc(ipc: IPC)
    }

    class RustData {
        +to_ipc(ipc: IPC)
        +from_ipc(ipc: IPC)
    }

    PyData --> IPC
    RustData --> IPC
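
The `Data --> IPC` relationship in the diagram could be sketched in Python roughly as follows. This is an illustration under assumptions, not the proposed implementation: `FileIPC` is a hypothetical JSON-file channel standing in for something like `IPC-ArrowFeather`, the method names are snake_case versions of `doPut`/`doGet`, and `from_ipc` is written as a classmethod here (the diagram lists it as a regular method) so the receiving side, in the role of `PyData`, can construct an object directly from the channel.

```python
from __future__ import annotations

import json
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class IPC(ABC):
    """Abstract channel; only put/get are sketched, doExchange is omitted."""

    @abstractmethod
    def do_put(self, payload: dict) -> None: ...

    @abstractmethod
    def do_get(self) -> dict: ...


class FileIPC(IPC):
    """Hypothetical channel that exchanges a payload through a shared file."""

    def __init__(self, path: Path):
        self.path = Path(path)

    def do_put(self, payload: dict) -> None:
        self.path.write_text(json.dumps(payload))

    def do_get(self) -> dict:
        return json.loads(self.path.read_text())


class Data:
    """Holds data plus metadata and delegates transport to an IPC object."""

    def __init__(self, data: list, meta_data: dict | None = None):
        self._data = data
        self._meta_data = meta_data or {}

    def data(self) -> list:
        return self._data

    def metadata(self) -> dict:
        return self._meta_data

    def to_ipc(self, ipc: IPC) -> None:
        ipc.do_put({"data": self._data, "meta_data": self._meta_data})

    @classmethod
    def from_ipc(cls, ipc: IPC) -> "Data":
        payload = ipc.do_get()
        return cls(payload["data"], payload["meta_data"])


# Sender writes to the channel; the receiver rebuilds the object from it.
with tempfile.TemporaryDirectory() as tmp:
    channel = FileIPC(Path(tmp) / "exchange.json")
    sent = Data([{"row_id": 1, "outcome": 0}], {"cohort": "synthetic"})
    sent.to_ipc(channel)
    received = Data.from_ipc(channel)
```

Because `Data` only sees the abstract `IPC` interface, swapping the file channel for another mechanism would not change `to_ipc` or `from_ipc`.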

lhjohn, Jul 03 '24 06:07