PatientLevelPrediction
Code base refactor - Data class
This issue is for discussing the requirements of a data class. Here is a preliminary list of requirements:
Functional Requirements:
- Data Backend Support: Should support multiple data backends for holding and providing data.
- Interprocess Communication (IPC) Support: Should be able to communicate with external processes (e.g. other programming languages) using different IPC mechanisms.
- Data Persistence: Should provide functionality to load and save data using various methods.
- Synthetic Data Handling: Should support the generation of synthetic data, currently used primarily for unit tests and prototyping.
Technical Requirements:
- Should support Arrow, Andromeda, and DataTable as data backends.
- Should support Andromeda, Arrow Flight, and Arrow Feather for IPC.
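To make the persistence and synthetic-data requirements more concrete, here is a minimal Python sketch. It uses only the standard library, with a CSV file standing in for a real format such as Feather. The function names (`make_synthetic_rows`, `save_rows`, `load_rows`) and the column names are hypothetical and not part of the package.

```python
import csv
import random
import tempfile
from pathlib import Path


def make_synthetic_rows(n_rows, seed=0):
    """Generate hypothetical patient-level rows for tests (assumes n_rows >= 1)."""
    rng = random.Random(seed)  # seeded so unit tests are reproducible
    return [
        {"row_id": i, "age": rng.randint(18, 90), "outcome": rng.randint(0, 1)}
        for i in range(n_rows)
    ]


def save_rows(rows, file_path):
    """Persist rows to a local CSV file (stand-in for a Feather/Andromeda file)."""
    with open(file_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)


def load_rows(file_path):
    """Load rows back, converting every field to int (all columns here are ints)."""
    with open(file_path, newline="") as handle:
        return [
            {key: int(value) for key, value in row.items()}
            for row in csv.DictReader(handle)
        ]


with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "synthetic.csv"
    original = make_synthetic_rows(5, seed=42)
    save_rows(original, path)
    restored = load_rows(path)
```

A fixed seed keeps the synthetic data deterministic, which is probably what unit tests want. A real implementation would likely preserve column types through the file format instead of casting on load.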
Possible architecture:
classDiagram
class Data {
-.data
-.meta_data
-.backend: DataBackend
+data()
+metadata()
+setBackend(backend: DataBackend)
+load(file_path) : uses
+save(file_path) : uses
+to_ipc(ipc: IPC) : uses
+from_ipc(ipc: IPC) : uses
}
class DataBackend {
<<abstract>>
}
class DataBackend-Arrow
class DataBackend-Andromeda
class DataBackend-DataTable
Data --> DataBackend
DataBackend <|-- DataBackend-Arrow
DataBackend <|-- DataBackend-Andromeda
DataBackend <|-- DataBackend-DataTable
class DataPersistence {
<<abstract>>
+load()
+save()
}
class DataPersistence-LocalFile {
+load(file_path)
+save(.data, file_path)
}
class DataPersistence-Database {
+load(connection_details)
}
class DataPersistence-Synthetic {
+load()
}
Data --> DataPersistence
DataPersistence <|-- DataPersistence-Database
DataPersistence <|-- DataPersistence-LocalFile
DataPersistence <|-- DataPersistence-Synthetic
class IPC {
<<abstract>>
+doPut()
+doGet()
+doExchange()
}
class IPC-ArrowFlight
class IPC-ArrowFeather
class IPC-Andromeda
IPC <|-- IPC-ArrowFlight
IPC <|-- IPC-ArrowFeather
IPC <|-- IPC-Andromeda
Data --> IPC
class PyData {
+to_ipc(ipc: IPC)
+from_ipc(ipc: IPC)
}
class RustData {
+to_ipc(ipc: IPC)
+from_ipc(ipc: IPC)
}
PyData --> IPC
RustData --> IPC
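The core of the diagram above could be sketched in Python roughly as follows. This is an illustration under assumptions, not a proposed implementation: `InMemoryBackend` and `QueueIPC` are hypothetical stand-ins for the Arrow/Andromeda/DataTable backends and the Arrow Flight/Feather channels, the `store`/`fetch` methods are invented here because the diagram leaves `DataBackend` empty, and having `set_backend` migrate the existing rows is one possible reading of `setBackend`.

```python
import json
from abc import ABC, abstractmethod
from collections import deque


class DataBackend(ABC):
    """Abstract storage backend; store/fetch are assumed method names."""

    @abstractmethod
    def store(self, rows): ...

    @abstractmethod
    def fetch(self): ...


class InMemoryBackend(DataBackend):
    """Hypothetical stand-in for an Arrow, Andromeda, or DataTable backend."""

    def __init__(self):
        self._rows = []

    def store(self, rows):
        self._rows = [dict(row) for row in rows]  # copy to avoid aliasing

    def fetch(self):
        return [dict(row) for row in self._rows]


class IPC(ABC):
    """Abstract channel with the doPut/doGet operations from the diagram."""

    @abstractmethod
    def do_put(self, rows): ...

    @abstractmethod
    def do_get(self): ...


class QueueIPC(IPC):
    """Hypothetical in-process channel; JSON mimics crossing a process boundary."""

    def __init__(self):
        self._messages = deque()

    def do_put(self, rows):
        self._messages.append(json.dumps(rows))

    def do_get(self):
        return json.loads(self._messages.popleft())


class Data:
    """Holds metadata and delegates row storage to a pluggable backend."""

    def __init__(self, rows, backend, metadata=None):
        self._backend = backend
        self._metadata = dict(metadata or {})
        self._backend.store(rows)

    def data(self):
        return self._backend.fetch()

    def metadata(self):
        return dict(self._metadata)

    def set_backend(self, backend):
        backend.store(self._backend.fetch())  # assumption: migrate existing rows
        self._backend = backend

    def to_ipc(self, ipc):
        ipc.do_put(self.data())

    def from_ipc(self, ipc):
        self._backend.store(ipc.do_get())


rows = [{"row_id": 1, "outcome": 0}, {"row_id": 2, "outcome": 1}]
sender = Data(rows, InMemoryBackend(), metadata={"source": "synthetic"})
receiver = Data([], InMemoryBackend())

channel = QueueIPC()
sender.to_ipc(channel)      # e.g. the R side publishing data
receiver.from_ipc(channel)  # e.g. a PyData/RustData consumer reading it
sender.set_backend(InMemoryBackend())  # swap backends without losing rows
```

Keeping the backend and the IPC channel behind small abstract interfaces means `Data` itself does not need to change when a new backend or transport is added, which appears to be the intent of the diagram.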