
Redesign project/task/history schemas and relationships to better align with event-driven programming

Status: Open · spwoodcock opened this issue on Jun 28 '24 · 9 comments

Problem

  • This is a suggestion made for Tasking Manager, but also applies here.
  • Credit to the original post by qeef, a community volunteer:
    • DB design: https://qeef.srht.site/dd-hot-tm/database-tasks-history.html
    • API design: https://qeef.srht.site/dd-hot-tm/api-endpoints.html
    • Code: https://git.sr.ht/~qeef/dd-hot-tm/tree

The key unit of change in a tasking manager is the task status, i.e. the task events.

Currently we have:

  • Duplicated info between tasks and tasks_history.
  • An inefficient design for the flow of task events.
  • There is a logical flow between the task statuses, so we can make use of this to make the code more event-driven.

DB Redesign

Related to other issues in this milestone: https://github.com/hotosm/fmtm/milestone/39

task_events table

  • Rename task_history to task_events.
  • Update the schema to be something like:
CREATE TABLE public.task_events (
    event_id UUID,
    project_id integer,
    task_id integer,
    user_id integer,
    state State,
    comment text,
    created_at timestamp without time zone DEFAULT timezone('utc'::text, now()),
    PRIMARY KEY (event_id),
    FOREIGN KEY (project_id) REFERENCES public.projects (project_id),
    FOREIGN KEY (project_id, task_id) REFERENCES public.tasks (project_id, task_id),
    FOREIGN KEY (user_id) REFERENCES public.users (user_id)
);
-- See details on the State enum below

Note: I made the event_id a UUID so it is a universal ID across multiple databases / environments, and is more compatible with other database systems. It should probably be generated in the Python code before creating the DB object?
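As a rough sketch only (using the hypothetical db.sq helper from the database logic section below, inside some async function where the IDs are in scope):

import uuid

# Generate the UUID client-side, so the event_id is known before the row exists
event_id = uuid.uuid4()
await db.sq(
    """
    INSERT INTO public.task_events (event_id, project_id, task_id, user_id, state)
    VALUES ($1, $2, $3, $4, 'UNLOCKED_TO_MAP')
    """,
    event_id,
    project_id,
    task_id,
    user_id)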

Note 2: adding the project_id may seem redundant, as task_id is unique, but it allows for more effective indexing. Tasks are divided up by project.

Better indexing

-- Create a better composite index for public.tasks
CREATE INDEX tasks_composite ON public.tasks USING btree (task_id, project_id);

-- Existing indexes from task_history
CREATE INDEX idx_task_event_composite ON public.task_events USING btree (task_id, project_id);
CREATE INDEX idx_task_event_project_id_user_id ON public.task_events USING btree (user_id, project_id);
CREATE INDEX idx_task_event_project_id ON public.task_events USING btree (project_id);
CREATE INDEX idx_task_event_user_id ON public.task_events USING btree (user_id);

-- Add additional index for event date
CREATE INDEX idx_task_event_created_at ON task_events USING btree (created_at);

Refactor enums

  • Remove TaskStatus enum as described in https://github.com/hotosm/fmtm/issues/1604 and refactor all usage.
  • Also remove TaskAction in its current implementation.
  • Add two enums:

Events via the API:

class EventType(str, Enum):
    """Events that can be used via the API to update a state.

    Specify the event type for ``POST`` to:
    ``/project/{pid}/event``
    or for the Entities endpoints too.

    Possible values are:

    - ``map`` -- Set to *locked for mapping*, i.e. mapping in progress.
    - ``finish`` -- Set to *unlocked to validate*, i.e. is mapped.
    - ``validate`` -- Request a recently mapped task, setting it to *locked for validation*.
    - ``good`` -- Set the state to *unlocked done*.
    - ``bad`` -- Set the state back to *unlocked to map*, to be mapped again.
    - ``split`` -- Set the state to *unlocked done*, then generate additional subdivided task areas.
    - ``assign`` -- For a requesting user to assign a task to another user: set the state to *locked for mapping*, passing in the required user id.
    - ``comment`` -- Keep the state the same, but simply add a comment.

    Note that ``task_id`` must be specified in the endpoint too.
    """
    MAP = "map"
    FINISH = "finish"
    VALIDATE = "validate"
    GOOD = "good"
    BAD = "bad"
    SPLIT = "split"
    ASSIGN = "assign"
    COMMENT = "comment"

and the available states of an object in the db:

class State(int, Enum):
    """The state of an object (task or entity).

    The state can be:
    - ``unlocked to map``
    - ``locked for mapping``
    - ``unlocked to validate``
    - ``locked for validation``
    - ``unlocked done``
    """
    UNLOCKED_TO_MAP = 0
    LOCKED_FOR_MAPPING = 1
    UNLOCKED_TO_VALIDATE = 2
    LOCKED_FOR_VALIDATION = 3
    UNLOCKED_DONE = 4
  • task_events.state should use the State enum, and we could probably also reuse it for the Entities status field (see the event → state mapping sketch below).
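
To make that logical flow explicit, the EventType descriptions above imply a state transition mapping roughly like this (a sketch; the name EVENT_STATE_MAPPING is not part of the proposal):

# Which State each EventType moves a task into, per the EventType docstring above.
# COMMENT is omitted as it leaves the state unchanged.
EVENT_STATE_MAPPING: dict[EventType, State] = {
    EventType.MAP: State.LOCKED_FOR_MAPPING,
    EventType.FINISH: State.UNLOCKED_TO_VALIDATE,
    EventType.VALIDATE: State.LOCKED_FOR_VALIDATION,
    EventType.GOOD: State.UNLOCKED_DONE,
    EventType.BAD: State.UNLOCKED_TO_MAP,
    EventType.SPLIT: State.UNLOCKED_DONE,
    EventType.ASSIGN: State.LOCKED_FOR_MAPPING,
}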

Updating state via the API

Models

  • Pydantic model for POST update to state:
class NewEvent(BaseModel):
    """DTO for ``POST`` request to the ``/project/{pid}/event`` endpoint.

    task_id must be set, as this always relates to a task.
    The EventType enum provides the available options.
    """
    event: EventType
    task_id: conint(ge=1)
    requested_user_id: Optional[conint(ge=1)] = None
    comment: Optional[str] = None  # only used for the COMMENT event
  • Replace tasks.task_status: TaskStatus with tasks.state: State.
  • Refactor TaskHistory model to TaskState:
class TaskState(BaseModel):
    """DTO for ``GET`` request to ``/project/{pid}/task/history``.

    And also for ``/project/{pid}/task/{tid}/history``.
    """
    task_id: conint(ge=1)
    user_id: conint(ge=1)
    profile_img: str
    state: State
    comment: str
    date: datetime
  • The Task model that is nested as a list of tasks under the Project model could be refactored too (see the sketch after this list).
    • Remove the locked_by etc. fields, and remove task_status.
    • Provide a key history_recent, which includes the 10 most recent TaskState objects.
    • We can gather the rest of the info we need from this (e.g. display the history in the sidebar with comments, show who mapped / locked the task, etc.).
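
A rough sketch of that refactored Task model, with existing geometry / metadata fields omitted (defaults are placeholders):

class Task(BaseModel):
    """Task nested as a list under the Project model (sketch only)."""
    id: conint(ge=1)
    project_id: conint(ge=1)
    # Existing geometry / metadata fields stay as they are (omitted here).
    state: State
    # The 10 most recent events; who locked or mapped the task and any comments
    # can be derived from this list on the frontend.
    history_recent: list[TaskState] = []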

Endpoints

Note: if using a local-first paradigm, this logic could be implemented on the frontend instead.

Endpoint to update task state via events:

# modify this to use case/when!

@app.post("/project/{project_id}/event")
async def new_event(
    project_id: conint(ge=1),
    detail: NewEvent,
    user_id: int = Depends(login_required),
):
    if detail.event == EventType.MAP:
        updated = await _db_api().lock_for_mapping(project_id, user_id, detail.task_id)
    elif detail.event == EventType.FINISH:
        updated = await _db_api().finish(project_id, user_id, detail.task_id)
    elif detail.event == EventType.VALIDATE:
        updated = await _db_api().lock_for_validation(project_id, user_id, detail.task_id)
    elif detail.event == EventType.GOOD:
        updated = await _db_api().good(project_id, user_id, detail.task_id)
    elif detail.event == EventType.BAD:
        updated = await _db_api().bad(project_id, user_id, detail.task_id)
    elif detail.event == EventType.SPLIT:
        updated = await _db_api().split(project_id, user_id, detail.task_id)
    elif detail.event == EventType.ASSIGN:
        updated = await _db_api().assign(project_id, detail.requested_user_id, detail.task_id)
    elif detail.event == EventType.COMMENT:
        updated = await _db_api().comment(project_id, user_id, detail.task_id, detail.comment)
    if not updated:
        raise fastapi.HTTPException(
            status_code=400,
            detail=f"Failed to update task ({detail.task_id}) with event ({detail.event})")
    # 201 response as the event is created
    return fastapi.Response(status_code=201)
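
For illustration, calling the endpoint might look like this (URL and port are placeholders, and the login_required auth is omitted):

import httpx

# Mark task 7 in project 123 as finished (mapped); auth cookie/header omitted
resp = httpx.post(
    "http://localhost:8000/project/123/event",
    json={"event": "finish", "task_id": 7},
)
assert resp.status_code == 201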

Database logic

  • The _db_api() methods above should look something like the following (finish shown as an example):
import uuid  # event_id is generated in Python, as per the note above

async def finish(pid, uid, tid) -> bool:
    """Append an UNLOCKED_TO_VALIDATE event, if the task is locked for mapping by this user."""
    r = await db.sq(
        """
        WITH last AS (
            SELECT *
            FROM public.task_events
            WHERE project_id=$1 AND task_id=$2
            ORDER BY created_at DESC
            LIMIT 1),
        locked AS (
            SELECT *
            FROM last
            WHERE user_id=$3 AND state='LOCKED_FOR_MAPPING'
        )
        INSERT INTO public.task_events(event_id, project_id, task_id, user_id, state, comment)
        SELECT $4, $1, $2, $3, 'UNLOCKED_TO_VALIDATE', 'Note: Mapping finished'
        FROM locked
        RETURNING project_id, task_id, user_id
        """,
        pid,
        tid,
        uid,
        uuid.uuid4())
    if len(r) != 1:
        # No event appended: the task was not locked for mapping by this user
        return False
    assert r[0]["project_id"] == pid
    assert r[0]["task_id"] == tid
    assert r[0]["user_id"] == uid
    return True
  • For an ASSIGN event the task state should move to *locked for mapping*, with the user_id set to the requested user id. The user should also be notified as part of the logic (a sketch follows).
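
A sketch of that assign method, assuming the same db.sq helper and a notify_user() function that does not exist yet:

import uuid

async def assign(pid, requested_uid, tid) -> bool:
    """Lock the task for mapping on behalf of the requested user."""
    r = await db.sq(
        """
        INSERT INTO public.task_events(event_id, project_id, task_id, user_id, state, comment)
        VALUES ($1, $2, $3, $4, 'LOCKED_FOR_MAPPING', 'Note: Task assigned')
        RETURNING project_id, task_id, user_id
        """,
        uuid.uuid4(),
        pid,
        tid,
        requested_uid)
    if len(r) != 1:
        return False
    # notify_user() is a placeholder for whatever notification mechanism is chosen
    await notify_user(requested_uid, f"You have been assigned task {tid} in project {pid}")
    return True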

In this flow, the task_events table is a centralised data store where each event is an immutable record in a sequence, also called a log. Each new event is appended to the end of the log, and the most recent event for a task determines its current state. This log is the single source of truth for task state, and it can be accessed from multiple places.
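For example, the current state of every task in a project can be derived from the log with a single query (again using the hypothetical db.sq helper):

async def task_states(pid):
    """Latest event per task, i.e. the current state of each task in the project."""
    return await db.sq(
        """
        SELECT DISTINCT ON (task_id) task_id, user_id, state, created_at
        FROM public.task_events
        WHERE project_id=$1
        ORDER BY task_id, created_at DESC
        """,
        pid)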

— spwoodcock, Jun 28 '24 15:06