How to add new annotation in sequence labelling project?

Open lexiconlp opened this issue 4 years ago • 8 comments

I saw there is a function called add_annotation, but it can't be used for sequence labelling (NER) projects.

Is there any way to update the NER labels on a document (add and delete labels)?

lexiconlp avatar Nov 24 '20 07:11 lexiconlp

It seems to me that there is no feature to annotate the sequence labeling project. add_annotation only supports document classification:

https://github.com/doccano/doccano-client/blob/3a2af1162db0a2c2bfecd837ded74c98e07e72b8/doccano_api_client/init.py#L252-L268

Hironsan avatar Nov 24 '20 08:11 Hironsan

Until support for this is added upstream, you can add the missing kwargs to the annotation method so that it supports sequence labelling annotations. You can override the client's add_annotation method like this:

import requests

from doccano_api_client import DoccanoClient

class DoccanoClientMod(DoccanoClient):

    def add_annotation(
            self,
            project_id: int,
            annotation_id: int,
            document_id: int,
            **kwargs
    ) -> requests.models.Response:
        """
        Adds an annotation to a given document.
        """
        url = '/v1/projects/{p_id}/docs/{d_id}/annotations'.format(
            p_id=project_id,
            d_id=document_id)
        payload = {
            "label": annotation_id,
            "projectId": project_id,
            **kwargs  # extra fields such as start_offset/end_offset; could be proposed upstream in a PR
        }
        return self.post(url, json=payload)

and then call it with start_offset and end_offset. For example:

    # connect to the client with the overwritten method
    doccano_client = DoccanoClientMod('http://127.0.0.1:8000/', 'admin', 'password')

    # Create a project
    create_resp = doccano_client.create_project(
        name="hi",
        project_type="SequenceLabeling",
        resourcetype="SequenceLabelingProject")
    project_id = create_resp['id']

    # Add labels to project
    l1_resp = doccano_client.create_label(project_id, text='a')
    l2_resp = doccano_client.create_label(project_id, text='b')

    # Create a document with annotations
    d1_resp = doccano_client.create_document(project_id, text="whatever foo bar")
    doccano_client.add_annotation(project_id, l1_resp['id'], d1_resp['id'], start_offset=0, end_offset=9)
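Hand-counting character offsets is easy to get wrong, so here is a minimal sketch of deriving them from the text instead. The helper names find_span and build_annotation_kwargs are hypothetical and not part of doccano-client. The sketch assumes end_offset is exclusive, so that text[start_offset:end_offset] is exactly the labelled span; under that assumption, 0 to 9 in the call above would also cover the space after "whatever".

```python
from typing import Dict, Tuple


def find_span(text: str, phrase: str) -> Tuple[int, int]:
    """Return (start_offset, end_offset) of the first occurrence of phrase.

    Assumption: end_offset is exclusive, so text[start:end] == phrase.
    """
    start = text.find(phrase)
    if start == -1:
        raise ValueError(f"{phrase!r} not found in text")
    return start, start + len(phrase)


def build_annotation_kwargs(text: str, phrase: str) -> Dict[str, int]:
    """Build the extra keyword arguments for the overridden add_annotation."""
    start, end = find_span(text, phrase)
    return {"start_offset": start, "end_offset": end}


text = "whatever foo bar"
kwargs = build_annotation_kwargs(text, "foo")
print(kwargs)  # {'start_offset': 9, 'end_offset': 12}
```

The resulting dictionary can then be passed straight through, e.g. doccano_client.add_annotation(project_id, l1_resp['id'], d1_resp['id'], **kwargs).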

guigarfr avatar Dec 16 '20 16:12 guigarfr

@guigarfr any ETA on a new release allowing for kwargs in add_annotation?

Another note on this method: the return type is not requests.models.Response, despite what the README and type hints claim. The same is true of other methods (all methods that use self.get or self.post, since both call .json() on the response).

I ran into this when I tried to call .raise_for_status() on the result of self.get_me or self.add_annotation.
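To illustrate the point, here is a self-contained sketch that does not use the real client. FakeResponse and SketchClient are hypothetical stand-ins, the "v1/me" endpoint string is only a placeholder, and get_checked is one possible workaround rather than anything the library provides.

```python
import json
from typing import Any, Dict


class FakeResponse:
    """Hypothetical stand-in for requests.models.Response (only the parts used here)."""

    def __init__(self, status_code: int, body: str) -> None:
        self.status_code = status_code
        self._body = body

    def json(self) -> Any:
        return json.loads(self._body)

    def raise_for_status(self) -> None:
        if self.status_code >= 400:
            raise RuntimeError(f"HTTP {self.status_code}")


class SketchClient:
    """Hypothetical client mirroring the behaviour described above."""

    def _request(self, endpoint: str) -> FakeResponse:
        # A real client would perform an HTTP request here.
        return FakeResponse(200, '{"username": "admin"}')

    def get(self, endpoint: str) -> Dict[str, Any]:
        # Mirrors the reported behaviour: the body is parsed before returning,
        # so the caller never sees the response object itself.
        return self._request(endpoint).json()

    def get_checked(self, endpoint: str) -> Dict[str, Any]:
        # Possible alternative: check the status first, then parse.
        response = self._request(endpoint)
        response.raise_for_status()
        return response.json()


client = SketchClient()
me = client.get("v1/me")
print(type(me).__name__)                # dict
print(hasattr(me, "raise_for_status"))  # False
```

Because the caller only receives the parsed JSON, any status check has to happen inside the client before .json() is called, which is what get_checked sketches.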

ddelange avatar Jan 08 '21 09:01 ddelange

Two more methods I found missing from the client, both related to sequence labelling and active learning:

def clear_annotations(
    self,
    project_id: int,
    document_id: int,
) -> requests.models.Response:
    """Clear all annotations for doc."""
    url = "v1/projects/{}/docs/{}/annotations".format(project_id, document_id)
    return self.delete(url)

def delete_annotation(
    self,
    project_id: int,
    document_id: int,
    annotation_id: int,
) -> requests.models.Response:
    """Delete single annotation for doc."""
    url = "v1/projects/{}/docs/{}/annotations/{}".format(
        project_id, document_id, annotation_id
    )
    return self.delete(url)
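Combined with the add_annotation override from earlier in the thread, these make it possible to replace a document's labels in one step. A rough sketch, assuming a client that exposes both clear_annotations and the kwargs-aware add_annotation; replace_annotations and RecordingClient are hypothetical names, and the stub only records calls instead of talking to a server.

```python
from typing import Any, Iterable, List, Tuple

# (label_id, start_offset, end_offset)
Span = Tuple[int, int, int]


def replace_annotations(
    client: Any,
    project_id: int,
    document_id: int,
    spans: Iterable[Span],
) -> int:
    """Clear a document's annotations, then add the given spans.

    Returns the number of annotations added.
    """
    client.clear_annotations(project_id, document_id)
    added = 0
    for label_id, start, end in spans:
        client.add_annotation(
            project_id, label_id, document_id,
            start_offset=start, end_offset=end,
        )
        added += 1
    return added


class RecordingClient:
    """Hypothetical stub that records calls instead of sending requests."""

    def __init__(self) -> None:
        self.calls: List[tuple] = []

    def clear_annotations(self, project_id: int, document_id: int) -> None:
        self.calls.append(("clear", project_id, document_id))

    def add_annotation(
        self, project_id: int, annotation_id: int, document_id: int, **kwargs: int
    ) -> None:
        self.calls.append(("add", project_id, annotation_id, document_id, kwargs))


stub = RecordingClient()
count = replace_annotations(stub, 1, 7, [(3, 0, 8), (4, 9, 12)])
print(count)  # 2
```

Clearing first keeps the sketch simple, but it is not atomic: if an add call fails partway through, the document is left with only some of its labels.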

ddelange avatar Jan 08 '21 09:01 ddelange

Any update on this feature? @guigarfr

robinsonkwame avatar Aug 18 '22 01:08 robinsonkwame

Maybe this get_client wrapper can help you in the meantime. It overrides add_annotation: https://gist.github.com/ddelange/35d219991d60606dfff034a3985bb0d1#file-doccano_active_learning-py-L155

ddelange avatar Aug 18 '22 07:08 ddelange

I'm currently focused on other things, so I can't help right now. I guess you can open a PR with a solution and have someone working on the project review it.

In my PR I just added kwargs to a method...

guigarfr avatar Aug 18 '22 09:08 guigarfr