doccano-client
How to add new annotation in sequence labelling project?
I saw there is a function called add_annotation, but it can't be used for sequence labelling projects (NER).
Is there any way to update the labels (NER) on a document? (Add and delete labels)
Issue-Label Bot is automatically applying the label question to this issue, with a confidence of 0.92. Please mark this comment with :thumbsup: or :thumbsdown: to give our bot feedback!
It seems to me that there is no feature for annotating sequence labeling projects; add_annotation only supports document classification:
https://github.com/doccano/doccano-client/blob/3a2af1162db0a2c2bfecd837ded74c98e07e72b8/doccano_api_client/init.py#L252-L268
Until support for this is added upstream, you can pass the missing kwargs through to the annotation payload, so the method also works for sequence labelling annotations. To do that, subclass the client and override its add_annotation method like this:
```python
import requests
from doccano_api_client import DoccanoClient


class DoccanoClientMod(DoccanoClient):
    def add_annotation(
        self,
        project_id: int,
        annotation_id: int,
        document_id: int,
        **kwargs
    ) -> requests.models.Response:
        """
        Adds an annotation to a given document.

        Extra keyword arguments (e.g. start_offset and end_offset for
        sequence labelling) are merged into the request payload.
        """
        url = '/v1/projects/{p_id}/docs/{d_id}/annotations'.format(
            p_id=project_id,
            d_id=document_id)
        payload = {
            "label": annotation_id,
            "projectId": project_id,
            **kwargs  # we could add this in a PR to the project
        }
        return self.post(url, json=payload)
```
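Because the override simply spreads **kwargs into the payload dict, the request body for a span annotation can be sanity-checked offline before anything is sent. A minimal sketch, assuming doccano treats end_offset as exclusive (the labelled text is text[start_offset:end_offset]); build_span_payload is a hypothetical helper, not part of doccano-client:

```python
from typing import Any, Dict


def build_span_payload(
    project_id: int,
    label_id: int,
    text: str,
    start_offset: int,
    end_offset: int,
    **extra: Any,
) -> Dict[str, Any]:
    """Hypothetical helper: check a span against its text, then build the payload.

    The dict has the same shape as the one built in the add_annotation override:
    base fields first, then the keyword arguments merged in.
    """
    # Assumption: end_offset is exclusive, so the labelled text is text[start_offset:end_offset].
    if not 0 <= start_offset < end_offset <= len(text):
        raise ValueError(
            f"invalid span ({start_offset}, {end_offset}) for text of length {len(text)}"
        )
    return {
        "label": label_id,
        "projectId": project_id,
        "start_offset": start_offset,
        "end_offset": end_offset,
        **extra,
    }


payload = build_span_payload(1, 7, "whatever foo bar", 9, 12)
print(payload)
# {'label': 7, 'projectId': 1, 'start_offset': 9, 'end_offset': 12}
```

Validating offsets locally catches empty or out-of-range spans before they reach the server; the start_offset and end_offset values can then be passed unchanged to the overridden add_annotation.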
Then call the overridden add_annotation with start_offset and end_offset. For example:
```python
# Connect using the subclass with the overridden method
doccano_client = DoccanoClientMod('http://127.0.0.1:8000/', 'admin', 'password')

# Create a project
create_resp = doccano_client.create_project(
    name="hi",
    project_type="SequenceLabeling",
    resourcetype="SequenceLabelingProject")
project_id = create_resp['id']

# Add labels to the project
l1_resp = doccano_client.create_label(project_id, text='a')
l2_resp = doccano_client.create_label(project_id, text='b')

# Create a document and label "whatever" with 'a' (offsets 0-8, end offset exclusive)
d1_resp = doccano_client.create_document(project_id, text="whatever foo bar")
doccano_client.add_annotation(project_id, l1_resp['id'], d1_resp['id'], start_offset=0, end_offset=8)
```
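Counting offsets by hand is error-prone (a trailing space is easy to include by accident). A small sketch that derives them from the phrase itself, assuming exclusive end offsets; find_span_offsets is a hypothetical helper, not part of doccano-client:

```python
from typing import List, Tuple


def find_span_offsets(text: str, phrase: str) -> List[Tuple[int, int]]:
    """Return (start_offset, end_offset) for every non-overlapping occurrence of phrase.

    end_offset is exclusive, so text[start:end] == phrase for each pair.
    """
    if not phrase:
        raise ValueError("phrase must be non-empty")
    spans = []
    start = text.find(phrase)
    while start != -1:
        end = start + len(phrase)
        spans.append((start, end))
        start = text.find(phrase, end)  # resume after this match, so matches never overlap
    return spans


print(find_span_offsets("whatever foo bar", "foo"))  # [(9, 12)]
```

Each pair could then be fed to the call above, e.g. add_annotation(project_id, l2_resp['id'], d1_resp['id'], start_offset=start, end_offset=end) for every (start, end) returned.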
@guigarfr any ETA on a new release allowing for kwargs in add_annotation?
Another note on this method: it does not actually return a requests.models.Response, despite what the README and type hints claim. The same holds for every method that goes through self.get or self.post, since both call .json() on the response and return the decoded body instead.
I ran into this when I tried to call .raise_for_status() on the result of self.get_me or self.add_annotation.
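To make this concrete, here is a toy model with no network access. FakeResponse and FakeRouter are hypothetical stand-ins, not doccano-client classes; they only mimic the behaviour described above (post() returning response.json()). The caller ends up with a plain dict, so the HTTP status is lost and .raise_for_status() is unavailable:

```python
import json
from typing import Any


class FakeResponse:
    """Hypothetical stand-in for requests.Response (only json() and raise_for_status())."""

    def __init__(self, status_code: int, body: str) -> None:
        self.status_code = status_code
        self._body = body

    def json(self) -> Any:
        return json.loads(self._body)

    def raise_for_status(self) -> None:
        if self.status_code >= 400:
            raise RuntimeError(f"HTTP {self.status_code}")


class FakeRouter:
    """Hypothetical client whose post() returns the decoded body, as described above."""

    def __init__(self, response: FakeResponse) -> None:
        self._response = response

    def post(self, endpoint: str, **kwargs: Any) -> Any:
        # Only the decoded JSON is returned; the status code is discarded.
        return self._response.json()


client = FakeRouter(FakeResponse(400, '{"detail": "bad request"}'))
result = client.post("v1/projects/1/docs/1/annotations", json={"label": 1})
print(type(result).__name__)                # dict
print(hasattr(result, "raise_for_status"))  # False, so calling it raises AttributeError
```

Even a 400 response comes back as an ordinary dict here, so checking the status would require keeping the Response object, for example by issuing that particular request with requests directly.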
Two more methods I found missing from the client, related to sequence labelling and active learning:
```python
import requests
from doccano_api_client import DoccanoClient


# Methods to add to the same DoccanoClient subclass as the add_annotation override
class DoccanoClientMod(DoccanoClient):
    def clear_annotations(
        self,
        project_id: int,
        document_id: int,
    ) -> requests.models.Response:
        """Clear all annotations for doc."""
        url = "v1/projects/{}/docs/{}/annotations".format(project_id, document_id)
        return self.delete(url)

    def delete_annotation(
        self,
        project_id: int,
        document_id: int,
        annotation_id: int,
    ) -> requests.models.Response:
        """Delete single annotation for doc."""
        url = "v1/projects/{}/docs/{}/annotations/{}".format(
            project_id, document_id, annotation_id
        )
        return self.delete(url)
```
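For the original question (adding and deleting NER labels on a document), deleting single annotations can be combined with the add_annotation override. A minimal sketch of the planning step, assuming each existing annotation is available as a dict with "id", "label", "start_offset" and "end_offset" keys (check the actual response shape of your doccano version); plan_span_update is a hypothetical helper:

```python
from typing import Dict, Iterable, List, Set, Tuple

Span = Tuple[int, int, int]  # (label_id, start_offset, end_offset)


def plan_span_update(
    existing: Iterable[Dict[str, int]], desired: Iterable[Span]
) -> Tuple[List[int], List[Span]]:
    """Work out which annotations to delete and which spans to add.

    existing: annotation dicts with "id", "label", "start_offset", "end_offset"
              (assumed shape of the server's annotation objects).
    desired:  the spans the document should end up with.
    Returns (annotation ids to delete, spans to create).
    """
    wanted: Set[Span] = set(desired)
    keep: Set[Span] = set()
    to_delete: List[int] = []
    for ann in existing:
        span = (ann["label"], ann["start_offset"], ann["end_offset"])
        if span in wanted and span not in keep:
            keep.add(span)  # already present: leave it alone
        else:
            to_delete.append(ann["id"])  # not wanted, or a duplicate of a kept span
    # Spans still missing, ordered by position in the text
    to_add = sorted(wanted - keep, key=lambda s: (s[1], s[2], s[0]))
    return to_delete, to_add


existing = [
    {"id": 10, "label": 1, "start_offset": 0, "end_offset": 8},
    {"id": 11, "label": 2, "start_offset": 9, "end_offset": 12},
]
desired = [(1, 0, 8), (2, 13, 16)]
to_delete, to_add = plan_span_update(existing, desired)
print(to_delete, to_add)  # [11] [(2, 13, 16)]
```

Each returned id could then go to delete_annotation, and each (label, start, end) tuple to add_annotation(project_id, label, document_id, start_offset=start, end_offset=end). Diffing like this, rather than calling clear_annotations and re-adding everything, leaves annotations that are already correct untouched.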
Any update on this feature? @guigarfr
Maybe this get_client wrapper can help you in the meantime. It overrides add_annotation: https://gist.github.com/ddelange/35d219991d60606dfff034a3985bb0d1#file-doccano_active_learning-py-L155
I'm currently focused on other things, so I can't help right now. I guess you could open a PR with a solution and let someone working on the project review it.
In my PR I just added kwargs to a method...