Bug report/Feature request: Predictions for Features of Type "StringArray"
Describe the bug
I wrote a recommender that produces a span prediction with a label (string). When I connect this recommender to a string feature, it works perfectly, but when I connect it to a StringArray feature, I get an error message and the recommendation is not added to the CAS.
To Reproduce
- In Inception, create a span layer with a string and a stringArray feature.
- Write a recommender that makes span predictions with a label included, and make it accessible via a server.
- In Inception, add 2 recommenders (one for the string feature, one for the stringArray feature) and connect both to your recommender from step 2.
- Run both recommenders.
Expected behavior
I would expect the prediction for the StringArray feature to be added to the document as a string. If one accepts the prediction in Inception, it should be added as one string to the array. Ideally there would be an option to predict several strings at once.
Error message
[2025-03-20 19:43:03,902] ERROR in app: Exception on /predlemmas/predict [POST]
Traceback (most recent call last):
  File "/home/aya/.local/lib/python3.7/site-packages/flask/app.py", line 2529, in wsgi_app
    response = self.full_dispatch_request()
  File "/home/aya/.local/lib/python3.7/site-packages/flask/app.py", line 1825, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/home/aya/.local/lib/python3.7/site-packages/flask/app.py", line 1823, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/aya/.local/lib/python3.7/site-packages/flask/app.py", line 1799, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
  File "/home/aya/nlp/contextual-props-de/inception_external_recommender/ariadne/server.py", line 59, in _predict
    result = jsonify(document=req.cas.to_xmi())
  File "/usr/local/lib/python3.7/dist-packages/dkpro_cassis-0.9.1-py3.7.egg/cassis/cas.py", line 650, in to_xmi
    return self._serialize(CasXmiSerializer(), path, pretty_print=pretty_print)
  File "/usr/local/lib/python3.7/dist-packages/dkpro_cassis-0.9.1-py3.7.egg/cassis/cas.py", line 694, in _serialize
    return serializer.serialize(None, self, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/dkpro_cassis-0.9.1-py3.7.egg/cassis/xmi.py", line 508, in serialize
    self._serialize_feature_structure(cas, root, fs)
  File "/usr/local/lib/python3.7/dist-packages/dkpro_cassis-0.9.1-py3.7.egg/cassis/xmi.py", line 631, in _serialize_feature_structure
    if value.elements is not None:  # Compare to none as not to skip if elements is empty!
AttributeError: 'str' object has no attribute 'elements'
Please complete the following information:
- inception-external-recommender: Metadata-Version: 2.1 Version: 0.1.0.dev0
- Inception: 34.1
- OS: Ubuntu 22.04
- Python: 3.7
Additional context
It's my very first issue post on GitHub and I'm generally a newbie with Python, git, everything... If anything is missing or my post is in the wrong category or whatever, please be nice ;)
String features and string array features need to be handled differently. A recommender for one won't work for the other.
For a string feature, the label is just that - a string:
from typing import List
from cassis import Cas
from ariadne.classifier import Classifier
from ariadne.protocol import TrainingDocument
from collections import defaultdict
from ariadne.contrib.inception_util import create_span_prediction


class DemoStringFeatureRecommender(Classifier):
    def fit(self, documents: List[TrainingDocument], layer: str, feature: str, project_id, user_id: str):
        # Count how often each mention has been annotated with a given label
        counts = defaultdict(lambda: defaultdict(int))
        for document in documents:
            cas = document.cas
            for annotation in cas.select(layer):
                mention = annotation.get_covered_text().lower()
                label = annotation.get(feature)
                if not mention or not label:
                    continue
                counts[mention][label] += 1

        # Create a new dictionary that contains only the label with the highest count for each mention
        best_labels = {mention: max(candidate_counts, key=candidate_counts.get) if candidate_counts else ""
                       for mention, candidate_counts in counts.items()}
        print(f'Best labels: {best_labels}')
        self._save_model(user_id, best_labels)

    def predict(self, cas: Cas, layer: str, feature: str, project_id: str, document_id: str, user_id: str):
        model = self._load_model(user_id)
        if model is None:
            return

        # For each token, check if any of the mentions in the model correspond to the text starting
        # at that token and create a new annotation if they do
        for token in cas.select("de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Token"):
            mention = token.get_covered_text().lower()
            if mention in model:
                label = model.get(mention)
                suggestion = create_span_prediction(cas, layer, feature, token.begin, token.begin + len(mention), label)
                cas.add_annotation(suggestion)
However, for a string array feature, the feature value is not just a string and not even a plain array of strings - it is a StringArray:
from typing import List
from cassis import Cas
from cassis.typesystem import TYPE_NAME_STRING_ARRAY
from ariadne.classifier import Classifier
from ariadne.protocol import TrainingDocument
from collections import defaultdict
from ariadne.contrib.inception_util import create_span_prediction


class DemoStringArrayFeatureRecommender(Classifier):
    def fit(self, documents: List[TrainingDocument], layer: str, feature: str, project_id, user_id: str):
        # Count how often each mention has been annotated with a given label
        counts = defaultdict(lambda: defaultdict(int))
        for document in documents:
            cas = document.cas
            for annotation in cas.select(layer):
                mention = annotation.get_covered_text().lower()
                labels = annotation.get(feature)
                if not mention or not labels:
                    continue
                # The feature value is a StringArray; its strings live in .elements
                for label in labels.elements:
                    counts[mention][label] += 1

        # Create a new dictionary that contains only the label with the highest count for each mention
        best_labels = {mention: max(candidate_counts, key=candidate_counts.get) if candidate_counts else ""
                       for mention, candidate_counts in counts.items()}
        self._save_model(user_id, best_labels)

    def predict(self, cas: Cas, layer: str, feature: str, project_id: str, document_id: str, user_id: str):
        model = self._load_model(user_id)
        if model is None:
            return

        # For each token, check if any of the mentions in the model correspond to the text starting
        # at that token and create a new annotation if they do
        StringArray = cas.typesystem.get_type(TYPE_NAME_STRING_ARRAY)
        for token in cas.select("de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Token"):
            mention = token.get_covered_text().lower()
            if mention in model:
                label = model.get(mention)
                # Wrap the single label in a StringArray instead of passing a plain str
                labels = StringArray(elements=[label])
                suggestion = create_span_prediction(cas, layer, feature, token.begin, token.begin + len(mention), labels)
                cas.add_annotation(suggestion)
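Regarding the wish to predict several strings at once: since the value passed to the prediction is built with StringArray(elements=[...]), one possible approach is to have the model keep the few most frequent labels per mention instead of only the best one. The following is only a rough sketch of that selection step in plain Python. The helper name top_k_labels and the parameter k are made up for illustration and are not part of ariadne or cassis, and I have not verified how INCEpTION presents a suggestion whose array holds more than one element.

```python
from collections import defaultdict
from typing import Dict, List


def top_k_labels(counts: Dict[str, Dict[str, int]], k: int = 2) -> Dict[str, List[str]]:
    """Keep the k most frequent labels per mention (hypothetical helper)."""
    model = {}
    for mention, candidate_counts in counts.items():
        # Sort by descending count, then alphabetically so that ties are deterministic
        ranked = sorted(candidate_counts.items(), key=lambda item: (-item[1], item[0]))
        model[mention] = [label for label, _ in ranked[:k]]
    return model


# Toy counts in the same nested-dict shape that fit() builds
counts = defaultdict(lambda: defaultdict(int))
for mention, label in [("went", "go"), ("went", "go"), ("went", "wend"), ("saw", "see")]:
    counts[mention][label] += 1

model = top_k_labels(counts, k=2)
print(model)  # {'went': ['go', 'wend'], 'saw': ['see']}
```

In predict, the stored list could then presumably be passed directly, i.e. StringArray(elements=model[mention]) instead of StringArray(elements=[label]).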
No more feedback, so I hope this question has been resolved.
There is now also an example of how to deal with string arrays: https://github.com/inception-project/inception-external-recommender/blob/main/ariadne/demo/demo_string_array_feature.py