
Add support for document-level key-value metadata

Open reckart opened this issue 8 years ago • 2 comments

Add support for document-level key-value metadata. I imagine something like this:

=== Variant 1

MetaDataEntry extends Annotation  {
  String: key
  String: value
}

// Simplest option only allowing String key-value pairs

=== Variant 2

// Option only allowing basic typed key-value pairs with values represented as strings
// The type would be set if the value is not a string - and it would be set e.g. to `int`, `bool`, etc.

MetaDataEntry extends Annotation  {
  String: key
  String: value
  String: type
}
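A rough sketch of how a consumer might decode Variant 2 entries. This is plain Java, not UIMA/JCas code: the `Entry` class and the `decode` helper are hypothetical stand-ins, and the type names `int` and `bool` are only the examples mentioned above, not a fixed vocabulary.

```java
public class TypedEntrySketch {
    /** Hypothetical plain-Java stand-in for the proposed MetaDataEntry (not a UIMA type). */
    public static final class Entry {
        public final String key;
        public final String value;
        public final String type; // null means "the value is a plain string"

        public Entry(String key, String value, String type) {
            this.key = key;
            this.value = value;
            this.type = type;
        }
    }

    /** Converts the string-encoded value back to a Java object based on the type field. */
    public static Object decode(Entry entry) {
        if (entry.type == null) {
            return entry.value; // no type set: keep the string as-is
        }
        switch (entry.type) {
            case "int":
                return Integer.valueOf(entry.value);
            case "bool":
                return Boolean.valueOf(entry.value);
            default:
                throw new IllegalArgumentException("Unsupported type: " + entry.type);
        }
    }

    public static void main(String[] args) {
        System.out.println(decode(new Entry("title", "Example", null)));
        System.out.println(decode(new Entry("pages", "12", "int")));
        System.out.println(decode(new Entry("reviewed", "true", "bool")));
    }
}
```

Keeping the value as a string means the type system needs only one value feature, at the cost of parsing on every read.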

=== Variant 3

// Rather have everything in one FS; either value or ref would be set, but not both
// If ref is set, then values would be retrieved from the linked FS (key-values again)

MetaDataEntry extends Annotation  {
  String: key
  String: value
  FeatureStructure: ref
  String: type
}

=== Variant 4

// Full support for all kinds of structures, even nested entries - basically "schemaless"

MetaDataEntry extends Annotation {
  String: key
}

PrimitiveMetaDataEntry extends MetaDataEntry  {
  String: value
  String: type
}

MetaDataEntryGroup extends MetaDataEntry  {
  MetaDataEntry[]: items
}
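To illustrate the nesting that Variant 4 allows, here is a minimal plain-Java sketch of the three types plus a helper that flattens a nested structure into dotted key paths. The classes only mirror the proposed type names; they are not UIMA types, and the `flatten` helper and the dotted-path convention are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NestedMetaDataSketch {
    /** Stand-in for the proposed base type: every entry has a key. */
    public abstract static class MetaDataEntry {
        public final String key;

        protected MetaDataEntry(String key) {
            this.key = key;
        }
    }

    /** Leaf entry: a string-encoded value plus an optional type hint. */
    public static final class PrimitiveMetaDataEntry extends MetaDataEntry {
        public final String value;
        public final String type; // null means plain string

        public PrimitiveMetaDataEntry(String key, String value, String type) {
            super(key);
            this.value = value;
            this.type = type;
        }
    }

    /** Inner node: groups further entries, which may themselves be groups. */
    public static final class MetaDataEntryGroup extends MetaDataEntry {
        public final List<MetaDataEntry> items;

        public MetaDataEntryGroup(String key, MetaDataEntry... items) {
            super(key);
            this.items = new ArrayList<>(Arrays.asList(items));
        }
    }

    /** Hypothetical helper: maps dotted key paths to raw string values, in insertion order. */
    public static Map<String, String> flatten(List<MetaDataEntry> entries) {
        Map<String, String> out = new LinkedHashMap<>();
        for (MetaDataEntry entry : entries) {
            collect("", entry, out);
        }
        return out;
    }

    private static void collect(String prefix, MetaDataEntry entry, Map<String, String> out) {
        String path = prefix.isEmpty() ? entry.key : prefix + "." + entry.key;
        if (entry instanceof PrimitiveMetaDataEntry) {
            out.put(path, ((PrimitiveMetaDataEntry) entry).value);
        } else if (entry instanceof MetaDataEntryGroup) {
            for (MetaDataEntry child : ((MetaDataEntryGroup) entry).items) {
                collect(path, child, out);
            }
        }
    }

    public static void main(String[] args) {
        List<MetaDataEntry> entries = new ArrayList<>();
        entries.add(new PrimitiveMetaDataEntry("title", "Example", null));
        entries.add(new MetaDataEntryGroup("source",
                new PrimitiveMetaDataEntry("name", "corpus-a", null),
                new PrimitiveMetaDataEntry("year", "2017", "int")));
        System.out.println(flatten(entries));
    }
}
```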

Instead of adding each MetaDataEntry to a view, the entries could be added to a list of MetaDataEntry held by DocumentMetaData:

DocumentMetaData extends DocumentAnnotation {
   // ... all the stuff we already have in DocumentMetaData ...
   MetaDataEntry[]: entries
}

An alternative to extending Annotation would be to extend TOP and add the entries only to DocumentMetaData, not to the CAS view directly. The MetaDataEntry could then not be retrieved via the annotation index or via offsets, but the offsets would be expected to always cover the whole document anyway. Extending Annotation, on the other hand, could be a problem and require special handling if the entries are added before the text is materialized: the respective code would have to know that all MetaDataEntry annotations need to be updated to match the materialized text in the end. UIMA handles this automatically for us for the DocumentAnnotation.
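The offset problem can be simulated without UIMA. In the sketch below, `View` and `SpanEntry` are hypothetical plain-Java stand-ins for a CAS view and an Annotation-based MetaDataEntry. They only show the manual fix-up step that the code setting the text would need; they say nothing about how UIMA itself implements this.

```java
import java.util.ArrayList;
import java.util.List;

public class OffsetFixupSketch {
    /** Hypothetical stand-in for an Annotation-based entry that carries offsets. */
    public static final class SpanEntry {
        public final String key;
        public final String value;
        public int begin;
        public int end;

        public SpanEntry(String key, String value, int begin, int end) {
            this.key = key;
            this.value = value;
            this.begin = begin;
            this.end = end;
        }
    }

    /** Hypothetical stand-in for a view whose text may be set after entries were added. */
    public static final class View {
        private String text; // null until the document text is materialized
        public final List<SpanEntry> entries = new ArrayList<>();

        public void addEntry(String key, String value) {
            // Before the text exists, the "whole document" span is empty: [0, 0).
            int end = (text == null) ? 0 : text.length();
            entries.add(new SpanEntry(key, value, 0, end));
        }

        public void setText(String text) {
            this.text = text;
            // Special handling: stretch every existing entry to cover the new text.
            for (SpanEntry entry : entries) {
                entry.begin = 0;
                entry.end = text.length();
            }
        }
    }

    public static void main(String[] args) {
        View view = new View();
        view.addEntry("source", "corpus-a"); // added before the text is known
        view.setText("Hello world");
        SpanEntry entry = view.entries.get(0);
        System.out.println(entry.begin + ".." + entry.end);
    }
}
```

If the entries extended TOP and were reachable only through DocumentMetaData, they would carry no offsets and the loop in `setText` would not be needed.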

reckart avatar Nov 02 '17 11:11 reckart

@jgrivolla

reckart avatar Nov 02 '17 11:11 reckart

At the moment, I kind of tend towards:

  • variant 4
  • not extending Annotation but TOP and referring to the metadata only via a field in DocumentMetaData

reckart avatar Nov 02 '17 11:11 reckart