python-odml

Cumbersome usage of terminology

Open: lzehl opened this issue 7 years ago • 1 comment

While preparing the new documentation, I came across the terminology functionality of Document and Section objects. The following example (run with odml v1.2) shows what makes integrating terminologies from a repository cumbersome:

import odml

# define the G-Node repo
repo_doc = 'http://portal.g-node.org/odml/terminologies/v1.0/terminologies.xml'

# doc1 should be the Document I would like to create by using terminologies of the G-Node repo
doc1 = odml.Document('WantedDoc', repository=repo_doc)

# doc1 is now an empty Document with the G-Node repository
print(doc1)
# output: <Doc None by WantedDoc (0 sections)>

# to actually use the terminologies provided by the repository,
# e.g. to include an Event Section with the provided terminology,
# I now have to generate secondary objects

# A: for example, via another Document
doc2 = doc1.get_terminology_equivalent()
# output: a Document containing all Sections that the G-Node repo provides as terminologies
# <Doc None by None (54 sections)>
# (note that this currently emits a lot of warnings,
#  since the terminologies have not yet been updated to the recent changes in odML)
doc1.append(odml.Section('MyEvent1', type='whatever'))
doc1.sections['MyEvent1'].merge(doc2.sections['Event'])

# B: via creating and appending a Section which has a type equivalent terminology in the G-Node repo
doc1.append(odml.Section('MyEvent2', type='event'))
ev_sec_a = doc1.sections['MyEvent2'].get_terminology_equivalent()
# output: the first Section (with children) in the terminology of the G-Node repo that has Section type 'event'
# <Section Event[event] (0)>
doc1.sections['MyEvent2'].merge(ev_sec_a)

# NOTE:
# in A the Section type is irrelevant, because the user actively selects a Section
# from the complete terminology tree of the given repo;
# in B the terminology of the wanted Section is extracted AUTOMATICALLY from the repo
# by matching the Section types

# B also works for a standalone Section, but defining a Section-specific repo is only indirectly possible:
repo_sec = 'http://portal.g-node.org/odml/terminologies/v1.0/event/event.xml'
# my_ev_sec = odml.Section('MyEvent3', type='event', repository=repo_sec)
# -> raises TypeError: __init__() got an unexpected keyword argument 'repository'
# but the following is possible:
my_ev_sec = odml.Section('MyEvent3', type='event')
my_ev_sec.repository = repo_sec
# and then follow B to use the repository functionality 
ev_sec_b = my_ev_sec.get_terminology_equivalent()
doc1.append(ev_sec_b.clone())
# appends a clone of the Section with the extracted terminology, <Section Event[event] (0)>, to doc1
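
Approach B above always repeats the same two calls: get_terminology_equivalent() followed by merge(). As a minimal sketch, the pair could be wrapped in one helper. The name apply_terminology is hypothetical and not part of odml, and the check for None rests on the assumption that get_terminology_equivalent() returns None when no repository is set or no matching Section is found.

```python
def apply_terminology(section):
    """Merge the terminology equivalent of `section` into `section`.

    Hypothetical helper, not part of odml. It relies only on the two
    methods used in approach B: get_terminology_equivalent() and merge().
    """
    term = section.get_terminology_equivalent()
    if term is None:
        # Assumption: None signals a missing repository or no match.
        raise LookupError("no terminology equivalent found for %r" % (section,))
    section.merge(term)
    return section
```

With such a helper, approach B would reduce to apply_terminology(doc1.sections['MyEvent2']). It does not remove the need to set the repository on a standalone Section first.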

lzehl, Mar 03 '17 13:03

One more issue: currently, the terminology equivalent of a Section is found in the given repository by matching the Section type.

This is problematic for the following reason: a Section type, as I understand it so far, is meant to group related Sections into a superordinate class. For example, two Electrodes from different vendors with different Properties can be grouped via a shared type such as 'electrode', even if their Section names are vendor-specific. In such a case, get_terminology_equivalent() currently returns only the first Section with the type 'electrode' and ignores the second one in the repository.
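
The effect can be reproduced without odml. The sketch below uses a plain namedtuple as a stand-in for odml.Section and a made-up repository list; the Section names and the first_by_type function are hypothetical and only mimic the first-match behaviour described above.

```python
from collections import namedtuple

# Stand-in for odml.Section; only name and type matter here.
Section = namedtuple("Section", ["name", "type"])

# Hypothetical repository content: two vendor-specific electrodes
# that share the type 'electrode'.
repo_sections = [
    Section("VendorA_Electrode", "electrode"),
    Section("VendorB_Electrode", "electrode"),
    Section("Event", "event"),
]


def first_by_type(sections, section_type):
    """Return the first Section of the given type, or None."""
    for sec in sections:
        if sec.type == section_type:
            return sec
    return None


match = first_by_type(repo_sections, "electrode")
print(match.name)  # VendorA_Electrode; VendorB_Electrode is never returned
```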

Of course, one could always provide a unique Section type to avoid this, but then one could just as well use the Section name directly, and the type would be lost as an overall classification method for Sections.

Maybe this could be implemented better in the suggested functions (see previous comment):

While merge_terminology_equivalent() would always match uniquely by name, the get_terminology(repository, section_name=None, section_type=None) function would work via:

  • 'repository', which provides the URL of the general repository (e.g., the G-Node terminology repo)
  • 'section_name', which, if defined, specifies the terminology to return by matching the unique Section name
  • 'section_type', which, if defined, specifies the list of terminologies to return by matching the Section type, which may occur multiple times
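
A rough sketch of how the proposed function could behave is given below. It is not existing odml code. To stay runnable without network access, 'repository' is assumed here to be an already loaded list of Sections rather than a URL, and Section is a plain namedtuple stand-in. Letting section_name take precedence when both arguments are given is also an assumption; the proposal does not specify that case.

```python
from collections import namedtuple

# Stand-in for odml.Section; only name and type matter here.
Section = namedtuple("Section", ["name", "type"])


def get_terminology(repository, section_name=None, section_type=None):
    """Look up terminologies by unique name or by shared type.

    repository   -- assumed to be an iterable of already loaded Sections
    section_name -- if given, return the single Section with that name
                    (or None if there is no such Section)
    section_type -- if given, return a list of all Sections of that type
    """
    if section_name is not None:
        matches = [s for s in repository if s.name == section_name]
        if len(matches) > 1:
            raise ValueError("Section name %r is not unique" % section_name)
        return matches[0] if matches else None
    if section_type is not None:
        return [s for s in repository if s.type == section_type]
    raise ValueError("either section_name or section_type is required")
```

Matching by type returns every candidate, so the choice between several Sections of type 'electrode' would stay with the user, while matching by name depends on the repository holding unique Section names.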

For such a change, it would be important to update the G-Node terminology repository and to separate the terminologies of individual Sections from example template structures for the metadata of common hardware or software (cf. http://portal.g-node.org/odml/terminologies/v1.0/blackrock/blackrock.xml). This would ensure that the terminology repository contains only Sections with unique Section names.

Or we may even need to find a third solution...

lzehl, Mar 05 '17 14:03