
Interface to fetch entries in primitive types from `DataPack`

Open Pushkar-Bhuse opened this issue 3 years ago • 2 comments

This PR is the first step towards fixing #881

Description of changes

Currently, when entries are fetched from a DataPack or MultiPack with the get method, Forte converts the data store entries into objects. We wanted a way for users to work with the primitive DataStore entries directly. This PR modifies the get method of DataPack so that it can return an entry in its primitive form straight from DataStore, without converting it to an object.
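To make the idea concrete, here is a minimal, self-contained sketch of a get method with a get_raw switch. It is a toy stand-in, not Forte's implementation: `ToyPack`, its `add` helper, the simplified `Sentence` class, and the four-field list layout are all assumptions made for illustration. Only the get_raw flag name comes from this PR.

```python
from dataclasses import dataclass
from typing import Any, Iterator, List, Union


@dataclass
class Sentence:
    """Simplified, hypothetical object form of an entry."""
    tid: int
    begin: int
    end: int


class ToyPack:
    """Toy stand-in for a pack. NOT Forte's DataPack."""

    def __init__(self) -> None:
        # Each entry is stored in primitive (list) form.
        self._entries: List[List[Any]] = []

    def add(self, tid: int, begin: int, end: int) -> None:
        # Hypothetical helper that stores one primitive entry.
        self._entries.append([tid, "Sentence", begin, end])

    def get(self, get_raw: bool = False) -> Iterator[Union[Sentence, List[Any]]]:
        for entry in self._entries:
            if get_raw:
                # Return the stored primitive form without building an object.
                yield entry
            else:
                tid, _type_name, begin, end = entry
                yield Sentence(tid=tid, begin=begin, end=end)


pack = ToyPack()
pack.add(1, 0, 164)
objects = list(pack.get())            # object form
raw = list(pack.get(get_raw=True))    # primitive form
```

In this sketch the default call builds `Sentence` objects, while `get_raw=True` skips that step and yields the stored lists unchanged.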

Additionally, because DataStore entries are stored as lists and are hard to interpret, this PR introduces a way to keep entries in their primitive form while representing them more readably as a dictionary. The transform_data_store_entry method in data_store.py performs this conversion. An example:

    # Entry of type 'ft.onto.base_ontology.Sentence'
    data_store_entry = [
        171792711812874531962213686690228233530,
        'ft.onto.base_ontology.Sentence',
        0,
        164,
        0,
        '-',
        0,
        {},
        {},
        {}
    ]

    transformed_entry = pack.transform_data_store_entry(
        data_store_entry
    )

    # transformed_entry = {
    #   'begin': 0,
    #   'end': 164,
    #   'payload_idx': 0,
    #   'speaker': '-',
    #   'part_id': 0,
    #   'sentiment': {},
    #   'classification': {},
    #   'classifications': {},
    #   'tid': 171792711812874531962213686690228233530,
    #   'type': 'ft.onto.base_ontology.Sentence'
    # }
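The list-to-dictionary conversion shown above can be sketched in plain Python. This is an illustration, not the body of transform_data_store_entry: the function name `entry_to_dict`, the `SENTENCE_ATTRIBUTES` list, and the positional layout (tid, type, begin, end, then the type-specific attributes) are assumptions read off the example entry and its transformed output.

```python
from typing import Any, Dict, List

# Assumed attribute order for a Sentence entry, inferred from the example above.
SENTENCE_ATTRIBUTES = [
    "payload_idx",
    "speaker",
    "part_id",
    "sentiment",
    "classification",
    "classifications",
]


def entry_to_dict(entry: List[Any], attributes: List[str]) -> Dict[str, Any]:
    """Map a list-form entry to a dictionary keyed by attribute name.

    Hypothetical helper: assumes the layout
    [tid, type, begin, end, *type-specific attributes].
    """
    tid, type_name, begin, end, *values = entry
    if len(values) != len(attributes):
        raise ValueError("entry length does not match the attribute list")
    result: Dict[str, Any] = {"begin": begin, "end": end}
    result.update(zip(attributes, values))
    result["tid"] = tid
    result["type"] = type_name
    return result


data_store_entry = [
    171792711812874531962213686690228233530,
    "ft.onto.base_ontology.Sentence",
    0,
    164,
    0,
    "-",
    0,
    {},
    {},
    {},
]

transformed = entry_to_dict(data_store_entry, SENTENCE_ATTRIBUTES)
```

Pairing a per-type list of attribute names with the positional values keeps the stored entry compact while giving callers named access to each field.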

Possible influences of this PR.

By allowing DataPack or MultiPack to fetch entries in their primitive form, users can interact with DataStore more easily.

Test Conducted

The get method with get_raw set to True was tested in data_pack_test.py and multi_pack_test.py.

Pushkar-Bhuse • Aug 24 '22 20:08

Codecov Report

Merging #900 (77f4483) into master (72e8bce) will increase coverage by 0.05%. The diff coverage is 92.59%.

@@            Coverage Diff             @@
##           master     #900      +/-   ##
==========================================
+ Coverage   80.87%   80.93%   +0.05%     
==========================================
  Files         253      253              
  Lines       19619    19677      +58     
==========================================
+ Hits        15867    15925      +58     
  Misses       3752     3752              
| Impacted Files | Coverage Δ |
|---|---|
| tests/forte/data/data_store_serialization_test.py | 98.43% <ø> (ø) |
| tests/forte/data/data_store_test.py | 95.58% <ø> (ø) |
| forte/data/multi_pack.py | 83.01% <80.00%> (+0.82%) |
| forte/data/data_pack.py | 84.90% <86.36%> (-0.37%) |
| forte/data/data_store.py | 93.31% <95.23%> (+0.39%) |
| forte/data/base_pack.py | 76.75% <100.00%> (+0.07%) |
| forte/data/ontology/top.py | 78.16% <100.00%> (+0.05%) |
| tests/forte/data/data_pack_test.py | 98.98% <100.00%> (+0.13%) |
| tests/forte/data/multi_pack_test.py | 97.05% <100.00%> (+0.15%) |

... and 2 more

codecov[bot] • Aug 24 '22 20:08

quick comment on the title, not "fetch entries directly from Data Store", but fetch primitive types from data pack. Data store is still invisible to users.

hunterhector • Aug 30 '22 19:08