Feature Request: Add a type field in the mplain json output

Open johnking opened this issue 7 years ago • 4 comments

Hi @readbeyond

To get the type (paragraph, sentence, or word) of a fragment from the syncmap JSON data produced for multilevel plain (mplain) text, we have to parse the id field, e.g. "p000014s000001w000002".

It would be nice to have one more field, type, in the JSON data to avoid such post-processing.

If this does not make sense for this repository, could you please give me some hints on how to modify the code myself?

thanks a lot

-John

johnking avatar Jul 01 '17 15:07 johnking

Hi,

you raise a good point. In fact, I am not really satisfied with how multi-level formats are supported right now. On the one hand, the "compute engine" is generic enough to support an arbitrary number of levels; on the other hand, the global parameters and command line options support exactly 3 levels. Moreover, this support is implemented by replicating keys/names/variables.

For all these reasons, I will probably rework the relevant code for multi-level formats in aeneas v2, and while doing that I will address your issue directly.

Unfortunately, this plan also means that I am not going to address your issue in the 1.x series, so you need to either process the id after the JSON file has been produced, or patch your local version of aeneas. In the latter case, you might want to modify

def format(self, syncmap)

in

https://github.com/readbeyond/aeneas/blob/master/aeneas/syncmap/smfjson.py#L53

or, even better, the code of

@property
def json_string(self)

in

https://github.com/readbeyond/aeneas/blob/master/aeneas/syncmap/__init__.py#L248

(you need to keep track of the level in the recursive visit, and add the suitable "type": "value" to the dictionary which is appended in line 262)
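For the first option (processing the id after the JSON file has been produced), a minimal sketch could look like the following. It assumes that every id is a run of letter-plus-digits groups, as in "p000014s000001w000002", and that the last letter identifies the level. The names `TYPE_BY_PREFIX`, `type_from_id`, and `add_types` are hypothetical helpers, not part of aeneas.

```python
import re

# Assumed mapping from the last level letter of an id to a type name.
TYPE_BY_PREFIX = {"p": "paragraph", "s": "sentence", "w": "word"}

# Assumed id shape: one or more <letter><digits> groups, e.g. "p000014s000001".
# The capturing group holds the letter of the last (deepest) group.
ID_PATTERN = re.compile(r"(?:[psw]\d+)*([psw])\d+")


def type_from_id(identifier):
    """Return the type implied by the deepest level of an id, or None."""
    match = ID_PATTERN.fullmatch(identifier)
    if match is None:
        return None
    return TYPE_BY_PREFIX[match.group(1)]


def add_types(fragments):
    """Recursively add a "type" key to each fragment dict, in place."""
    for fragment in fragments:
        fragment["type"] = type_from_id(fragment.get("id", ""))
        add_types(fragment.get("children", []))
    return fragments
```

After loading the sync map with `json.load`, calling `add_types(data["fragments"])` would annotate the whole tree before it is written back with `json.dump`.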

HTH,

Alberto Pettarin

readbeyond avatar Jul 01 '17 16:07 readbeyond

@readbeyond , Hi Alberto,

Thanks for your reply and for sharing the roadmap with us. Looking forward to v2!

thanks again!

-John

johnking avatar Jul 02 '17 01:07 johnking

@johnking hi, you might want something like this:


    @property
    def json_string(self):
        """
        Return a JSON representation of the sync map.
        :rtype: string
        .. versionadded:: 1.3.1
        """
        def visit_children(node, level):
            """ Recursively visit the fragments_tree """
            output_fragments = []
            for child in node.children_not_empty:
                fragment = child.value
                text = fragment.text_fragment
                output_fragments.append({
                    "id": text.identifier,
                    "language": text.language,
                    "lines": text.lines,
                    "begin": gf.time_to_ssmmm(fragment.begin),
                    "end": gf.time_to_ssmmm(fragment.end),
                    "children": visit_children(child, level + 1),
                    "type": level
                })
            return output_fragments
        output_fragments = visit_children(self.fragments_tree, 0)
        return gf.safe_unicode(
            json.dumps({"fragments": output_fragments}, indent=1, sort_keys=True)
        )
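Note that the snippet above stores the numeric depth (0, 1, 2) in "type". If a name is preferred, the depth can be mapped to a label during the same kind of recursive visit. The sketch below works on plain nested dicts rather than on the aeneas tree, and `LEVEL_NAMES`, `level_name`, and `annotate` are hypothetical names chosen for illustration.

```python
# Assumed labels for the three levels of an mplain sync map.
LEVEL_NAMES = ("paragraph", "sentence", "word")


def level_name(level):
    """Map a depth to a label, with a generic fallback for deeper levels."""
    if 0 <= level < len(LEVEL_NAMES):
        return LEVEL_NAMES[level]
    return "level%d" % level


def annotate(nodes, level=0):
    """Return copies of nested fragment dicts with a "type" label added."""
    return [
        dict(
            node,
            type=level_name(level),
            children=annotate(node.get("children", []), level + 1),
        )
        for node in nodes
    ]
```

In `json_string` itself, the same effect would come from replacing `"type": level` with a lookup such as `level_name(level)`.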

pettarin avatar Jul 02 '17 11:07 pettarin

@readbeyond , Hi Alberto, thanks for sharing, I really appreciate it.

I am developing an app and want to reuse/expand the JSON structure. I will share my idea once I finish the prototype.

thanks again.

-John

johnking avatar Jul 04 '17 13:07 johnking