
Different Protocol Names In Study Sequence Cause An Error

Open ptth222 opened this issue 1 year ago • 0 comments

I initially modified a JSON example directly and found this issue, but I think showing it from the Tab side is clearer.

I modified the BII-I-1 Tab example so that the first culture uses a different protocol than the rest. This validates and converts to JSON without issues. Converting that JSON back to Tab, however, fails because of the differing protocol.

Modified study and investigation files: s_BII-S-1.txt i_investigation.txt


import json

from isatools.convert import isatab2json, json2isatab

# Tab -> JSON: succeeds
isa_json = isatab2json.convert('C:/Users/Sparda/Desktop/Moseley Lab/Code/MESSES/isadatasets/tab/BII-I-1_conversion_testing', use_new_parser=True)

with open('C:/Users/Sparda/Desktop/Moseley Lab/Code/MESSES/isadatasets/BII-I-1_testing.json', 'w') as out_fp:
    json.dump(isa_json, out_fp, indent=2)

# JSON -> Tab: raises the KeyError shown in the traceback
with open('C:/Users/Sparda/Desktop/Moseley Lab/Code/MESSES/isadatasets/BII-I-1_testing.json') as file_pointer:
    json2isatab.convert(file_pointer, 'C:/Users/Sparda/Desktop/Moseley Lab/Code/MESSES/isadatasets/BII-I-1_testing/', validate_first=False)


Traceback (most recent call last):

  File "C:\Users\Sparda\AppData\Local\Temp\ipykernel_5600\", line 5, in <cell line: 4>
    json2isatab.convert(file_pointer, 'C:/Users/Sparda/Desktop/Moseley Lab/Code/MESSES/isadatasets/BII-I-1_testing/', validate_first=False)

  File "C:\Python310\lib\site-packages\isatools\convert\", line 49, in convert
    isatab.dump(isa_obj=isa_obj, output_path=path, i_file_name=i_file_name,

  File "C:\Python310\lib\site-packages\isatools\isatab\dump\", line 170, in dump
    write_study_table_files(investigation, output_path)

  File "C:\Python310\lib\site-packages\isatools\isatab\dump\", line 134, in write_study_table_files
    df_dict[olabel][-1] =

KeyError: 'Protocol REF.growth protocol 2'

I investigated the error, and it seems to come from identifying process nodes by the name of the protocol they execute rather than by their position in the path, as is done for sample nodes in the same section of code. I think I was able to fix it by changing the process node code to follow the same pattern as the sample node code.
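To illustrate the failure mode outside of isatools, here is a minimal sketch. It is a toy model, not the library's code: `PATHS`, `build_table`, and the `(kind, name)` node tuples are all made up for illustration. It assumes, as the traceback suggests, that the columns are derived from a single representative path while the rows are filled from every path.

```python
# Toy model only: each path is a list of (kind, name) nodes.
# PATHS and build_table are hypothetical names, not part of isatools.
PATHS = [
    [("source", "culture2"), ("process", "growth protocol"), ("sample", "sample2")],
    [("source", "culture1"), ("process", "growth protocol 2"), ("sample", "sample1")],
]


def build_table(paths, label_by_position):
    """Build a {column label: [cell per path]} dict, one row per path."""

    def labelled(path):
        # Pair every node with the column label it should be written to.
        process_count = 0
        out = []
        for kind, name in path:
            if kind == "process":
                key = process_count if label_by_position else name
                process_count += 1
                out.append(("Protocol REF.{}".format(key), name))
            elif kind == "source":
                out.append(("Source Name", name))
            else:
                out.append(("Sample Name", name))
        return out

    # Columns come from one representative path only (the first longest one).
    representative = max(paths, key=len)
    table = {label: [] for label, _ in labelled(representative)}

    for path in paths:
        for column in table:  # add an empty row for this path
            table[column].append("")
        for label, value in labelled(path):
            table[label][-1] = value  # KeyError if the label is not a column
    return table


# Name-based labels: the second path uses a protocol the columns never saw.
try:
    build_table(PATHS, label_by_position=False)
except KeyError as err:
    print("KeyError:", err)

# Position-based labels: both protocols land in the same column.
print(build_table(PATHS, label_by_position=True)["Protocol REF.0"])
```

With name-based labels the second path looks up a column that was never created. With position-based labels every path maps its first process to the same column regardless of the protocol name.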

New Code:

        sample_in_path_count = 0
        protocol_in_path_count = 0
        longest_path = _longest_path_and_attrs(paths, s_graph.indexes)
        for node_index in longest_path:
            node = s_graph.indexes[node_index]
            if isinstance(node, Source):
                olabel = "Source Name"
                columns += flatten(
                    map(lambda x: get_characteristic_columns(olabel, x),
                columns += flatten(
                    map(lambda x: get_comment_column(
                        olabel, x), node.comments))
            elif isinstance(node, Process):
                olabel = "Protocol REF.{}".format(protocol_in_path_count)
                protocol_in_path_count += 1
                if not in protnames.keys():
                    protnames[] = protrefcount
                    protrefcount += 1
                columns += flatten(map(lambda x: get_pv_columns(olabel, x),
                if is not None:
                    columns.append(olabel + ".Date")
                if node.performer is not None:
                    columns.append(olabel + ".Performer")
                columns += flatten(
                    map(lambda x: get_comment_column(
                        olabel, x), node.comments))

            elif isinstance(node, Sample):
                olabel = "Sample Name.{}".format(sample_in_path_count)
                sample_in_path_count += 1
                columns += flatten(
                    map(lambda x: get_characteristic_columns(olabel, x),
                columns += flatten(
                    map(lambda x: get_comment_column(
                        olabel, x), node.comments))
                columns += flatten(map(lambda x: get_fv_columns(olabel, x),

        omap = get_object_column_map(columns, columns)
        # load into dictionary
        df_dict = dict(map(lambda k: (k, []), flatten(omap)))

        for path_ in paths:
            for k in df_dict.keys():  # add a row per path

            sample_in_path_count = 0
            protocol_in_path_count = 0
            for node_index in path_:
                node = s_graph.indexes[node_index]
                if isinstance(node, Source):
                    olabel = "Source Name"
                    df_dict[olabel][-1] =
                    for c in node.characteristics:
                        category_label = c.category.term if isinstance(c.category.term, str) \
                            else c.category.term["annotationValue"]
                        clabel = "{0}.Characteristics[{1}]".format(
                            olabel, category_label)
                        write_value_columns(df_dict, clabel, c)
                    for co in node.comments:
                        colabel = "{0}.Comment[{1}]".format(olabel,
                        df_dict[colabel][-1] = co.value

                elif isinstance(node, Process):
                    olabel = "Protocol REF.{}".format(
                    df_dict[olabel][-1] =
                    for pv in node.parameter_values:
                        pvlabel = "{0}.Parameter Value[{1}]".format(
                            olabel, pv.category.parameter_name.term)
                        write_value_columns(df_dict, pvlabel, pv)
                    if is not None:
                        df_dict[olabel + ".Date"][-1] =
                    if node.performer is not None:
                        df_dict[olabel + ".Performer"][-1] = node.performer
                    for co in node.comments:
                        colabel = "{0}.Comment[{1}]".format(olabel,
                        df_dict[colabel][-1] = co.value

                elif isinstance(node, Sample):
                    olabel = "Sample Name.{}".format(sample_in_path_count)
                    sample_in_path_count += 1
                    df_dict[olabel][-1] =
                    for c in node.characteristics:
                        category_label = c.category.term if isinstance(c.category.term, str) \
                            else c.category.term["annotationValue"]
                        clabel = "{0}.Characteristics[{1}]".format(
                            olabel, category_label)
                        write_value_columns(df_dict, clabel, c)
                    for co in node.comments:
                        colabel = "{0}.Comment[{1}]".format(olabel,
                        df_dict[colabel][-1] = co.value
                    for fv in node.factor_values:
                        fvlabel = "{0}.Factor Value[{1}]".format(
                        write_value_columns(df_dict, fvlabel, fv)

This is approximately lines 64-167 in isatools\isatab\dump\ in the write_study_table_files function. With this change the conversion no longer errors, and the study Tab converted from the JSON looks correct to me.

ptth222, Aug 28 '23 20:08