Multiple Data Files of The Same Type Will Only Have 1 Name in Assay Conversion

Open ptth222 opened this issue 8 months ago • 2 comments

If you create two data files of the same type in the same assay and then run a JSON to Tab conversion, only the last file's name appears, and it appears in both columns. For example, with two Raw Data Files, 'data_file1' and 'data_file2', only 'data_file2' appears in both Raw Data File columns (assuming data_file2 comes later in the process sequence).

Example to reproduce:

import json

from isatools.convert import json2isatab

with open('C:/Users/Sparda/Desktop/Moseley Lab/Code/MESSES/isadatasets/json/BII-I-1/BII-I-1.json', 'r') as jsonFile:
    isa_example = json.load(jsonFile)

## Delete process sequence for transcriptome and replace it.
del isa_example["studies"][0]["assays"][2]["processSequence"]

## Minimal protocols for the replacement processes to execute.
protocol1 = {
    "@id": "#protocol/protocol1",
    "name": "protocol1",
}
protocol2 = {
    "@id": "#protocol/protocol2",
    "name": "protocol2",
}
protocol3 = {
    "@id": "#protocol/protocol3",
    "name": "protocol3",
}
isa_example["studies"][0]["protocols"].append(protocol1)
isa_example["studies"][0]["protocols"].append(protocol2)
isa_example["studies"][0]["protocols"].append(protocol3)


## Raw Data Files produced by the first chain (protocol1 -> protocol2 -> protocol3).
data_file1 = {
    "@id": "#data/data_file1",
    "name": "data_file1",
    "type": "Raw Data File",
}
data_file2 = {
    "@id": "#data/data_file2",
    "name": "data_file2",
    "type": "Raw Data File",
}
data_file3 = {
    "@id": "#data/data_file3",
    "name": "data_file3",
    "type": "Raw Data File",
}
isa_example["studies"][0]["assays"][2]["dataFiles"].append(data_file1)
isa_example["studies"][0]["assays"][2]["dataFiles"].append(data_file2)
isa_example["studies"][0]["assays"][2]["dataFiles"].append(data_file3)


## Raw Data Files produced by the second chain (protocol1 -> protocol3).
data_file4 = {
    "@id": "#data/data_file4",
    "name": "data_file4",
    "type": "Raw Data File",
}
data_file5 = {
    "@id": "#data/data_file5",
    "name": "data_file5",
    "type": "Raw Data File",
}
isa_example["studies"][0]["assays"][2]["dataFiles"].append(data_file4)
isa_example["studies"][0]["assays"][2]["dataFiles"].append(data_file5)


## Two process chains in the same assay. The first yields three Raw Data Files
## (data_file1 to data_file3); the second yields two (data_file4 and data_file5).
new_process = [
    {
        "@id": "#process/protocol1",
        "executesProtocol": {"@id": "#protocol/protocol1"},
        "inputs": [{"@id": "#sample/sample-C-0.07-aliquot1"}],
        "outputs": [{"@id": "#data/data_file1"}],
        "nextProcess": {"@id": "#process/protocol2"},
    },
    {
        "@id": "#process/protocol2",
        "executesProtocol": {"@id": "#protocol/protocol2"},
        "inputs": [{"@id": "#data/data_file1"}],
        "outputs": [{"@id": "#data/data_file2"}],
        "previousProcess": {"@id": "#process/protocol1"},
        "nextProcess": {"@id": "#process/protocol3"},
    },
    {
        "@id": "#process/protocol3",
        "executesProtocol": {"@id": "#protocol/protocol3"},
        "inputs": [{"@id": "#data/data_file2"}],
        "outputs": [{"@id": "#data/data_file3"}],
        "previousProcess": {"@id": "#process/protocol2"},
    },
    {
        "@id": "#process/protocol1_1",
        "executesProtocol": {"@id": "#protocol/protocol1"},
        "inputs": [{"@id": "#sample/sample-C-0.07-aliquot2"}],
        "outputs": [{"@id": "#data/data_file4"}],
        "nextProcess": {"@id": "#process/protocol3_1"},
    },
    {
        "@id": "#process/protocol3_1",
        "executesProtocol": {"@id": "#protocol/protocol3"},
        "inputs": [{"@id": "#data/data_file4"}],
        "outputs": [{"@id": "#data/data_file5"}],
        "previousProcess": {"@id": "#process/protocol1_1"},
    },
]
isa_example["studies"][0]["assays"][2]["processSequence"] = new_process


with open('C:/Users/Sparda/Desktop/Moseley Lab/Code/MESSES/isadatasets/BII-I-1_testing.json', 'w') as out_fp:
    json.dump(isa_example, out_fp, indent=2)

with open('C:/Users/Sparda/Desktop/Moseley Lab/Code/MESSES/isadatasets/BII-I-1_testing.json') as file_pointer:
    json2isatab.convert(file_pointer, 'C:/Users/Sparda/Desktop/Moseley Lab/Code/MESSES/isadatasets/BII-I-1_testing/', validate_first=False)

The example above modifies the "BII-I-1" example dataset: it deletes the transcriptome assay's processSequence and replaces it with a simpler one.
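To confirm the symptom, one could read the generated assay table and list the values under every Raw Data File column. The following is a minimal sketch, assuming the assay file is tab-delimited with a single header row. The helper name `columns_named` and the sample text are hypothetical and are not part of isatools; the sample only mimics the behavior described above.

```python
import csv
import io


def columns_named(tsv_text, header_name):
    """Return one list of cell values per column whose header equals header_name."""
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    header, body = rows[0], rows[1:]
    # Repeated headers are expected here, so collect every matching index.
    indexes = [i for i, name in enumerate(header) if name == header_name]
    return [[row[i] for row in body] for i in indexes]


# Hypothetical one-row table that mimics the reported symptom.
sample = (
    "Sample Name\tProtocol REF\tRaw Data File\tProtocol REF\tRaw Data File\n"
    "aliquot1\tprotocol1\tdata_file2\tprotocol2\tdata_file2\n"
)
print(columns_named(sample, "Raw Data File"))
```

With a real conversion output, the same helper could be given the text of the assay file written to the output directory; if the two inner lists hold the same file name, the columns have collapsed as described.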

The issue appears to be in the write_assay_table_files function in the isatools\isatab\dump\write.py file. It is similar to issue #500: when there are multiple data file columns of the same type, their column names are not being tracked. I have adjusted the code so that it tracks the names, and the file names now appear as expected. I created a PR, #510.
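As a rough illustration of this kind of failure, and not the actual isatools code or the change in the PR, the sketch below shows how keying cells by a column label alone lets a later file overwrite an earlier one, while tracking a counter per label keeps each column distinct. Both function names and the `label.N` key scheme are assumptions made for the example.

```python
def build_row_by_label(steps):
    """Naive version: key each cell by its column label, so repeated labels collide."""
    columns = [label for label, _ in steps]
    cells = {}
    for label, value in steps:
        # A later file with the same label overwrites the earlier one.
        cells[label] = value
    return [cells[label] for label in columns]


def build_row_by_unique_key(steps):
    """Tracked version: count occurrences of each label to give every column its own key."""
    seen = {}
    keys = []
    cells = {}
    for label, value in steps:
        seen[label] = seen.get(label, 0) + 1
        key = f"{label}.{seen[label]}"  # hypothetical key scheme
        keys.append(key)
        cells[key] = value
    return [cells[key] for key in keys]


steps = [("Raw Data File", "data_file1"), ("Raw Data File", "data_file2")]
print(build_row_by_label(steps))       # ['data_file2', 'data_file2']
print(build_row_by_unique_key(steps))  # ['data_file1', 'data_file2']
```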

ptth222, Nov 01 '23 16:11