isa-api
Multiple Data Files of the Same Type Will Only Have 1 Name in Assay Conversion
If two or more data files of the same type are created in the same assay, a JSON to Tab conversion writes only the last file's name in every column of that type. For example, with two Raw Data Files, 'data_file1' and 'data_file2', only 'data_file2' appears in both Raw Data File columns (assuming data_file2 comes later in the process sequence).
Example to reproduce:
import json
from isatools.convert import json2isatab

with open('C:/Users/Sparda/Desktop/Moseley Lab/Code/MESSES/isadatasets/json/BII-I-1/BII-I-1.json', 'r') as jsonFile:
    isa_example = json.load(jsonFile)
## Delete process sequence for transcriptome and replace it.
del isa_example["studies"][0]["assays"][2]["processSequence"]
protocol1 = {
"@id": "#protocol/protocol1",
"name": "protocol1",
}
protocol2 = {
"@id": "#protocol/protocol2",
"name": "protocol2",
}
protocol3 = {
"@id": "#protocol/protocol3",
"name": "protocol3",
}
isa_example["studies"][0]["protocols"].append(protocol1)
isa_example["studies"][0]["protocols"].append(protocol2)
isa_example["studies"][0]["protocols"].append(protocol3)
data_file1 = {
"@id": "#data/data_file1",
"name": "data_file1",
"type": "Raw Data File"
}
data_file2 = {
"@id": "#data/data_file2",
"name": "data_file2",
"type": "Raw Data File"
}
data_file3 = {
"@id": "#data/data_file3",
"name": "data_file3",
"type": "Raw Data File"
}
isa_example["studies"][0]["assays"][2]["dataFiles"].append(data_file1)
isa_example["studies"][0]["assays"][2]["dataFiles"].append(data_file2)
isa_example["studies"][0]["assays"][2]["dataFiles"].append(data_file3)
data_file4 = {
"@id": "#data/data_file4",
"name": "data_file4",
"type": "Raw Data File"
}
data_file5 = {
"@id": "#data/data_file5",
"name": "data_file5",
"type": "Raw Data File"
}
isa_example["studies"][0]["assays"][2]["dataFiles"].append(data_file4)
isa_example["studies"][0]["assays"][2]["dataFiles"].append(data_file5)
new_process = [{
"@id": "#process/protocol1",
"executesProtocol": {
"@id": "#protocol/protocol1"
},
"inputs": [
{'@id': '#sample/sample-C-0.07-aliquot1'}
],
"outputs": [
{
"@id": "#data/data_file1"
},
],
"nextProcess": {"@id": "#process/protocol2"}
},
{
"@id": "#process/protocol2",
"executesProtocol": {
"@id": "#protocol/protocol2"
},
"inputs": [
{'@id': "#data/data_file1"}
],
"outputs": [
{
"@id": "#data/data_file2"
},
],
"previousProcess": {"@id": "#process/protocol1"},
"nextProcess": {"@id": "#process/protocol3"}
},
{
"@id": "#process/protocol3",
"executesProtocol": {
"@id": "#protocol/protocol3"
},
"inputs": [
{'@id': "#data/data_file2"}
],
"outputs": [
{
"@id": "#data/data_file3"
},
],
"previousProcess": {"@id": "#process/protocol2"},
},
{
"@id": "#process/protocol1_1",
"executesProtocol": {
"@id": "#protocol/protocol1"
},
"inputs": [
{'@id': '#sample/sample-C-0.07-aliquot2'}
],
"outputs": [
{
"@id": "#data/data_file4"
},
],
"nextProcess": {"@id": "#process/protocol3_1"}
},
{
"@id": "#process/protocol3_1",
"executesProtocol": {
"@id": "#protocol/protocol3"
},
"inputs": [
{'@id': "#data/data_file4"}
],
"outputs": [
{
"@id": "#data/data_file5"
},
],
"previousProcess": {"@id": "#process/protocol1_1"},
}
]
isa_example["studies"][0]["assays"][2]["processSequence"] = new_process
## Write the modified JSON, then convert it to ISA-Tab.
with open('C:/Users/Sparda/Desktop/Moseley Lab/Code/MESSES/isadatasets/BII-I-1_testing.json', 'w') as out_fp:
    json.dump(isa_example, out_fp, indent=2)
with open('C:/Users/Sparda/Desktop/Moseley Lab/Code/MESSES/isadatasets/BII-I-1_testing.json') as file_pointer:
    json2isatab.convert(file_pointer, 'C:/Users/Sparda/Desktop/Moseley Lab/Code/MESSES/isadatasets/BII-I-1_testing/', validate_first=False)
The above example modifies the "BII-I-1" example dataset. It deletes the transcriptome assay's processSequence and replaces it with a simpler one.
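To check the symptom after running the conversion, something like the following could be used to list the values under every Raw Data File column of the generated assay table. This is only a sketch: the helper name `columns_by_header` is hypothetical, and the `sample` text is a hand-written toy table that mimics the described symptom, not real converter output. Columns are looked up by position because readers that key rows by header name would themselves merge the duplicate columns.

```python
import csv
import io


def columns_by_header(tab_text, header="Raw Data File"):
    """Return one list of cell values per column whose header equals `header`."""
    rows = list(csv.reader(io.StringIO(tab_text), delimiter="\t"))
    if not rows:
        return []
    head, body = rows[0], rows[1:]
    # Index by position so that duplicate header names stay separate.
    positions = [i for i, name in enumerate(head) if name == header]
    return [[row[i] if i < len(row) else "" for row in body] for i in positions]


# Hypothetical toy table showing the symptom (not actual converter output).
sample = (
    "Sample Name\tProtocol REF\tRaw Data File\tProtocol REF\tRaw Data File\n"
    "sample-C-0.07-aliquot1\tprotocol1\tdata_file2\tprotocol2\tdata_file2\n"
)
for column in columns_by_header(sample):
    print(column)
```

To run it against a real conversion, read the assay table written into the output directory and pass its text to `columns_by_header`. If every returned column holds the same file name, the conversion has hit this issue.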
The issue appears to be in the write_assay_table_files function in isatools\isatab\dump\write.py. It is similar to issue #500, in that the column names for multiple data files of the same type are not being tracked. I have adjusted the code to track the names, and the file names now appear as expected. I created a PR, #510.
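As a rough illustration of what "tracking the names" can mean, the toy example below shows how keying values by a bare column label lets a later value overwrite an earlier one, and how giving each repeated label its own internal key avoids that. This is not the code from the PR and not how isatools is implemented. The helpers `unique_keys` and `display_label` and the `.N` suffix scheme are assumptions made purely for illustration.

```python
from collections import defaultdict


def unique_keys(labels):
    """Give each repeated label its own internal key, e.g. 'Raw Data File.1'."""
    seen = defaultdict(int)
    keys = []
    for label in labels:
        keys.append(f"{label}.{seen[label]}")
        seen[label] += 1
    return keys


def display_label(key):
    """Drop the numeric suffix to recover the header written to the file."""
    return key.rsplit(".", 1)[0]


labels = ["Sample Name", "Raw Data File", "Raw Data File"]
values = ["sample-C-0.07-aliquot1", "data_file1", "data_file2"]

# Keyed by bare label: the second Raw Data File overwrites the first.
collapsed = dict(zip(labels, values))

# Keyed by tracked name: both values survive.
tracked = dict(zip(unique_keys(labels), values))
headers = [display_label(key) for key in tracked]

print(collapsed)
print(tracked)
print(headers)
```

With the bare labels, only 'data_file2' remains under Raw Data File, which matches the symptom reported here. With tracked keys, both file names are kept and the original headers can still be recovered for output.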