pynwb
Unable to copy data containers from one NWB file to another already existing NWB file
Description
Hi,
I would like to copy some data containers from one NWB file to another, already existing NWB file that is open in r+ mode. This does not work: the data containers from the first file are not written to the second file.
Steps to Reproduce
The following minimal working example should reproduce the problem. I would expect the output of the last two print statements to match, but it doesn't.
from datetime import datetime
from dateutil.tz import tzlocal
from pynwb import NWBFile
from pynwb import TimeSeries
from pynwb import NWBHDF5IO
import numpy as np
from pynwb import get_manager
# Create the base data
manager = get_manager()
start_time = datetime(2017, 4, 3, 11, tzinfo=tzlocal())
create_date = datetime(2017, 4, 15, 12, tzinfo=tzlocal())
data = np.arange(1000).reshape((100, 10))
timestamps = np.arange(100.)
filename1 = "file1.nwb"
filename2 = "file2.nwb"
# Create the 1st file
nwbfile1 = NWBFile(
    session_description="demonstrate external files",
    identifier="NWBE1",
    session_start_time=start_time,
    file_create_date=create_date,
)
test_ts1 = TimeSeries(
    name="test_timeseries1", data=data, unit="SIunit", timestamps=timestamps
)
nwbfile1.add_acquisition(test_ts1)
with NWBHDF5IO(filename1, "w") as io:
    io.write(nwbfile1)
# Check what's inside the file
with NWBHDF5IO(filename1, "r") as io:
    f1 = io.read()
    print("1st file:")
    print(f1)
# Create the 2nd file
nwbfile2 = NWBFile(
    session_description="demonstrate external files",
    identifier="NWBE2",
    session_start_time=start_time,
    file_create_date=create_date,
)
test_ts2 = TimeSeries(
    name="test_timeseries2", data=data, unit="SIunit", timestamps=timestamps
)
nwbfile2.add_acquisition(test_ts2)
with NWBHDF5IO(filename2, "w") as io:
    io.write(nwbfile2)
# Check what's inside the file
with NWBHDF5IO(filename2, "r") as io:
    f2 = io.read()
    print("2nd file:")
    print(f2)
# Get the first container
with NWBHDF5IO(filename1, "r", manager=manager) as io1:
    nwbfile1 = io1.read()
    timeseries_1 = nwbfile1.get_acquisition("test_timeseries1")
    # Add it to the 2nd file
    with NWBHDF5IO(filename2, "r+", manager=manager) as io2:
        nwbfile2 = io2.read()
        nwbfile2.add_acquisition(timeseries_1)
        print("What I should see in the 2nd file:")
        print(nwbfile2)
        io2.write(nwbfile2, link_data=False)
# Check what's inside the 2nd file
with NWBHDF5IO(filename2, "r") as io:
    f2 = io.read()
    print("What really is in the 2nd file:")
    print(f2)
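Comparing two printed summaries by eye is error-prone, so the expectation above can also be stated as an explicit check. The following is only a sketch: `missing_acquisitions` is a hypothetical helper name, and it assumes that the object returned by `io.read()` exposes `acquisition` as a mapping keyed by container name. A stand-in object is used here so the snippet runs without any NWB file on disk.

```python
from types import SimpleNamespace


def missing_acquisitions(nwbfile, expected_names):
    """Return the expected acquisition names that are absent from nwbfile.

    Assumption: nwbfile.acquisition behaves like a dict keyed by container name.
    """
    present = set(nwbfile.acquisition.keys())
    return sorted(set(expected_names) - present)


# Stand-in for the re-read 2nd file, holding only its own time series.
# With PyNWB, pass the object returned by io.read() instead.
fake_file2 = SimpleNamespace(acquisition={"test_timeseries2": object()})

print(missing_acquisitions(fake_file2, ["test_timeseries1", "test_timeseries2"]))
# prints ['test_timeseries1']
```

An empty list would mean that every expected container made it into the file; in the situation reported here, `test_timeseries1` would be listed as missing.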
Environment
Python Executable: Conda
Python Version: Python 3.7.6
Operating System: Linux
HDMF Version: 2.2.0
Version of PyNWB used: 1.4.0
Checklist
- [X] Have you ensured the feature or change was not already reported?
- [X] Have you included a brief and descriptive title?
- [X] Have you included a clear description of the problem you are trying to solve?
- [X] Have you included a minimal code snippet that reproduces the issue you are encountering?
- [X] Have you checked our Contributing document?
As a workaround, adding the data containers to a new file works. In the following minimal working example, the output of the last two print statements matches.
from datetime import datetime
from dateutil.tz import tzlocal
from pynwb import NWBFile
from pynwb import TimeSeries
from pynwb import NWBHDF5IO
import numpy as np
from pynwb import get_manager
# Create the base data
manager = get_manager()
start_time = datetime(2017, 4, 3, 11, tzinfo=tzlocal())
create_date = datetime(2017, 4, 15, 12, tzinfo=tzlocal())
data = np.arange(1000).reshape((100, 10))
timestamps = np.arange(100.)
filename1 = "file1.nwb"
filename2 = "file2.nwb"
filename3 = "file3.nwb"
# Create the 1st file
nwbfile1 = NWBFile(
    session_description="demonstrate external files",
    identifier="NWBE1",
    session_start_time=start_time,
    file_create_date=create_date,
)
test_ts1 = TimeSeries(
    name="test_timeseries1", data=data, unit="SIunit", timestamps=timestamps
)
nwbfile1.add_acquisition(test_ts1)
with NWBHDF5IO(filename1, "w") as io:
    io.write(nwbfile1)
# Check what's inside the file
with NWBHDF5IO(filename1, "r") as io:
    f1 = io.read()
    print("1st file:")
    print(f1)
# Create the 2nd file
nwbfile2 = NWBFile(
    session_description="demonstrate external files",
    identifier="NWBE2",
    session_start_time=start_time,
    file_create_date=create_date,
)
test_ts2 = TimeSeries(
    name="test_timeseries2", data=data, unit="SIunit", timestamps=timestamps
)
nwbfile2.add_acquisition(test_ts2)
with NWBHDF5IO(filename2, "w") as io:
    io.write(nwbfile2)
# Check what's inside the file
with NWBHDF5IO(filename2, "r") as io:
    f2 = io.read()
    print("2nd file:")
    print(f2)
# Get the 1st container
with NWBHDF5IO(filename1, "r", manager=manager) as io1:
    nwbfile1 = io1.read()
    timeseries_1 = nwbfile1.get_acquisition("test_timeseries1")
    with NWBHDF5IO(filename2, "r", manager=manager) as io2:
        nwbfile2 = io2.read()
        timeseries_2 = nwbfile2.get_acquisition("test_timeseries2")
        # Combine the containers in the 3rd file
        with NWBHDF5IO(filename3, "w", manager=manager) as io3:
            nwbfile3 = NWBFile(
                session_description="demonstrate external files",
                identifier="NWBE3",
                session_start_time=start_time,
                file_create_date=create_date,
            )
            nwbfile3.add_acquisition(timeseries_1)
            nwbfile3.add_acquisition(timeseries_2)
            io3.write(nwbfile3, link_data=False)
            print("What I should see in the 3rd file:")
            print(nwbfile3)
# Check what's inside the 3rd file
with NWBHDF5IO(filename3, "r") as io:
    f3 = io.read()
    print("What really is in the 3rd file:")
    print(f3)
@kiroul We are doing some last-minute preparations for the NWB workshop, so unfortunately I do not have time right now to look at this issue in detail. I plan to get to it sometime next week. I'm glad you were able to find a workaround in the meantime.
Sure, thanks for letting me know :slightly_smiling_face:
Any news on this issue? In the end the workaround didn't work with the real code: the data containers don't get written to the output NWB file :slightly_frowning_face:
@kir0ul Copying containers/groups is not yet supported by PyNWB, but copying datasets is. That is what the link_data=False flag allows. Here is another workaround based on that:
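# Add it to the 2nd file (run this while io1 is still open so the source data can be read)
with NWBHDF5IO(filename2, "r+", manager=manager) as io2:
    nwbfile2 = io2.read()
    timeseries_1_copy = TimeSeries(
        name=timeseries_1.name,
        data=timeseries_1.data,
        unit=timeseries_1.unit,
        timestamps=timeseries_1.timestamps
    )
    nwbfile2.add_acquisition(timeseries_1_copy)
    # Write as in the original example; link_data=False copies the datasets
    io2.write(nwbfile2, link_data=False)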
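If several time series need to be rebuilt this way, the field-by-field copy above could be wrapped in a small helper. This is a sketch under assumptions, not part of PyNWB: `timeseries_kwargs` is a hypothetical name, and it assumes the source object exposes `name`, `data`, `unit`, and either `timestamps` or `starting_time` plus `rate`. Stand-in objects are used so the snippet runs without PyNWB installed.

```python
from types import SimpleNamespace


def timeseries_kwargs(ts):
    """Collect constructor arguments from an existing TimeSeries-like object."""
    kwargs = {"name": ts.name, "data": ts.data, "unit": ts.unit}
    if getattr(ts, "timestamps", None) is not None:
        # Irregularly sampled series: carry over the explicit timestamps
        kwargs["timestamps"] = ts.timestamps
    else:
        # Regularly sampled series: carry over starting time and sampling rate
        kwargs["starting_time"] = ts.starting_time
        kwargs["rate"] = ts.rate
    return kwargs


# Stand-in for a series read from the 1st file
source = SimpleNamespace(
    name="test_timeseries1", data=[1, 2, 3], unit="SIunit", timestamps=[0.0, 1.0, 2.0]
)
print(sorted(timeseries_kwargs(source)))
# prints ['data', 'name', 'timestamps', 'unit']
```

With PyNWB, the copy in the workaround would then read `TimeSeries(**timeseries_kwargs(timeseries_1))`, still created while the source file is open.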
It looks like this is no longer an issue with the latest versions.
Sorry, I was confused by the title, which says containers, while the code examples are about copying datasets, which can indeed be copied from one file to the other. Is it still the case that copying containers is not supported?