Unable to copy data containers from one NWB file to another already existing NWB file

Open kir0ul opened this issue 5 years ago • 7 comments

Description

Hi,

I would like to copy some data containers from one NWB file to another, already existing NWB file opened in r+ mode. However, the data containers from the first file are not written to the second file.

Steps to Reproduce

The following minimal working example should reproduce the problem. I would expect the last two print statements to match, but they don't.

from datetime import datetime
from dateutil.tz import tzlocal
from pynwb import NWBFile
from pynwb import TimeSeries
from pynwb import NWBHDF5IO
import numpy as np
from pynwb import get_manager

# Create the base data
manager = get_manager()
start_time = datetime(2017, 4, 3, 11, tzinfo=tzlocal())
create_date = datetime(2017, 4, 15, 12, tzinfo=tzlocal())
data = np.arange(1000).reshape((100, 10))
timestamps = np.arange(100.)
filename1 = "file1.nwb"
filename2 = "file2.nwb"

# Create the 1st file
nwbfile1 = NWBFile(
    session_description="demonstrate external files",
    identifier="NWBE1",
    session_start_time=start_time,
    file_create_date=create_date,
)
test_ts1 = TimeSeries(
    name="test_timeseries1", data=data, unit="SIunit", timestamps=timestamps
)
nwbfile1.add_acquisition(test_ts1)
with NWBHDF5IO(filename1, "w") as io:
    io.write(nwbfile1)

# Check what's inside the file
with NWBHDF5IO(filename1, "r") as io:
    f1 = io.read()
    print("1st file:")
    print(f1)

# Create the 2nd file
nwbfile2 = NWBFile(
    session_description="demonstrate external files",
    identifier="NWBE2",
    session_start_time=start_time,
    file_create_date=create_date,
)
test_ts2 = TimeSeries(
    name="test_timeseries2", data=data, unit="SIunit", timestamps=timestamps
)
nwbfile2.add_acquisition(test_ts2)
with NWBHDF5IO(filename2, "w") as io:
    io.write(nwbfile2)

# Check what's inside the file
with NWBHDF5IO(filename2, "r") as io:
    f2 = io.read()
    print("2nd file:")
    print(f2)

# Get the first container
with NWBHDF5IO(filename1, "r", manager=manager) as io1:
    nwbfile1 = io1.read()
    timeseries_1 = nwbfile1.get_acquisition("test_timeseries1")

    # Add it to the 2nd file
    with NWBHDF5IO(filename2, "r+", manager=manager) as io2:
        nwbfile2 = io2.read()
        nwbfile2.add_acquisition(timeseries_1)
        print("What I should see in the 2nd file:")
        print(nwbfile2)
        io2.write(nwbfile2, link_data=False)

# Check what's inside the 2nd file
with NWBHDF5IO(filename2, "r") as io:
    f2 = io.read()
    print("What really is in the 2nd file:")
    print(f2)

Environment

Python Executable: Conda 
Python Version: Python 3.7.6
Operating System: Linux
HDMF Version: 2.2.0
Version of PyNWB used: 1.4.0

Checklist

  • [X] Have you ensured the feature or change was not already reported?
  • [X] Have you included a brief and descriptive title?
  • [X] Have you included a clear description of the problem you are trying to solve?
  • [X] Have you included a minimal code snippet that reproduces the issue you are encountering?
  • [X] Have you checked our Contributing document?

kir0ul • Sep 15 '20 16:09

As a workaround, adding the data containers to a new file works: the last two print statements in the following minimal working example match.

from datetime import datetime
from dateutil.tz import tzlocal
from pynwb import NWBFile
from pynwb import TimeSeries
from pynwb import NWBHDF5IO
import numpy as np
from pynwb import get_manager

# Create the base data
manager = get_manager()
start_time = datetime(2017, 4, 3, 11, tzinfo=tzlocal())
create_date = datetime(2017, 4, 15, 12, tzinfo=tzlocal())
data = np.arange(1000).reshape((100, 10))
timestamps = np.arange(100.)
filename1 = "file1.nwb"
filename2 = "file2.nwb"
filename3 = "file3.nwb"

# Create the 1st file
nwbfile1 = NWBFile(
    session_description="demonstrate external files",
    identifier="NWBE1",
    session_start_time=start_time,
    file_create_date=create_date,
)
test_ts1 = TimeSeries(
    name="test_timeseries1", data=data, unit="SIunit", timestamps=timestamps
)
nwbfile1.add_acquisition(test_ts1)
with NWBHDF5IO(filename1, "w") as io:
    io.write(nwbfile1)

# Check what's inside the file
with NWBHDF5IO(filename1, "r") as io:
    f1 = io.read()
    print("1st file:")
    print(f1)

# Create the 2nd file
nwbfile2 = NWBFile(
    session_description="demonstrate external files",
    identifier="NWBE2",
    session_start_time=start_time,
    file_create_date=create_date,
)
test_ts2 = TimeSeries(
    name="test_timeseries2", data=data, unit="SIunit", timestamps=timestamps
)
nwbfile2.add_acquisition(test_ts2)
with NWBHDF5IO(filename2, "w") as io:
    io.write(nwbfile2)

# Check what's inside the file
with NWBHDF5IO(filename2, "r") as io:
    f2 = io.read()
    print("2nd file:")
    print(f2)

# Get the 1st container
with NWBHDF5IO(filename1, "r", manager=manager) as io1:
    nwbfile1 = io1.read()
    timeseries_1 = nwbfile1.get_acquisition("test_timeseries1")
    with NWBHDF5IO(filename2, "r", manager=manager) as io2:
        nwbfile2 = io2.read()
        timeseries_2 = nwbfile2.get_acquisition("test_timeseries2")

        # Combine the containers in the 3rd file
        with NWBHDF5IO(filename3, "w", manager=manager) as io3:
            nwbfile3 = NWBFile(
                session_description="demonstrate external files",
                identifier="NWBE3",
                session_start_time=start_time,
                file_create_date=create_date,
            )
            nwbfile3.add_acquisition(timeseries_1)
            nwbfile3.add_acquisition(timeseries_2)
            io3.write(nwbfile3, link_data=False)
            print("What I should see in the 3rd file:")
            print(nwbfile3)

# Check what's inside the 3rd file
with NWBHDF5IO(filename3, "r") as io:
    f3 = io.read()
    print("What really is in the 3rd file:")
    print(f3)

kir0ul • Sep 15 '20 16:09

@kir0ul We are doing some last-minute preparations for the NWB workshop, so unfortunately I do not have time right now to look at this issue in detail. I plan to get to it sometime next week. I'm glad you were able to find a workaround in the meantime.

rly • Sep 17 '20 00:09

Sure, thanks for letting me know :slightly_smiling_face:

kir0ul • Sep 17 '20 02:09

Any news on this issue? In the end, the workaround didn't work on the real code: the data containers don't get written to the output NWB file :slightly_frowning_face:

kir0ul • Sep 30 '20 19:09

@kir0ul Copying containers/groups is not yet supported by PyNWB, but copying datasets is. That is what the link_data=False flag allows. Here is another workaround based on that:

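    # Add it to the 2nd file (still nested inside the `with ... as io1:` block)
    with NWBHDF5IO(filename2, "r+", manager=manager) as io2:
        nwbfile2 = io2.read()
        # Build a new TimeSeries that reuses the datasets read from the 1st file
        timeseries_1_copy = TimeSeries(
            name=timeseries_1.name,
            data=timeseries_1.data,
            unit=timeseries_1.unit,
            timestamps=timeseries_1.timestamps,
        )
        nwbfile2.add_acquisition(timeseries_1_copy)
        # Write as in the original example; link_data=False copies the datasets
        io2.write(nwbfile2, link_data=False)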

rly • Mar 30 '21 22:03

It looks like this is no longer an issue with the latest versions.

kir0ul • Oct 05 '23 22:10

Sorry, I got confused by the title, which says containers, while the code examples are about copying datasets, which can actually be copied from one file to another. Is it still the case that copying containers is not supported?

kir0ul • Oct 06 '23 00:10