Unclear handling of attributes in BP4/BP5

Open jorgensd opened this issue 1 year ago • 4 comments

The following minimal example, using the ADIOS2 Python interface (2.10.2), shows a difference in how attributes are handled in the BP4 and BP5 formats.

import adios2.bindings as adios2
from mpi4py import MPI


def read_attr(engine):
    filename = "test_" + engine + ".bp"

    adios = adios2.ADIOS(MPI.COMM_WORLD)


    io = adios.DeclareIO("reader" + engine)
    io.SetEngine(engine)
    file = io.Open(str(filename), adios2.Mode.Read)
    print(engine,  io.AvailableAttributes().keys())
    for step in range(file.Steps()):
        file.BeginStep()
        print(engine, step, io.AvailableAttributes().keys())
        file.EndStep()
    file.Close()
    adios.RemoveIO("reader"+engine)


def write_attr(engine):
    filename = "test_" + engine + ".bp"

    # Write two attributes to file
    adios = adios2.ADIOS(MPI.COMM_WORLD)
    io = adios.DeclareIO("writer" + engine)
    io.SetEngine(engine)
    adios_file = io.Open(str(filename), adios2.Mode.Write, MPI.COMM_WORLD)
    adios_file.BeginStep()
    io.DefineAttribute("a", "first")
    adios_file.PerformPuts()
    adios_file.EndStep()

    adios_file.BeginStep()
    io.DefineAttribute("b", "last")
    adios_file.PerformPuts()
    adios_file.EndStep()

    adios_file.Close()
    adios.RemoveIO("writer" + engine)

if __name__ == "__main__":
    write_attr("BP4")
    read_attr("BP4")
    write_attr("BP5")
    read_attr("BP5")


This yields:

BP4 dict_keys(['a'])
BP4 0 dict_keys(['a'])
BP5 dict_keys([])
BP5 0 dict_keys(['a'])
BP5 1 dict_keys(['a', 'b'])

This makes code that handles both formats in Python very hard to maintain. Was this change made on purpose, or is it a bug?

jorgensd, Feb 27 '25 16:02

Some things about this example surprise me and some do not. There is a fundamental difference between BP4 and BP5 surrounding the more explicit separation between "streaming" and "random access" read modes. In BP4 these were somewhat blurred. BP4 loads all file metadata immediately upon Open() regardless of access mode. However, in the default Adios.Mode.Read, BP5 loads each timestep's metadata only upon BeginStep. Therefore no attributes are available before BeginStep, and attributes are added cumulatively as you read additional steps. So the BP5 output above looks reasonable for those semantics. (You should get different semantics if you specify Mode.ReadRandomAccess.) I'm a little fuzzier on why you're not seeing "b" in the BP4 output. @pnorbert ?
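
The semantics described above can be illustrated with a small toy model in plain Python. This is only a sketch: it does not call ADIOS2, and the function names and the list-of-lists input are made up for illustration. It assumes that a streaming-style reader sees attributes cumulatively, one step at a time, while a random-access reader sees all of them at once.

```python
# Toy model only: this does not call ADIOS2; it mimics the semantics described above.
# All names here are hypothetical and exist purely for illustration.

def attributes_visible_streaming(attrs_per_step):
    """Yield (step, visible attribute names) as a streaming-style reader would see them.

    Nothing is visible before the first step begins; each step adds the
    attributes whose metadata was written in that step.
    """
    visible = []
    for step, new_attrs in enumerate(attrs_per_step):
        visible.extend(new_attrs)  # this step's metadata becomes available here
        yield step, list(visible)


def attributes_visible_random_access(attrs_per_step):
    """Return every attribute name at once, as if all metadata were read on open."""
    return [name for new_attrs in attrs_per_step for name in new_attrs]


# The writer in the example defines "a" in step 0 and "b" in step 1.
written = [["a"], ["b"]]

for step, names in attributes_visible_streaming(written):
    print("streaming", step, names)
print("random access", attributes_visible_random_access(written))
```

In this model the streaming view grows from `['a']` to `['a', 'b']`, matching the shape of the BP5 lines in the output above, whereas the random-access view lists both names immediately.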

eisenhauer, Feb 27 '25 17:02

However, in the default Adios.Mode.Read, BP5 loads each timestep's metadata only upon BeginStep.

This makes a big difference for my applications, as I've assumed that Attributes were time-independent. I based this on the documentation, which states:

Attribute: Attributes add extra information to the overall variables dataset defined in the IO class. They can be single or array values.

Having to define these for each write step is not something I can do for my applications, as write steps are not associated with time steps, and the data evolves over time (one might first write a mesh, then some function data, then some markers, then function data for a different time step). With BP5, I now have to loop over all steps to find the right step in the ADIOS2 file.

Suddenly the divide between Attribute and Variable is not clear to me. A global single-value Variable and an Attribute now seem like the same kind of object to me.

jorgensd, Mar 03 '25 12:03

Mode.Read is set up to match the semantics that ADIOS can provide in a streaming situation, that is, one in which the writer and reader run simultaneously and data flows directly from one to the other over the network. In this circumstance, time-independence is impossible. Possibly you want Mode.ReadRandomAccess? You can't use BeginStep/EndStep with that in BP5, but if you structure your code to use SetStepSelection for the variables you read instead, it will work with both BP4 and BP5, and all attributes will be available immediately upon Open(). The downside of that approach is a higher Open() cost and memory utilization, because all file metadata is read immediately. (Higher compared to BP5 Mode.Read, that is; BP4 always has those higher costs and memory utilization.)

(The BP4 engine and prior versions of ADIOS in general didn't have a strict differentiation between Read and ReadRandomAccess and provided inconsistent semantics in streaming vs. non-streaming situations. BP5 tries to enforce a stronger line between access methods with clearer semantics.)

The most obvious difference between Attribute and Variable is that Attributes are persistent on the reader side, whereas Variables are not. That is, once set, an Attribute is always available to query regardless of timestep. They used to be immutable as well, but ADIOS has now introduced mutable attributes to accommodate user requests.
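
To make the persistence difference concrete, here is a toy model in plain Python. It is a sketch under stated assumptions, not ADIOS2 code: `walk_steps` and the dictionary layout are hypothetical, and the model simply assumes that a variable is visible only in the step that wrote it while an attribute, once seen, stays visible in every later step.

```python
# Toy model only: plain Python, no ADIOS2 calls. All names are hypothetical.

def walk_steps(steps):
    """Yield (step, variable names, attribute names) for a streaming-style reader.

    `steps` is a list of dicts with optional "variables" and "attributes" lists.
    Variables are visible only in the step that wrote them; attributes, once
    seen, remain visible in all later steps.
    """
    seen_attributes = []
    for index, content in enumerate(steps):
        for name in content.get("attributes", []):
            if name not in seen_attributes:
                seen_attributes.append(name)
        yield index, list(content.get("variables", [])), list(seen_attributes)


# Hypothetical file layout: a mesh first, then function data in later steps.
steps = [
    {"variables": ["mesh"], "attributes": ["mesh_info"]},
    {"variables": ["u"], "attributes": ["u_info"]},
    {"variables": ["u"]},
]

for index, variables, attributes in walk_steps(steps):
    print(index, variables, attributes)
```

In the last step of this model only `u` is listed as a variable, while both attribute names from the earlier steps are still reported.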

eisenhauer, Mar 03 '25 13:03

As a note on the BP4 test: I don't understand why, but it does not work as expected, even though this high-level Python API example (which uses the same bindings as the example above) works as expected.

import adios2
from mpi4py import MPI

# MPI
comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

def read_attr(engine):
    filename = "testHL_" + engine + ".bp"
    with adios2.FileReader(filename, comm) as fh:
        attrs = fh.available_attributes()
        print(engine,  attrs.keys())
        for aname in attrs.keys():
            a = fh.inquire_attribute(aname)
            print(f"{aname} = {a.data()}")

    with adios2.Stream(filename, "r", comm) as fh:
        for _ in fh.steps():
            print(f"----- step {fh.current_step()}")
            attrs = fh.available_attributes()
            print(engine,  attrs.keys())
            for aname in attrs.keys():
                a = fh.inquire_attribute(aname)
                print(f"{aname} = {a.data()}")



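def write_attr(engine):
    # Note: `engine` only determines the file name; no engine type is set on the stream here.
    filename = "testHL_" + engine + ".bp"
    with adios2.Stream(filename, "w", comm) as fh:
        # Write one attribute per step: attr0 in step 0, attr1 in step 1, and so on.
        for _ in fh.steps(4):
            currentStep = fh.current_step()
            fh.write_attribute("attr"+str(currentStep), currentStep)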

if __name__ == "__main__":
    write_attr("BP4")
    read_attr("BP4")
    write_attr("BP5")
    read_attr("BP5")

$ python3 ./testHL.py
BP4 dict_keys(['attr0', 'attr1', 'attr2', 'attr3'])
attr0 = 0
attr1 = 1
attr2 = 2
attr3 = 3
----- step 0
BP4 dict_keys(['attr0'])
attr0 = 0
----- step 1
BP4 dict_keys(['attr0', 'attr1'])
attr0 = 0
attr1 = 1
----- step 2
BP4 dict_keys(['attr0', 'attr1', 'attr2'])
attr0 = 0
attr1 = 1
attr2 = 2
----- step 3
BP4 dict_keys(['attr0', 'attr1', 'attr2', 'attr3'])
attr0 = 0
attr1 = 1
attr2 = 2
attr3 = 3
BP5 dict_keys(['attr0', 'attr1', 'attr2', 'attr3'])
attr0 = 0
attr1 = 1
attr2 = 2
attr3 = 3
----- step 0
BP5 dict_keys(['attr0'])
attr0 = 0
----- step 1
BP5 dict_keys(['attr0', 'attr1'])
attr0 = 0
attr1 = 1
----- step 2
BP5 dict_keys(['attr0', 'attr1', 'attr2'])
attr0 = 0
attr1 = 1
attr2 = 2
----- step 3
BP5 dict_keys(['attr0', 'attr1', 'attr2', 'attr3'])
attr0 = 0
attr1 = 1
attr2 = 2
attr3 = 3

pnorbert, Mar 04 '25 18:03