
support sub-tree fetch for conduit_bin protocol

Open cyrush opened this issue 8 years ago • 7 comments

  • add support for parsing the schema and then loading only the requested subset of the tree from the binary file

May 02 '17 16:05 cyrush

To implement this generally, we will need something like the following:

load(ifstream &ifs, const Schema &s, const std::string &sub_path, Node &out)

In this method, given s, we can fetch the schema for sub_path and call a more general function:

load(ifstream &ifs, const Schema &s, Node &out)

We will assume the schema offsets are interpreted as byte offsets into the bin file.

Ideally, we want a single I/O request.

To do this, we need the minimum offset in the passed schema.

We don't have a method that gives us this currently.

spanned_bytes() gives us the total spanned bytes, which includes the offset.

I believe what we need is to seek the bin file to the minimum offset, and then read (spanned_bytes() - minimum offset) bytes.

Finally, when constructing the node from this chunk of data, we will need to shift the schema's offsets by subtracting the minimum offset, so that they index into the chunk rather than into the file.

To do this, we need:

  1. a helper in Schema to get the minimum offset: Schema::minimum_offset().
  2. a helper in Schema to shift the offsets of an entire schema tree: Schema::shift_offset(index_t).
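The steps above can be sketched roughly as follows. This is a minimal, self-contained illustration and not conduit code: Leaf, Chunk, load_subtree, and shifted_copy are hypothetical names, the schema is flattened to a plain list of byte ranges, and spanned_bytes is assumed here to mean the end of the furthest leaf measured from the start of the file.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for one leaf of a schema: a named byte range.
struct Leaf {
    std::string path;
    std::size_t offset;  // byte offset into the bin file
    std::size_t nbytes;  // bytes occupied by this leaf
};

// Smallest offset over all leaves (the role of the proposed minimum_offset()).
std::size_t minimum_offset(const std::vector<Leaf>& leaves) {
    std::size_t lo = leaves.empty() ? 0 : leaves.front().offset;
    for (const Leaf& l : leaves) lo = std::min(lo, l.offset);
    return lo;
}

// End of the furthest leaf, measured from byte 0, so it includes the offset.
std::size_t spanned_bytes(const std::vector<Leaf>& leaves) {
    std::size_t hi = 0;
    for (const Leaf& l : leaves) hi = std::max(hi, l.offset + l.nbytes);
    return hi;
}

// Copy of the layout with every offset reduced by delta (input is untouched).
std::vector<Leaf> shifted_copy(std::vector<Leaf> leaves, std::size_t delta) {
    for (Leaf& l : leaves) {
        assert(l.offset >= delta);  // delta is expected to be the minimum offset
        l.offset -= delta;
    }
    return leaves;
}

// The chunk read from the file plus the layout that describes it.
struct Chunk {
    std::vector<char> data;
    std::vector<Leaf> layout;
};

// One seek and one read covering [minimum offset, spanned_bytes).
bool load_subtree(std::istream& is, const std::vector<Leaf>& leaves, Chunk& out) {
    const std::size_t lo = minimum_offset(leaves);
    const std::size_t hi = spanned_bytes(leaves);
    out.data.assign(hi - lo, 0);
    is.seekg(static_cast<std::streamoff>(lo), std::ios::beg);
    is.read(out.data.data(), static_cast<std::streamsize>(out.data.size()));
    if (static_cast<std::size_t>(is.gcount()) != out.data.size()) return false;
    out.layout = shifted_copy(leaves, lo);  // offsets now index into out.data
    return true;
}
```

Note that a single read also pulls in any bytes that sit between the requested leaves, which is the trade-off for issuing only one I/O request.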

Aug 08 '17 16:08 cyrush

@bryujin What do you think about adding the above methods to Schema?

If we don't feel comfortable doing this, I can explore implementing them directly in the new load and see how things work.

Aug 08 '17 16:08 cyrush

@cyrush Would Schema::shift_offset(index_t) actually apply the offset to the entire Schema and change it? Is the plan to shift it, read the file, and then shift it back? Is that really necessary? I'm probably OK with the Schema::minimum_offset() method.

Aug 09 '17 20:08 bryujin

@bryujin The plan would be to read the proper chunk of data, then make a shifted copy of the schema so it can be used to describe the chunk.

(the output node would get the chunk of data with the shifted schema)
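To illustrate why no shift-back step would be needed, here is a small sketch of the copy-based approach. It is only an assumption about how this could look: LeafRange and with_offsets_shifted are hypothetical names, not part of Schema, and delta is assumed to be no larger than the smallest offset.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one leaf of a schema: a byte range in the file.
struct LeafRange {
    std::size_t offset;
    std::size_t nbytes;
};

// The layout is taken by value, so the caller's copy is never modified.
// Assumes delta <= every offset in leaves (e.g. delta is the minimum offset).
std::vector<LeafRange> with_offsets_shifted(std::vector<LeafRange> leaves,
                                            std::size_t delta) {
    for (LeafRange& l : leaves) l.offset -= delta;
    return leaves;
}
```

The original layout keeps describing the file, while the returned copy describes the chunk that was read.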

Aug 09 '17 23:08 cyrush

I think it's probably best to prototype this w/o changing the public interface of Schema, then see what we are open to supporting.

Aug 09 '17 23:08 cyrush

OK, that would help.

Aug 11 '17 21:08 bryujin

An idea like #286 could help.

May 16 '18 00:05 cyrush