djl
Utility class for constructing NDArrays from java
Description
When using the framework I found that constructing an empty NDArray then using the NDArray.set function to set values one by one was a bottleneck in the code. DJL could use a utility class that efficiently builds an NDArray.
An example of the API would be:
NDArrayBuilder data = new NDArrayBuilder(new Shape(3, 4, 5));
for (int i = 0; i < 3; i++) {
    for (int j = 0; j < 4; j++) {
        for (int k = 0; k < 5; k++) {
            data.set(new int[] {i, j, k}, i * j * k);
        }
    }
}
NDArray arr = data.build(manager); // manager is the NDManager
Will this change the current api? How?
This will add a helper class but will not otherwise change the existing API.
Who will benefit from this enhancement?
DJL already provides an efficient way to construct 1D and 2D NDArrays. Any code that generates higher-dimensional NDArrays from Java could benefit from this change.
@ElchananHaas You can create an NDArray of any shape without setting its values one by one. Here are a few examples:
float[] data = new float[] {0, 1, 2, 3, 4, 5, 6, 7};
// create a 3d NDArray from float[]
NDArray array = manager.create(data, new Shape(2, 2, 2));
// create a 3d NDArray from a Buffer
NDArray array = manager.create(FloatBuffer.wrap(data), new Shape(2, 2, 2));
// create an empty NDArray and then set the data:
NDArray array = manager.create(new Shape(2, 2, 2));
array.set(FloatBuffer.wrap(data));
// create an empty NDArray and set the data with direct buffer:
NDArray array = manager.create(new Shape(2, 2, 2));
ByteBuffer buf = manager.allocateDirect(8 * 4); // 8 floats * 4 bytes each; a direct buffer avoids a memory copy
buf.asFloatBuffer().put(data);
array.set(buf);
@frankfliu The class I am describing would use the operations in your comment internally. The benefit of the class I am describing is that it would handle the array indexing calculations internally. It may be too niche of a use case to justify adding to the framework.
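To make the "array indexing calculations" concrete, here is a minimal sketch of a row-major offset computation in plain Java. The class name `FlatIndex` and its `offset` method are hypothetical and not part of DJL, and the sketch assumes the flat data is laid out in row-major order (last axis varying fastest), so that the resulting `float[]` could be passed to `manager.create(data, new Shape(3, 4, 5))` as in the examples above.

```java
public final class FlatIndex {
    private FlatIndex() {}

    /** Row-major offset of {@code index} within an array of the given shape. */
    public static int offset(long[] shape, int... index) {
        if (index.length != shape.length) {
            throw new IllegalArgumentException(
                    "expected " + shape.length + " indices, got " + index.length);
        }
        long flat = 0;
        for (int d = 0; d < shape.length; d++) {
            if (index[d] < 0 || index[d] >= shape[d]) {
                throw new IndexOutOfBoundsException(
                        "index " + index[d] + " out of range for axis " + d);
            }
            // Horner-style accumulation: ((i0 * s1 + i1) * s2 + i2) ...
            flat = flat * shape[d] + index[d];
        }
        return Math.toIntExact(flat);
    }

    public static void main(String[] args) {
        long[] shape = {3, 4, 5};
        float[] data = new float[3 * 4 * 5];
        for (int i = 0; i < 3; i++) {
            for (int j = 0; j < 4; j++) {
                for (int k = 0; k < 5; k++) {
                    data[offset(shape, i, j, k)] = i * j * k;
                }
            }
        }
        System.out.println(data[offset(shape, 2, 3, 4)]); // prints 24.0
    }
}
```

The filled `data` array is an ordinary Java array, so no per-element native call is involved until the single create call at the end.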
@ElchananHaas I once built something like what you describe. Do you expect the user to just call the following?
NDManager.create(Object array);
The array can be anything, such as float[float[float[...]]], or another data type.
@lanking520 Your suggestion of providing a generic create method looks best. I don't know if the Java type system could enforce that it is an arbitrarily nested array. I think that a method taking in an arbitrary object could be confusing to use but it may be the only way.
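As a rough illustration of why an `Object`-typed method can only check its input at run time, the sketch below uses `java.lang.reflect.Array` to infer a shape from a nested array and flatten it to `float[]`. The class `NestedArrays` and its methods are hypothetical and not part of DJL; the sketch assumes a rectangular array of numeric elements and rejects anything else with an exception rather than a compile error.

```java
import java.lang.reflect.Array;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public final class NestedArrays {
    private NestedArrays() {}

    /** Infers the shape by following the first element along each axis. */
    public static long[] shapeOf(Object array) {
        List<Long> dims = new ArrayList<>();
        Object current = array;
        while (current != null && current.getClass().isArray()) {
            int length = Array.getLength(current);
            dims.add((long) length);
            current = length == 0 ? null : Array.get(current, 0);
        }
        if (dims.isEmpty()) {
            throw new IllegalArgumentException("not an array: " + array);
        }
        return dims.stream().mapToLong(Long::longValue).toArray();
    }

    /** Flattens a rectangular nested numeric array in row-major order. */
    public static float[] flatten(Object array) {
        long[] shape = shapeOf(array);
        long size = 1;
        for (long d : shape) {
            size *= d;
        }
        float[] out = new float[Math.toIntExact(size)];
        copy(array, shape, 0, out, 0);
        return out;
    }

    // Recursively copies one axis; returns the next write position in out.
    private static int copy(Object node, long[] shape, int axis, float[] out, int pos) {
        if (node == null || !node.getClass().isArray()
                || Array.getLength(node) != shape[axis]) {
            throw new IllegalArgumentException("ragged or malformed array at axis " + axis);
        }
        int length = Array.getLength(node);
        for (int i = 0; i < length; i++) {
            Object element = Array.get(node, i);
            if (axis == shape.length - 1) {
                if (!(element instanceof Number)) {
                    throw new IllegalArgumentException("non-numeric element: " + element);
                }
                out[pos++] = ((Number) element).floatValue();
            } else {
                pos = copy(element, shape, axis + 1, out, pos);
            }
        }
        return pos;
    }

    public static void main(String[] args) {
        float[][][] nested = {{{0, 1}, {2, 3}}, {{4, 5}, {6, 7}}};
        System.out.println(Arrays.toString(shapeOf(nested))); // prints [2, 2, 2]
        System.out.println(flatten(nested).length);           // prints 8
    }
}
```

Ragged input such as `new float[][] {{1, 2}, {3}}` compiles without complaint and only fails when `flatten` runs, which is the usability concern raised above.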
@ElchananHaas
I agree with you that NDManager.create(Object array) is confusing, and an implementation cannot cover all cases. The user has to guess which types might work.
I thought about your idea; here is what I came up with:
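public static class NDArrayBuilder {

    private NDManager manager;
    private ByteBuffer buffer;
    private long[] shape;

    public NDArrayBuilder(NDManager manager, Shape shape, DataType dataType) {
        this.manager = manager;
        // direct buffer sized in bytes: element count times bytes per element
        this.buffer = manager.allocateDirect(Math.toIntExact(shape.size() * dataType.getNumOfBytes()));
        this.shape = shape.getShape();
    }

    public NDArrayBuilder set(int value, int... index) {
        // TODO: write value at the position given by index
    }

    public NDArrayBuilder set(long value, int... index) {
        // TODO: write value at the position given by index
    }

    public NDArrayBuilder set(double value, int... index) {
        // TODO: write value at the position given by index
    }

    public NDArray build(NDManager manager) {
        // TODO: create the NDArray from the filled buffer
    }
}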
I'm not happy with the set() functions, but I don't know if we can do better.
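For discussion, here is one way the skeleton's `set` could be filled in, restricted to float32 and written in plain Java so it runs without DJL. Everything in it is an assumption: the class name `FloatArrayBuilder` is hypothetical, `ByteBuffer.allocateDirect` with native byte order stands in for `manager.allocateDirect`, and the row-major layout is assumed to match what `array.set(buf)` expects.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

/** Hypothetical float32-only builder; a sketch, not DJL API. */
public final class FloatArrayBuilder {
    private final long[] shape;
    private final int[] strides;      // row-major strides, counted in elements
    private final ByteBuffer bytes;   // stand-in for manager.allocateDirect(...)
    private final FloatBuffer values; // float view sharing the same memory

    public FloatArrayBuilder(long... shape) {
        this.shape = shape.clone();
        this.strides = new int[shape.length];
        long size = 1;
        for (int d = shape.length - 1; d >= 0; d--) {
            strides[d] = Math.toIntExact(size);
            size *= shape[d];
        }
        this.bytes = ByteBuffer.allocateDirect(Math.toIntExact(size * Float.BYTES))
                .order(ByteOrder.nativeOrder());
        this.values = bytes.asFloatBuffer();
    }

    /** Writes one value; returns this so calls can be chained. */
    public FloatArrayBuilder set(float value, int... index) {
        if (index.length != shape.length) {
            throw new IllegalArgumentException(
                    "expected " + shape.length + " indices, got " + index.length);
        }
        int offset = 0;
        for (int d = 0; d < index.length; d++) {
            if (index[d] < 0 || index[d] >= shape[d]) {
                throw new IndexOutOfBoundsException(
                        "index " + index[d] + " out of range for axis " + d);
            }
            offset += index[d] * strides[d];
        }
        values.put(offset, value); // absolute put, buffer position stays at 0
        return this;
    }

    /** The filled direct buffer, ready to hand to a single bulk call. */
    public ByteBuffer buffer() {
        return bytes;
    }

    /** Copies the contents out, mainly for inspection and testing. */
    public float[] toArray() {
        float[] out = new float[values.capacity()];
        values.duplicate().get(out);
        return out;
    }

    public static void main(String[] args) {
        FloatArrayBuilder builder = new FloatArrayBuilder(3, 4, 5);
        for (int i = 0; i < 3; i++) {
            for (int j = 0; j < 4; j++) {
                for (int k = 0; k < 5; k++) {
                    builder.set(i * j * k, i, j, k);
                }
            }
        }
        System.out.println(builder.toArray()[59]); // prints 24.0
    }
}
```

Supporting other data types this way would need one overload or one builder per type, which is the awkwardness with `set()` noted above.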
@frankfliu Should I work on a pull request for this feature? If so what file should I add it to?
@ElchananHaas You can add a new class NDArrayBuilder.java here: https://github.com/deepjavalibrary/djl/tree/master/api/src/main/java/ai/djl/ndarray
We already have the NDManager.create() function, which covers the 1D and 2D array cases. Initializing a complicated 3D+ array isn't that common, and implementing a builder that supports higher-dimensional arrays isn't feasible:
- Implementing a complicated set function in Java defeats the purpose of using a native NDArray; that functionality is already part of NDArray.
- Using an intermediate NDArray cannot guarantee performance and may be abused.