
htype for tensors containing only numbers

Open LucasVandroux opened this issue 3 years ago • 3 comments

I am wondering what htype to use for a tensor containing only numbers.

In the tutorial Creating Time-Series Datasets, the tensors containing only numbers are not given any htype, which, from what I understand, means they default to htype=generic with dtype=None. (Source)

Moreover, from the HTYPE_CONFIGURATIONS (Source) it seems there is no specific htype for numbers (int or float).

Should I therefore define a custom htype and dtype myself for the tensors that contain only numbers, or use htype=text?

LucasVandroux, Mar 16 '22 22:03

Thanks @LucasVandroux for raising the question. Leaving htype=generic as the default and optionally specifying dtype=np.uint8 or dtype=int in create_tensor would suffice for storing the data optimally. If you don't specify dtype, the dtype of the first element inserted into the tensor is assumed for the whole tensor.

Would you like to have a stricter mode where only ints can be appended to the tensor?

davidbuniat, Mar 16 '22 22:03

@davidbuniat thank you for your prompt answer.

Yes, I would find it useful and more intuitive if there were an htype for int and float, which would also prevent appending data of a different type to the tensor later on. It would likely prevent some type-related problems down the line.
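
To make the request concrete, here is a minimal pure-Python sketch of the behavior I have in mind. It is not Deep Lake code: `StrictNumericColumn` and its `kind` argument are hypothetical names made up for illustration. It fixes the numeric kind either explicitly or from the first appended element (mirroring the dtype inference described above), and rejects later appends of a different kind.

```python
from numbers import Integral, Real


class StrictNumericColumn:
    """Hypothetical stand-in for a tensor; NOT part of the Deep Lake API."""

    _KINDS = ("int", "float")

    def __init__(self, kind=None):
        # kind=None means "infer from the first appended element".
        if kind is not None and kind not in self._KINDS:
            raise ValueError(f"unknown kind: {kind!r}")
        self.kind = kind
        self.values = []

    def append(self, value):
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, Real):
            raise TypeError(f"expected a number, got {type(value).__name__}")
        if self.kind is None:
            # The first element fixes the kind for the whole column.
            self.kind = "int" if isinstance(value, Integral) else "float"
        if self.kind == "int" and not isinstance(value, Integral):
            raise TypeError(f"column holds ints, got {type(value).__name__}")
        if self.kind == "float":
            # Assumption: widening an int to a float is acceptable.
            value = float(value)
        self.values.append(value)


col = StrictNumericColumn()
col.append(3)  # first element fixes the kind to "int"
col.append(4)
try:
    col.append(2.5)  # rejected: a float in an int column
except TypeError as exc:
    print(f"rejected: {exc}")
```

Whether an int should be silently widened in a float column, as the sketch does, or rejected as well is a design choice I don't have a strong opinion on.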

LucasVandroux, Mar 16 '22 23:03

Very interesting point @LucasVandroux, tagging @istranic for visibility on this!

mikayelh, Mar 16 '22 23:03