How to read the output binary file in Python?
Hi Authors,
Can you please let me know how to read the output binary file as a matrix of |vocab| x |dim| size or in some other consumable fashion? How do I get the vocabulary?
Pankesh
Dear Pankesh,
The vocabulary is assumed to be the integers [0..n-1]; the user is expected to convert the graph to the matrix format themselves.
As for the output binary file, it is just a binary matrix of floats. You can read it in Python with
np.fromfile('embedding.bin', np.float32).reshape(num_nodes, embedding_dim)
Hope that helps. Anton
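A slightly fuller sketch of the one-liner above, for anyone who wants a reusable loader. It assumes the file holds only raw float32 values in row-major order with no header, and that `num_nodes` and `embedding_dim` are known from how the embedding was produced. The `load_embeddings` helper and the file name are hypothetical, and the demo writes its own small synthetic file so it runs without the real output.

```python
import os
import tempfile

import numpy as np


def load_embeddings(path, num_nodes, embedding_dim):
    """Read a raw float32 matrix (assumed headerless, row-major) from path."""
    data = np.fromfile(path, dtype=np.float32)
    expected = num_nodes * embedding_dim
    # A size mismatch usually means the wrong num_nodes or embedding_dim.
    if data.size != expected:
        raise ValueError(f"expected {expected} floats, found {data.size}")
    return data.reshape(num_nodes, embedding_dim)


# Demo on a synthetic file standing in for the real embedding output.
num_nodes, embedding_dim = 4, 3
original = np.arange(num_nodes * embedding_dim, dtype=np.float32)
original = original.reshape(num_nodes, embedding_dim)
demo_path = os.path.join(tempfile.mkdtemp(), "embedding.bin")
original.tofile(demo_path)

embeddings = load_embeddings(demo_path, num_nodes, embedding_dim)
# Assuming nodes are numbered 0..n-1, row i is the vector for node i.
node_2_vector = embeddings[2]
```

The size check is there because `reshape` alone fails with a less helpful message when the dimensions do not match the file.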
Is it required for the vocab to be consecutive integers [0..n-1]? Could the vocab contain [0..n-1] with some integers missing in between, or start from a different range [m..n]?
Simply put, the C++ program takes a binary CSR (bcsr) file as input and produces an embedding for every row of that matrix. So yes, the vocab (as in the bcsr file) must be the consecutive integers [0..n) for the program to operate as expected. However, I provide a utility that converts files in different formats, including graphs with non-standard vocabularies, to bcsr.
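To illustrate what mapping a non-standard vocabulary onto [0..n) involves, here is a minimal sketch. It is not the converter utility mentioned above; `build_index` and the sample edge list are hypothetical, and sorting the labels is just one way to make the numbering deterministic.

```python
def build_index(edges):
    """Assign each distinct node label a consecutive integer id in [0, n)."""
    id_to_label = sorted({node for edge in edges for node in edge})
    label_to_id = {label: i for i, label in enumerate(id_to_label)}
    return label_to_id, id_to_label


# Sample graph whose node labels are neither consecutive nor zero-based.
edges = [(10, 42), (42, 7), (7, 10), (100, 7)]
label_to_id, id_to_label = build_index(edges)

# Edge list rewritten with consecutive ids, ready for conversion to CSR.
remapped = [(label_to_id[u], label_to_id[v]) for u, v in edges]
```

Keeping `id_to_label` around lets you translate back afterwards: under this mapping, row `i` of the embedding matrix would belong to the original node `id_to_label[i]`.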
Got it, thank you for clarifying