How to read the output binary file in Python?

Open pbamotra opened this issue 7 years ago • 4 comments

Hi Authors,

Can you please let me know how to read the output binary file as a matrix of |vocab| x |dim| size or in some other consumable fashion? How do I get the vocabulary?

Pankesh

pbamotra, Mar 30 '18 00:03

Dear Pankesh,

The vocabulary is assumed to be the integers [0..n-1]; the user is expected to convert the graph to the matrix format themselves.

As for the output binary file, it is just a binary matrix of floats; you can read it in Python with

np.fromfile('embedding.bin', np.float32).reshape(num_nodes, embedding_dim)
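A slightly fuller sketch of the same read, in case the number of nodes is not at hand. It assumes the file holds only raw float32 values in row-major order with no header, and that the embedding dimension is known; `load_embeddings` is a hypothetical helper name, not part of the project.

```python
import numpy as np


def load_embeddings(path, embedding_dim, dtype=np.float32):
    """Read a headerless binary float matrix, inferring the number of rows.

    Assumption: the file is a flat row-major dump of `dtype` values, so its
    length must be a multiple of `embedding_dim`.
    """
    flat = np.fromfile(path, dtype=dtype)
    if flat.size % embedding_dim != 0:
        raise ValueError(
            f"{flat.size} values cannot be split into rows of {embedding_dim}"
        )
    # -1 lets NumPy work out num_nodes from the file size.
    return flat.reshape(-1, embedding_dim)
```

Row `i` of the returned array is then the embedding of node `i`, and the size check catches a wrong `embedding_dim` early in most cases instead of silently producing a misaligned matrix.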

Hope that helps. Anton

xgfs, Mar 30 '18 04:03

Is the vocab required to be consecutive integers [0..n-1]? Could it contain integers in [0..n-1] with some missing in between, or start from a different range [m..n]?

adityasundaram, Feb 08 '19 01:02

Simply put, the C++ program takes a binary CSR file as input and produces an embedding for every row of this matrix. So yes, the vocab (as in the bcsr file) must be the consecutive integers [0..n) for the program to operate as expected. However, I provide a utility that converts files in different formats, including graphs with non-standard vocabularies, to bcsr.
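To illustrate what such a relabeling involves, here is a minimal sketch in Python. It is not the conversion utility mentioned above, only an example of mapping arbitrary integer node IDs to consecutive [0..n) indices while keeping the vocabulary needed to map embedding rows back; `relabel_edges` and the edge-list input are assumptions made for the example.

```python
import numpy as np


def relabel_edges(edges):
    """Map arbitrary node IDs in an edge list to consecutive indices [0..n).

    Returns (vocab, relabeled), where vocab[i] is the original ID of node i
    and relabeled has the same shape as `edges` but uses the new indices.
    """
    edges = np.asarray(edges)
    # np.unique sorts the distinct IDs; `inverse` gives each ID's new index.
    vocab, inverse = np.unique(edges, return_inverse=True)
    return vocab, inverse.reshape(edges.shape)


# IDs 7, 10 and 42 are neither consecutive nor zero-based.
edges = [[10, 42], [42, 7], [7, 10]]
vocab, relabeled = relabel_edges(edges)
```

If the graph is built from `relabeled`, row `i` of the embedding matrix corresponds to the original node `vocab[i]`, so saving `vocab` alongside the embeddings is enough to recover the vocabulary.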

xgfs, Feb 08 '19 08:02

Got it, thank you for clarifying.

adityasundaram, Feb 08 '19 09:02