sparse_dot_topn
Are you planning to develop code for binary matrices (values only 0 or 1)?
What about binary matrices? Are you planning to develop code for binary matrices (values only 0 or 1)?
It should already work if you cast them beforehand to np.float32 (32 bits).
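A minimal sketch of that cast, assuming SciPy CSR inputs. To stay runnable without the library installed, it does not call sparse_dot_topn itself. The library's entry point has changed between releases (for example awesome_cossim_topn in older versions and sp_matmul_topn in newer ones), so check your installed version. Instead, the hypothetical helper topn_rows computes the same kind of top-n result with plain SciPy on a small made-up binary matrix.

```python
import numpy as np
import scipy.sparse as sp


def topn_rows(C, ntop):
    """Keep the ntop largest stored entries in each row of a sparse matrix.

    Hypothetical reference helper: it materializes the full product first,
    which sparse_dot_topn is designed to avoid.
    """
    C = C.tocsr().copy()
    C.sort_indices()  # deterministic tie-breaking by column index
    rows, cols, vals = [], [], []
    for i in range(C.shape[0]):
        start, end = C.indptr[i], C.indptr[i + 1]
        row_vals = C.data[start:end]
        row_cols = C.indices[start:end]
        # positions of the ntop largest values, largest first
        keep = np.argsort(-row_vals, kind="stable")[:ntop]
        rows.extend([i] * len(keep))
        cols.extend(row_cols[keep])
        vals.extend(row_vals[keep])
    return sp.csr_matrix((vals, (rows, cols)), shape=C.shape, dtype=C.dtype)


# Made-up binary (0/1) matrix, e.g. one-hot or binarized bag-of-words rows.
A_bin = sp.csr_matrix(np.array([[1, 0, 1, 1],
                                [0, 1, 1, 0],
                                [1, 1, 0, 0]], dtype=np.int8))

# The cast: 0/1 values are represented exactly in float32.
A = A_bin.astype(np.float32)
B = A.T.tocsr()

# Full sparse product, then the top 2 entries per row.
C = topn_rows(A @ B, ntop=2)
print(C.toarray())
```

The cast costs memory (4 bytes per stored value instead of 1 for int8) but loses nothing: 0 and 1, and the integer dot products of binary rows, are represented exactly in float32 as long as they stay below 2**24.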
We could implement support for bool types (1 bit) and maybe get a smaller memory footprint and a performance boost. Let us know your use case.
Great news, thank you so much! One-hot data is sparse data with values only 0 or 1. Here is the example you asked for: https://scikit-learn.org/stable/auto_examples/text/plot_document_classification_20newsgroups.html; the only difference is that you need to feed the CountVectorizer output into a one-hot encoding.
A full example is here: https://towardsdatascience.com/natural-language-processing-count-vectorization-with-scikit-learn-e7804269bb5e. The main lines:
Or if we wanted to get the vector for one word:
print('Hot vector: ')
print(vectorizer.transform(['hot']).toarray())
Or a simpler example of getting one-hot vectors: https://www.ritchieng.com/machinelearning-one-hot-encoding/
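CountVectorizer output holds counts, not 0/1 values. A minimal sketch of turning such a count matrix into binary data and then casting it, assuming SciPy CSR input; the small count matrix is made up for illustration. (scikit-learn's CountVectorizer also accepts binary=True, which sets all non-zero counts to 1 directly.)

```python
import numpy as np
import scipy.sparse as sp

# Made-up document-term counts standing in for CountVectorizer output
# (rows = documents, columns = vocabulary terms).
counts = sp.csr_matrix(np.array([[2, 0, 1],
                                 [0, 3, 0],
                                 [1, 1, 4]], dtype=np.int64))

# Binarize: drop any explicitly stored zeros, then set every stored count to 1.
# Work on a copy so the original counts are left untouched.
binary = counts.copy()
binary.eliminate_zeros()
binary.data = np.ones_like(binary.data)

# Cast to float32 before handing the matrix to sparse_dot_topn.
X = binary.astype(np.float32)
print(X.toarray())
```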
Thanks again. Could you do it as soon as possible, please?
Any updates, please?
It works for binary matrices if you cast them; see my first message. If it doesn't, explain why and give full details and the size of your problem.
Cast to what? To a binary type, with only zeros and ones as one-bit values? My guess is that, with the code as it is now, it cannot be done. As you wrote, you could implement support for bool types (1 bit) and maybe get a smaller memory footprint and a performance boost.
@Sandy4321 bool types are not 1 bit but 1 byte, as that is the smallest addressable unit for CPUs. You can of course pack bits into other types, and there is vector<bool>, which may pack bools, but that is implementation-dependent. It could save space, but I'm not sure it would give a speedup.
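The byte-versus-bit point can be checked with NumPy. This is an illustration on dense arrays, not sparse_dot_topn internals: a bool array uses one byte per value, and np.packbits stores eight values per byte, which is the kind of manual bit packing mentioned above. It says nothing about whether packed bits would speed up the multiplication.

```python
import numpy as np

n = 1_000_000
values = np.random.default_rng(0).integers(0, 2, size=n)  # random 0/1 values

as_float32 = values.astype(np.float32)  # 4 bytes per value
as_bool = values.astype(np.bool_)       # 1 byte per value
packed = np.packbits(as_bool)           # 8 values per byte

print(as_float32.nbytes, as_bool.nbytes, packed.nbytes)

# Round trip: unpack and drop any padding bits beyond the original length.
unpacked = np.unpackbits(packed, count=n).astype(np.bool_)
assert np.array_equal(unpacked, as_bool)
```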
Great! Then let's at least support byte-sized data? And of course sparse-format data. Huge RAM savings!
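A rough way to size the RAM saving from byte-sized values in a sparse matrix, assuming SciPy's CSR layout (a data array plus int32 indices and indptr arrays) and a made-up 2,000 × 5,000 binary matrix with 50 ones per row. Going from float32 to uint8 shrinks only the data array; the column indices keep their size, so each stored value costs about 5 bytes instead of about 8, not a quarter as much.

```python
import numpy as np
import scipy.sparse as sp


def csr_nbytes(M):
    """Total bytes in the three arrays backing a CSR matrix."""
    return M.data.nbytes + M.indices.nbytes + M.indptr.nbytes


# Made-up binary matrix: 2,000 rows x 5,000 columns, 50 ones per row.
n_rows, n_cols, per_row = 2_000, 5_000, 50
rng = np.random.default_rng(0)
indices = np.concatenate(
    [np.sort(rng.choice(n_cols, size=per_row, replace=False)) for _ in range(n_rows)]
).astype(np.int32)
indptr = np.arange(0, n_rows * per_row + 1, per_row, dtype=np.int32)
data = np.ones(n_rows * per_row, dtype=np.float32)
M32 = sp.csr_matrix((data, indices, indptr), shape=(n_rows, n_cols))

# Same sparsity pattern, byte-sized values.
M8 = M32.astype(np.uint8)

for name, M in [("float32", M32), ("uint8", M8)]:
    print(name, "data:", M.data.nbytes, "indices:", M.indices.nbytes,
          "indptr:", M.indptr.nbytes, "total:", csr_nbytes(M))
```

For a purely 0/1 matrix the values carry no information beyond the sparsity pattern itself, so the index arrays are the main memory cost either way.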
Some ideas: you could try 8-bit numbers; see https://arxiv.org/abs/2208.07339 and https://huggingface.co/blog/hf-bitsandbytes-integration
Closing due to inactivity