
What if I have a very large corpus?

Open • SefaZeng opened this issue 1 year ago • 1 comment

The embedding output is 1.4 TB, which is too large to load into memory as a single array. Any tips for handling this?

SefaZeng, Jan 08 '24 03:01

You could try reducing the embedding dimension with PCA; see Figure 2 in Appendix A of this paper for an analysis of alignment accuracy vs. embedding dimension.
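A minimal sketch of the PCA suggestion that never loads the full file into RAM. It assumes the embeddings are stored as raw float32 rows with no header and a fixed dimension (1024 here, as for LASER); check that against how your embedding files were produced. PCA is fitted on a random sample of rows rather than on the whole corpus, which is an approximation. The function names (`open_embeddings`, `sample_rows`, `fit_pca`, `project_file`) are hypothetical helpers, not part of vecalign.

```python
import numpy as np


def open_embeddings(path, dim=1024):
    """Memory-map a raw float32 embedding file as an (n_rows, dim) array.

    Assumption: no header, float32 values, fixed dimension `dim`.
    """
    return np.memmap(path, dtype=np.float32, mode="r").reshape(-1, dim)


def sample_rows(emb, sample_size, seed=0):
    """Copy a random subset of rows into RAM (sorted indices for friendlier disk reads)."""
    rng = np.random.default_rng(seed)
    n = min(sample_size, emb.shape[0])
    idx = np.sort(rng.choice(emb.shape[0], size=n, replace=False))
    return np.asarray(emb[idx], dtype=np.float64)


def fit_pca(sample, n_components):
    """Return (mean, components) of a PCA fitted on an in-memory sample."""
    if n_components > min(sample.shape):
        raise ValueError("n_components must not exceed sample rows or dimension")
    mean = sample.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(sample - mean, full_matrices=False)
    return mean, vt[:n_components]


def project_file(in_path, out_path, mean, components, dim=1024, chunk_rows=1_000_000):
    """Stream in_path through the PCA projection, writing float32 rows to out_path."""
    emb = open_embeddings(in_path, dim)
    out = np.memmap(out_path, dtype=np.float32, mode="w+",
                    shape=(emb.shape[0], components.shape[0]))
    # Only one chunk of rows is held in memory at a time.
    for start in range(0, emb.shape[0], chunk_rows):
        chunk = np.asarray(emb[start:start + chunk_rows], dtype=np.float64)
        out[start:start + chunk_rows] = (chunk - mean) @ components.T
    out.flush()
```

Because similarity is computed between source and target embeddings, both sides should go through the same projection: sample rows from both files, stack them with `np.vstack`, call `fit_pca` once, then call `project_file` on each file with the same `mean` and `components`. As a rough size check under the 1024-dim float32 assumption, 1.4 TB is about 3.4 × 10^8 vectors at 4096 bytes each; projecting to 128 dimensions cuts that to roughly 175 GB. Whether your vecalign version reads a reduced dimension without changes is not something this sketch establishes.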

thompsonb, Jan 08 '24 09:01