molenc
dense encoding for UHD
maybe a counted Bloom filter with k and N parameters
Done, but needs testing to see whether it is useful in practice. E.g.
molenc_dense --bloom 5,50 -n 3214 -i data/x_std_01.txt > test_bloom.csv
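For reference, a minimal sketch of the counted Bloom filter idea, not molenc's actual implementation. It assumes that k is the number of hashed positions per input feature and N is the length of the dense output vector (so `--bloom 5,50` would correspond to k=5, N=50 under that reading). The function name `counted_bloom_encode` and the choice of BLAKE2b as the hash are hypothetical, picked only to make the example deterministic.

```python
import hashlib


def counted_bloom_encode(features, k, n):
    """Fold a sparse {feature_index: count} map into a dense vector of length n.

    Assumption: each feature is mapped to k positions (possibly repeated)
    and its count is added at each of them, so collisions accumulate
    instead of saturating a single bit.
    """
    dense = [0] * n
    for feat, count in features.items():
        for j in range(k):
            # Derive the j-th position for this feature from a stable hash.
            key = f"{feat}:{j}".encode()
            digest = hashlib.blake2b(key, digest_size=8).digest()
            pos = int.from_bytes(digest, "big") % n
            dense[pos] += count
    return dense


# Hypothetical sparse fingerprint: feature index -> occurrence count.
sparse_fp = {12: 3, 407: 1, 3213: 2}
dense_fp = counted_bloom_encode(sparse_fp, k=5, n=50)
```

Every feature contributes its count k times, so the dense vector always sums to k times the total feature count, which gives a cheap sanity check on any encoder of this kind.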
investigate Bloom filter or MinHash or LSH for current milenial FPs
does not outperform ECFP4 2048b in a regression benchmark
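As a reminder of what the MinHash option would involve, here is a minimal sketch under stated assumptions: the fingerprint is treated as a non-empty set of feature indices (counts are ignored), and the names `minhash_signature` and `estimate_jaccard` are hypothetical. It is not what was benchmarked above.

```python
import hashlib


def _hash64(seed, feat):
    """Stable 64-bit hash of a feature index under a given seed."""
    key = f"{seed}:{feat}".encode()
    return int.from_bytes(hashlib.blake2b(key, digest_size=8).digest(), "big")


def minhash_signature(feature_set, num_hashes):
    """Keep, for each seeded hash function, the minimum hash over the set.

    Assumes feature_set is non-empty; min() raises ValueError otherwise.
    """
    return [min(_hash64(seed, f) for f in feature_set)
            for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of matching signature slots, an estimate of Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


# Hypothetical feature sets for two molecules.
mol_a = {1, 5, 9, 42, 100}
mol_b = {1, 5, 9, 42, 777}
sig_a = minhash_signature(mol_a, num_hashes=128)
sig_b = minhash_signature(mol_b, num_hashes=128)
similarity = estimate_jaccard(sig_a, sig_b)
```

The signature length trades accuracy for size: more hash functions give a tighter Jaccard estimate at the cost of a longer dense vector.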