
Computation of kernel between atomic environments takes a long time using rascal.models.kernel

Open bananenpampe opened this issue 3 years ago • 7 comments

Hello everyone,

I am computing the polynomial kernel between atomic environments using the Kernel class from rascal.models:

from rascal.representations import SphericalInvariants
from rascal.models import Kernel
from rascal.neighbourlist.structure_manager import mask_center_atoms_by_species

# keep only hydrogen (Z = 1) centers
for structure in structures_train:
    mask_center_atoms_by_species(structure, species_select=[1])

calculator = SphericalInvariants(**hypers)
atoms_list_train = calculator.transform(structures_train)
kernel = Kernel(calculator, target_type="Atom", zeta=2)
kernel_computed = kernel(atoms_list_train)

According to cProfile, this takes about 42 seconds. However, when I compute the same kernel with sklearn.metrics.pairwise.polynomial_kernel:

import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel

X_train = calculator.transform(structures_train).get_features(calculator)
Kernel_sklearn = polynomial_kernel(X_train, degree=2, gamma=1.0, coef0=0)
np.allclose(kernel_computed, Kernel_sklearn)  # returns True

According to cProfile, this calculation takes only about 2 seconds. Is there something I am missing?

The transformation of the structures into features takes about 1 second.

bananenpampe avatar Oct 27 '21 13:10 bananenpampe

That's unexpected! Do you have a full script we could use to reproduce this?

Luthaf avatar Oct 27 '21 14:10 Luthaf

Dear bananenpampe,

Thanks for reporting! Many factors influence the speed of kernel computation (in librascal and elsewhere), so it would help to have more information on the environment you are running this in and the sort of structures you are testing on, in particular their atomic species composition. The librascal Kernel calculator is designed to run quickly especially when many different atomic species are present in the system but only a few appear in any given environment, meaning the feature matrix (X_train in your example) is effectively (block-)sparse. In any case, it should not be slower than a direct matrix multiplication.
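As a rough way to check whether a dataset is in that regime, one could measure the fraction of exactly-zero entries in the feature matrix (a sketch using X_train from your snippet; that padding entries are stored as exact zeros is an assumption):

import numpy as np

# Fraction of exactly-zero entries in the dense feature matrix; a value
# close to 1 means most species blocks are empty, i.e. the (block-)sparse
# regime the librascal kernel is optimized for.
zero_fraction = np.mean(X_train == 0.0)
print(f"zero entries in X_train: {zero_fraction:.1%}")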

A few things to check:

  • Have you compiled in Release mode (i.e. not Debug)?
  • Is it possible that the sklearn calculation is running in OpenMP parallel mode? Use export OMP_NUM_THREADS=1 to be sure; we have not yet implemented OpenMP parallelism in librascal, though some of the underlying Eigen operations should still be vectorized, IIRC. (See the timing sketch after this list.)
  • I notice you're using center-atom masking; we haven't extensively tested kernels with that. What is the performance without the masking?
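For reference, a minimal sketch of timing both implementations with threading pinned, reusing hypers and structures_train from the snippets above (timings will of course depend on the machine):

import os
os.environ["OMP_NUM_THREADS"] = "1"  # must be set before numpy/sklearn are imported

import time
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel
from rascal.representations import SphericalInvariants
from rascal.models import Kernel

calculator = SphericalInvariants(**hypers)         # hypers as defined above
managers = calculator.transform(structures_train)  # structures_train as above

start = time.perf_counter()
kernel = Kernel(calculator, target_type="Atom", zeta=2)
K_rascal = kernel(managers)
print(f"librascal kernel: {time.perf_counter() - start:.1f} s")

start = time.perf_counter()
X = managers.get_features(calculator)
K_sklearn = polynomial_kernel(X, degree=2, gamma=1.0, coef0=0.0)
print(f"sklearn kernel:   {time.perf_counter() - start:.1f} s")

assert np.allclose(K_rascal, K_sklearn)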

Thanks!

max-veit avatar Oct 27 '21 14:10 max-veit

Hello, thank you for your fast responses. I ran cmake without any options; that should give me the Release build, right? Changing the OMP environment variable and dropping the masking did not change the behaviour.

I have included a script that should reproduce the behaviour, together with the data file I am using, which should be placed in the same directory: reproduce.py.zip, CSD-2k_relaxed_shifts.txt.zip

bananenpampe avatar Oct 27 '21 14:10 bananenpampe

Hello, could you try setting "expansion_by_species_method": "structure wise" in your input for SphericalInvariants? In principle, it should give similar timings to using numpy directly. If it does not, could you try "expansion_by_species_method": "user defined" together with "global_species": [THE ATOMIC NUMBERS PRESENT IN YOUR DATASET]? This should be strictly the same as using numpy or sklearn, but who knows...
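For concreteness, a sketch of how those two variants would look in the hypers dict (hypers as defined earlier in the thread; the atomic numbers below are placeholders, not a recommendation for your dataset):

from rascal.representations import SphericalInvariants

# variant 1: let each structure determine its own species channels
hypers_structure_wise = dict(hypers, **{"expansion_by_species_method": "structure wise"})

# variant 2: fix the species channels globally
hypers_user_defined = dict(hypers, **{
    "expansion_by_species_method": "user defined",
    "global_species": [1, 6, 7, 8],  # placeholder: the atomic numbers in your dataset
})

calculator = SphericalInvariants(**hypers_structure_wise)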

felixmusil avatar Oct 27 '21 16:10 felixmusil

Including "expansion_by_species_method":"structure wise" improves the performance (5.5 seconds). Including "expansion_by_species_method":'user defined', "global_species":[THE ATOMIC NUMBERS PRESENT IN YOUR DATASET] helps as well (6.5 seconds).

For a larger number of environments, sklearn is still about three times faster even with the options you mentioned.

bananenpampe avatar Oct 27 '21 20:10 bananenpampe

Looks OK to me, thanks for your tests. @Luthaf, @ceriottm and @agoscinski, what do you think? We could also use a numpy implementation rather than the rascal one in this case, or at least offer that as an option, since the sparse storage is quite important for some applications.

felixmusil avatar Oct 29 '21 07:10 felixmusil

We would just need to replace the one line in the kernel computation for the kernel_type Full case with a numpy multiplication, right? This would still allow people to use the sparse storage where needed, while using the faster numpy routines in the full case.

https://github.com/cosmo-epfl/librascal/blob/master/bindings/rascal/models/kernels.py#L143
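For illustration, a sketch of what such a dense path could compute (illustrative only, not the actual code at that line; X and Y are dense feature matrices from get_features, zeta the kernel exponent):

import numpy as np

def full_kernel_numpy(X, Y, zeta):
    # Dense dot-product kernel raised to the power zeta; matches
    # sklearn's polynomial_kernel(X, Y, degree=zeta, gamma=1, coef0=0).
    return np.power(X @ Y.T, zeta)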

agoscinski avatar Nov 01 '21 21:11 agoscinski