tslearn icon indicating copy to clipboard operation
tslearn copied to clipboard

SBD distance function

Open NimaSarajpoor opened this issue 5 years ago • 4 comments

Hi,

I was wondering if the SBD distance function (used in KShape) can be easily accessed?

For instance, in spite of the existence of cdist_dtw or k-mean clustering (with dtw metric) in tslearn package, the DTW distance itself function is available.

I was wondering if the same thing can be applied for the KShape method and its distance (SBD) can be easily used? I understand that DTW is more popular, but having access to new tools such as SBD function can help researchers on the application side to better investigate new methods for their own problems.

Best, Nima

NimaSarajpoor avatar Jul 16 '20 02:07 NimaSarajpoor

It seems that you can add these two functions in KShape source code: `

def _get_norms(self,X):
    norms = []
    for x in X:
        result = numpy.linalg.norm(x)
        norms.append(result)
    norms = numpy.array(norms)
    return norms

def _sbd_cross_dist(self,X):
    norms = self._get_norms(X)
    return 1. - cdist_normalized_cc(X, X, norms1=norms, norms2=norms, self_similarity=False)`

so you can get SBD cross distances through this: `

#KShape class
ks = KShape(n_clusters= num_cluster ,n_init= 5 ,verbose= True ,random_state=seed)
X = TimeSeriesScalerMeanVariance(mu= 0.0 ,std= 1.0 ).fit_transform(X)
y_pred = ks.fit_predict(X)
dists = ks._sbd_cross_dist(X)
np.fill_diagonal(dists,0)
score = silhouette_score(dists,y_pred,metric="precomputed")
print("silhouette_score: " + str(score))`

That's what i did, and it works. For more details you can see https://blog.csdn.net/qq_37960007/article/details/107937212

xnwder avatar Aug 13 '20 09:08 xnwder

I definitely support this feature request!

In the meantime, you can use the SBD function from this repo:

def _sbd(x, y):
    """
    >>> _sbd([1,1,1], [1,1,1])
    (-2.2204460492503131e-16, array([1, 1, 1]))
    >>> _sbd([0,1,2], [1,2,3])
    (0.043817112532485103, array([1, 2, 3]))
    >>> _sbd([1,2,3], [0,1,2])
    (0.043817112532485103, array([0, 1, 2]))
    """
    ncc = _ncc_c(x, y)
    idx = ncc.argmax()
    dist = 1 - ncc[idx]
    yshift = roll_zeropad(y, (idx + 1) - max(len(x), len(y)))

    return dist, yshift

using the following helper functions:

def roll_zeropad(a, shift, axis=None):
    a = np.asanyarray(a)
    if shift == 0:
        return a
    if axis is None:
        n = a.size
        reshape = True
    else:
        n = a.shape[axis]
        reshape = False
    if np.abs(shift) > n:
        res = np.zeros_like(a)
    elif shift < 0:
        shift += n
        zeros = np.zeros_like(a.take(np.arange(n-shift), axis))
        res = np.concatenate((a.take(np.arange(n-shift, n), axis), zeros), axis)
    else:
        zeros = np.zeros_like(a.take(np.arange(n-shift, n), axis))
        res = np.concatenate((zeros, a.take(np.arange(n-shift), axis)), axis)
    if reshape:
        return res.reshape(a.shape)
    else:
        return res


def _ncc_c(x, y):
    """
    >>> _ncc_c([1,2,3,4], [1,2,3,4])
    array([ 0.13333333,  0.36666667,  0.66666667,  1.        ,  0.66666667,
            0.36666667,  0.13333333])
    >>> _ncc_c([1,1,1], [1,1,1])
    array([ 0.33333333,  0.66666667,  1.        ,  0.66666667,  0.33333333])
    >>> _ncc_c([1,2,3], [-1,-1,-1])
    array([-0.15430335, -0.46291005, -0.9258201 , -0.77151675, -0.46291005])
    """
    den = np.array(norm(x) * norm(y))
    den[den == 0] = np.Inf

    x_len = len(x)
    fft_size = 1 << (2*x_len-1).bit_length()
    cc = ifft(fft(x, fft_size) * np.conj(fft(y, fft_size)))
    cc = np.concatenate((cc[-(x_len-1):], cc[:x_len]))
    return np.real(cc) / den

swifmaneum avatar Feb 15 '22 12:02 swifmaneum

Hi @swifmaneum ,

This seems like a great addition to tslearn. Do you reckon this could be built into a PR?

Best regards, Gilles

GillesVandewiele avatar Feb 15 '22 14:02 GillesVandewiele

Hi @GillesVandewiele, sure, I added a WIP merge request. Tests and documentation are still lacking...

swifmaneum avatar Feb 20 '22 11:02 swifmaneum