shogun icon indicating copy to clipboard operation
shogun copied to clipboard

SGVector cleanup.

Open iglesias opened this issue 11 years ago • 23 comments

Remove all the methods producing clutter in SGVector. This includes (although it is not limited to) math operations like linspace, dot, and other operations like find. The objective is to have an SGVector as lean as possible.

Systematically, the work to be done for each method reduced is:

  1. Move the method from SGVector to another class, like for instance to CMath or to whichever it makes sense the most.
  2. Refactor Shogun's internal code.
  3. Refactor examples (libshogun and interface examples, if any).
  4. In case the method already had unit tests in SGVector, then refactor them (and move them). Otherwise, write some good unit tests.

See this pull request to gain understanding of what this task consists of, #2579.

iglesias avatar Oct 27 '14 06:10 iglesias

I forgot to mention that it is important to do this separately, both to ease the review and the cleanup process. That means that the best practice would be to send small and individual pull requests for each method.

iglesias avatar Oct 28 '14 09:10 iglesias

See the wiki for more motivation why we want to do this.

iglesias avatar Oct 28 '14 09:10 iglesias

@iglesias, I would like to work on this. Shall I start by moving dot to CMath? Should I do more changes in the first pull request?

sanuj avatar Dec 08 '14 17:12 sanuj

According to what I wrote above, moving dot to CMath sounds good :-)

I think that doing that one alone in a first pull request is good.

iglesias avatar Dec 08 '14 18:12 iglesias

@iglesias I'll leave the functions related to linear algebra (we might want to move them to linalg). So I'll leave functions like add, sum, norm, add_scalar, product, vector_multiply etc for now while we are working on linalg NATIVE implementations. Should I meanwhile move functions like max, min, mean, arg_max, arg_min, max_abs to CMath? Don't know where to put linspace and sorting functions.

sanuj avatar Dec 13 '14 06:12 sanuj

Sure. In my opinion, max, min, argmax, argmin, and linspace make sense in CMath. Mean might make more sense in CStatistics. About sorting I am not sure, I leave it up to you :-) But there might be already some sorting implemented in CMath (iirc).

iglesias avatar Dec 15 '14 09:12 iglesias

I am a bit confused. As far as I know, the declaration and definition of a template should be kept in one single file, but if we take the case of SGVector, then it has a header file and a cpp file. How is this working out?

sanuj avatar Dec 15 '14 17:12 sanuj

We do a small trick. Look at the end of SGVector.cpp. There, we have defined the classes for which SGVector can be templated (which are basically primitive data types).

iglesias avatar Dec 15 '14 18:12 iglesias

There is SGVector::linspace_vec() which returns a vector, SGVector::linspace() which returns a float64_t * and depends on CMath::linspace which returns void (passes the output through an input parameter pointer). None of them are used anywhere apart from SGVector and CMath. What to do with these?

sanuj avatar Dec 17 '14 17:12 sanuj

If they are not used anywhere, I'd say we can drop them. We can for sure drop the ones in SGVector. The CMath one can stay.

iglesias avatar Dec 17 '14 18:12 iglesias

There are three functions related to sorting in SGVector: qsort(), argsort() and is_sorted() qsort() looks like this

template <class T>
void SGVector<T>::qsort()
{
    CMath::qsort<T>(vector, vlen);
}

So if we move qsort to CMath then we will have to pass the vector as an argument or we can remove SGVector<T>::qsort entirely and use the one mentioned in the above patch but in this case we'll need to pass the length as well.

Should I also move argsort? (moving is_sorted() doesn't make sense to me)

sanuj avatar Dec 22 '14 15:12 sanuj

In my opinion, void CMath::qsort(SGVector<T>), SGVector<index_t> CMath::argsort(SGVector), and bool CMath::is_sorted(SGVector<T>) make sense.

iglesias avatar Dec 22 '14 23:12 iglesias

@iglesias would it be fine , if I moved range_fill to CMath ?

curiousguy13 avatar Feb 14 '15 11:02 curiousguy13

@curiousguy13, it sounds good to me, both range_fill and range_fill_vector should go out of SGVector. Do something nice with them (perhaps, simplifying to only one method) and move to CMath. Keep in mind unit tests. Looking forward to see the pull request.

iglesias avatar Feb 14 '15 17:02 iglesias

@iglesias I think range_fill, zeros, these are perfect to be shifted to linalg instead.

lambday avatar Feb 18 '15 09:02 lambday

It sounds good! On 18 Feb 2015 10:26, "Soumyajit De" [email protected] wrote:

@iglesias https://github.com/iglesias I think range_fill, zeros, these are perfect to be shifted to linalg instead.

— Reply to this email directly or view it on GitHub https://github.com/shogun-toolbox/shogun/issues/2582#issuecomment-74835157 .

iglesias avatar Feb 19 '15 18:02 iglesias

I would like to work on this. Shall I start by moving add to CMath?

Hephaestus12 avatar Mar 21 '20 22:03 Hephaestus12

I would like to work on this. Shall I start by moving add to CMath?

You shouldn’t move to CMath but to linalg instead. However linalg already has add. But it would be good to remove add from SGVector. AFAIK CMath (now Math) should be dropped.

gf712 avatar Mar 22 '20 07:03 gf712

This is for the method add() in SGVector. I don't think simply removing it and replacing its calls with the add method in the Linalg framework will help. For the following reasons:

In file: src/shogun/mathematics/linalg/backend/eigen/BasicOps.cpp :

#define BACKEND_GENERIC_IN_PLACE_ADD(Type, Container)                          \
	void LinalgBackendEigen::add(                                              \
	    const Container<Type>& a, const Container<Type>& b, Type alpha,        \
	    Type beta, Container<Type>& result) const                              \
	{                                                                          \
		add_impl(a, b, alpha, beta, result);                                   \
	}

add() calls add_impl()

In file: src/shogun/mathematics/linalg/backend/eigen/BasicOps.cpp :

template <typename T>
void LinalgBackendEigen::add_impl(
    const SGVector<T>& a, const SGVector<T>& b, T alpha, T beta,
    SGVector<T>& result) const
{
	typename SGVector<T>::EigenVectorXtMap a_eig = a;
	typename SGVector<T>::EigenVectorXtMap b_eig = b;
	typename SGVector<T>::EigenVectorXtMap result_eig = result;

	result_eig = alpha * a_eig + beta * b_eig;
}

add_impl() uses the operator+

In file: src/shogun/lib/SGVector.cpp :

/** addition operator */
template<class T>
SGVector<T> SGVector<T>::operator+ (SGVector<T> x)
{
	assert_on_cpu();
	require(x.vector && vector, "Addition possible for only non-null vectors.");
	require(x.vlen == vlen, "Length of the two vectors to be added should be same. [V({}) + V({})]", vlen, x.vlen);

	SGVector<T> result=clone();
	result.add(x);
	return result;
}

The operator+ uses SGVector's add() method.

So indirectly, even linalg's add() method uses SGVector's add() method. Will we need to entirely refactor the linalg framework code for this?

Hephaestus12 avatar Mar 23 '20 13:03 Hephaestus12

hmm, I think you are confusing things. operator+ in linalg is from Eigen, which uses SIMD instructions where possible. result_eig = alpha * a_eig + beta * b_eig; is written with Eigen types. See the definition of EigenVectorXtMap in SGVector

gf712 avatar Mar 23 '20 13:03 gf712

Is there a method in linalg to add a dense vector to a sparse vector and/or add two sparse vectors?

Hephaestus12 avatar Mar 24 '20 23:03 Hephaestus12

Is there a method in linalg to add a dense vector to a sparse vector and/or add two sparse vectors? Or will we need to write one ourselves? @gf712

Hephaestus12 avatar Mar 24 '20 23:03 Hephaestus12

don't think there is atm.

karlnapf avatar Mar 25 '20 18:03 karlnapf