SGVector cleanup.
Remove all the methods producing clutter in SGVector. This includes (although it is not limited to) math operations like linspace, dot, and other operations like find. The objective is to have an SGVector as lean as possible.
Systematically, the work to be done for each method reduced is:
- Move the method from
SGVectorto another class, like for instance toCMathor to whichever it makes sense the most. - Refactor Shogun's internal code.
- Refactor examples (libshogun and interface examples, if any).
- In case the method already had unit tests in
SGVector, then refactor them (and move them). Otherwise, write some good unit tests.
See this pull request to gain understanding of what this task consists of, #2579.
I forgot to mention that it is important to do this separately, both to ease the review and the cleanup process. That means that the best practice would be to send small and individual pull requests for each method.
See the wiki for more motivation why we want to do this.
@iglesias, I would like to work on this. Shall I start by moving dot to CMath? Should I do more changes in the first pull request?
According to what I wrote above, moving dot to CMath sounds good :-)
I think that doing that one alone in a first pull request is good.
@iglesias I'll leave the functions related to linear algebra (we might want to move them to linalg). So I'll leave functions like add, sum, norm, add_scalar, product, vector_multiply etc for now while we are working on linalg NATIVE implementations. Should I meanwhile move functions like max, min, mean, arg_max, arg_min, max_abs to CMath? Don't know where to put linspace and sorting functions.
Sure. In my opinion, max, min, argmax, argmin, and linspace make sense in CMath. Mean might make more sense in CStatistics. About sorting I am not sure, I leave it up to you :-) But there might be already some sorting implemented in CMath (iirc).
I am a bit confused. As far as I know, the declaration and definition of a template should be kept in one single file, but if we take the case of SGVector, then it has a header file and a cpp file. How is this working out?
We do a small trick. Look at the end of SGVector.cpp. There, we have defined the classes for which SGVector can be templated (which are basically primitive data types).
There is SGVector::linspace_vec() which returns a vector, SGVector::linspace() which returns a float64_t * and depends on CMath::linspace which returns void (passes the output through an input parameter pointer). None of them are used anywhere apart from SGVector and CMath. What to do with these?
If they are not used anywhere, I'd say we can drop them. We can for sure drop the ones in SGVector. The CMath one can stay.
There are three functions related to sorting in SGVector: qsort(), argsort() and is_sorted()
qsort() looks like this
template <class T>
void SGVector<T>::qsort()
{
CMath::qsort<T>(vector, vlen);
}
So if we move qsort to CMath then we will have to pass the vector as an argument or we can remove SGVector<T>::qsort entirely and use the one mentioned in the above patch but in this case we'll need to pass the length as well.
Should I also move argsort? (moving is_sorted() doesn't make sense to me)
In my opinion, void CMath::qsort(SGVector<T>), SGVector<index_t> CMath::argsort(SGVector), and bool CMath::is_sorted(SGVector<T>) make sense.
@iglesias would it be fine , if I moved range_fill to CMath ?
@curiousguy13, it sounds good to me, both range_fill and range_fill_vector should go out of SGVector. Do something nice with them (perhaps, simplifying to only one method) and move to CMath. Keep in mind unit tests. Looking forward to see the pull request.
@iglesias I think range_fill, zeros, these are perfect to be shifted to linalg instead.
It sounds good! On 18 Feb 2015 10:26, "Soumyajit De" [email protected] wrote:
@iglesias https://github.com/iglesias I think range_fill, zeros, these are perfect to be shifted to linalg instead.
— Reply to this email directly or view it on GitHub https://github.com/shogun-toolbox/shogun/issues/2582#issuecomment-74835157 .
I would like to work on this. Shall I start by moving add to CMath?
I would like to work on this. Shall I start by moving add to CMath?
You shouldn’t move to CMath but to linalg instead. However linalg already has add. But it would be good to remove add from SGVector. AFAIK CMath (now Math) should be dropped.
This is for the method add() in SGVector. I don't think simply removing it and replacing its calls with the add method in the Linalg framework will help. For the following reasons:
In file: src/shogun/mathematics/linalg/backend/eigen/BasicOps.cpp :
#define BACKEND_GENERIC_IN_PLACE_ADD(Type, Container) \
void LinalgBackendEigen::add( \
const Container<Type>& a, const Container<Type>& b, Type alpha, \
Type beta, Container<Type>& result) const \
{ \
add_impl(a, b, alpha, beta, result); \
}
add() calls add_impl()
In file: src/shogun/mathematics/linalg/backend/eigen/BasicOps.cpp :
template <typename T>
void LinalgBackendEigen::add_impl(
const SGVector<T>& a, const SGVector<T>& b, T alpha, T beta,
SGVector<T>& result) const
{
typename SGVector<T>::EigenVectorXtMap a_eig = a;
typename SGVector<T>::EigenVectorXtMap b_eig = b;
typename SGVector<T>::EigenVectorXtMap result_eig = result;
result_eig = alpha * a_eig + beta * b_eig;
}
add_impl() uses the operator+
In file: src/shogun/lib/SGVector.cpp :
/** addition operator */
template<class T>
SGVector<T> SGVector<T>::operator+ (SGVector<T> x)
{
assert_on_cpu();
require(x.vector && vector, "Addition possible for only non-null vectors.");
require(x.vlen == vlen, "Length of the two vectors to be added should be same. [V({}) + V({})]", vlen, x.vlen);
SGVector<T> result=clone();
result.add(x);
return result;
}
The operator+ uses SGVector's add() method.
So indirectly, even linalg's add() method uses SGVector's add() method.
Will we need to entirely refactor the linalg framework code for this?
hmm, I think you are confusing things. operator+ in linalg is from Eigen, which uses SIMD instructions where possible.
result_eig = alpha * a_eig + beta * b_eig; is written with Eigen types. See the definition of EigenVectorXtMap in SGVector
Is there a method in linalg to add a dense vector to a sparse vector and/or add two sparse vectors?
Is there a method in linalg to add a dense vector to a sparse vector and/or add two sparse vectors? Or will we need to write one ourselves? @gf712
don't think there is atm.