
Type stabilities in performance evaluation

Open · OkonSamuel opened this issue on Jun 16 '20 · 4 comments

While checking the codebase for type instabilities, I found that type inference fails for the Threads.@spawn and @distributed constructs in Julia (apparently that is just how these constructs are designed). This means Julia cannot infer their return types, which affects performance slightly. I previously thought that having a reduce(vcat, ret) would pin down the return type, but checking with @code_warntype shows that this is resolved at runtime. Even though the return type when using these constructs can only be inferred at runtime, this isn't a problem if they are used once; the problem arises when they are called from other code, e.g. TunedModel fitting. The only solution is to create a type barrier, i.e. annotate the return type of ret, thereby helping the compiler optimize subsequent code. Hence we would need to know in advance the return type of [value(m, yhat, Xtest, ytest, wtest) for m in measures] inside the func function in MLJBase. Is there any reason we couldn't know that return type? Another option is to annotate the type of the measurements_matrix, since that is always a 2D array.
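To make the "type barrier" idea concrete, here is a minimal sketch in plain Julia (the function names and the Float64 element type are illustrative assumptions, not MLJBase code): fetching from a Task is inferred as Any, but a type assertion on the fetched value restores inference for everything downstream.

```julia
# Without a barrier: `fetch` on a Task infers as `Any`, so the comprehension
# below is inferred as Vector{Any}.
function per_fold_unstable(f, folds)
    tasks = [Threads.@spawn(f(fold)) for fold in folds]
    return [fetch(t) for t in tasks]
end

# With a barrier: asserting the fetched value (assuming each measurement is a
# Float64) lets the result be inferred as Vector{Float64}.
function per_fold_stable(f, folds)
    tasks = [Threads.@spawn(f(fold)) for fold in folds]
    return [fetch(t)::Float64 for t in tasks]
end
```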

Also, the evaluate! function is generally type-unstable, even for the CPU1 option (it is inferred as NamedTuple{(:measure, :measurement, :per_fold, :per_observation),_A} where _A<:Tuple). This instability comes from predict(mach, X) and might be unavoidable. cc @ablaom @tlienart
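A related mitigation, sketched below with illustrative names rather than the actual MLJBase internals, is a function barrier: even if predict(mach, X) is not inferable, handing its result to a separate function lets the compiler specialize the measure computations on the runtime type of yhat.

```julia
# Function barrier around a possibly type-unstable predict call.
# `_apply_measures` is compiled for the concrete type of `yhat`, so the
# per-measure loop is well typed even when `predict` itself is not inferable.
function _evaluate_fold(mach, measures, Xtest, ytest, wtest)
    yhat = predict(mach, Xtest)                                   # may infer as Any
    return _apply_measures(measures, yhat, Xtest, ytest, wtest)   # barrier
end

_apply_measures(measures, yhat, Xtest, ytest, wtest) =
    [value(m, yhat, Xtest, ytest, wtest) for m in measures]
```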

OkonSamuel · Jun 16 '20

It is great to have this review. However, models generally take far longer to train than they do to evaluate, and hence performance optimisation here has never been a priority for me personally.

FYI, most measures return a float, but this is not necessarily Float64. If GPUs are involved, chances are it is Float32, for example. Sometimes the measure is an Integer, as in true_positive. And there are wilder exceptions: one user wanted to view confusion_matrix as a measure, which we implemented, so in that case each measurement is a matrix. Given this, I expect refactoring the code to ensure type stability (in the CPU1 case) is probably tricky, although not impossible. (You may want to check that the measures themselves are type-stable, a useful exercise.) I have no objections to someone trying, but it's not a personal priority.
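To take up the suggestion of checking individual measures, something like the following rough sketch would flag an inference failure (it assumes the callable measure interface and the exported rms measure):

```julia
using MLJBase, Test, InteractiveUtils

y    = rand(100)
yhat = rand(100)

@inferred rms(yhat, y)       # throws if the return type cannot be inferred
@code_warntype rms(yhat, y)  # or inspect the inferred types interactively
```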

ablaom · Jun 16 '20

Now that you put it that way, annotating with types won't be helpful, as there are too many possibilities. For example, a measures vector could consist of confusion_matrix, true_positive and cross_entropy, in which case the compiler would treat the type of [value(m, yhat, Xtest, ytest, wtest) for m in measures] as Any. The only place where type annotations would help is if the type of value(m, yhat, Xtest, ytest, wtest) were the same for all m in measures, and the only way to know that is if the return type of each measure were somehow known. I guess this isn't a top priority now.
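A generic illustration of the point, using stand-in closures rather than actual measures: as soon as the result types differ, the comprehension can only be collected with element type Any, and nothing downstream can be inferred more tightly.

```julia
# Stand-ins playing the roles of true_positive (Int), cross_entropy (Float64)
# and confusion_matrix (a Matrix).
fake_measures = (x -> round(Int, sum(x)),
                 x -> sum(x) / length(x),
                 x -> reshape(x, 1, :))

results = [m(rand(4)) for m in fake_measures]
typeof(results)    # Vector{Any}
```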

OkonSamuel · Jun 17 '20

By the way, I seem to recall reading somewhere on Discourse that you need to use @code_warntype optimize=true in order to get the actual inferred return types. But I might be misremembering and I can't find the actual reference right now. Here's what the docstring for @code_typed says:

help?> @code_typed
  @code_typed

  Evaluates the arguments to the function or macro call, determines their types,
  and calls code_typed on the resulting expression. Use the optional argument optimize with

  @code_typed optimize=true foo(x)

  to control whether additional optimizations, such as inlining, are also applied.

But that doesn't make it clear to me whether the optimize option has any effect on the inferred types.
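For reference, here is how one might poke at both macros on a deliberately type-unstable function; this only demonstrates the invocation syntax shown in the docstring above and makes no claim about which defaults apply.

```julia
using InteractiveUtils

h(x) = x > 0 ? x : "non-positive"   # deliberately type-unstable for Int input

@code_typed optimize=false h(1)     # unoptimized typed IR
@code_typed optimize=true h(1)      # after inlining and other optimizations
@code_warntype h(1)                 # highlights non-concrete inferred types
```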

CameronBieganek · Jul 25 '20

> By the way, I seem to recall reading somewhere on Discourse that you need to use @code_warntype optimize=true in order to get the actual inferred return types

By default optimize=true, so there is no need to specify it again.

OkonSamuel · Jul 26 '20