
Numba related error while using user-defined distance metric

Open jayaram-r opened this issue 6 years ago • 14 comments

Firstly, thanks for this very useful library implementing the NN-descent method. I am using pynndescent version 0.3.3 and these are the versions of some other relevant packages:

numpy - 1.16.2
numba - 0.46.0
scikit-learn - 0.20.3
scipy - 1.2.1
joblib - 0.14.0

The library works very well with predefined distance metrics, but I ran into an error while using it with my own distance metric. The distance metric has the required signature with one keyword argument and is JIT compiled using numba in the nopython mode. Copying the function below for reference:

import numpy as np
import numba

@numba.njit(fastmath=True)
def distance_angular_3tensors(x, y, shape=None):
    """
    Cosine angular distance between two 3rd order (rank 3) tensors.
    The tensors `x` and `y` should be flattened into 1D numpy arrays before calling this function.
    If `xt` and `yt` are the tensors, each of shape `shape`, they can be flattened into a 1D array using `x = xt.reshape(-1)` (and likewise for `yt`). The shape is passed as input, which is used by the function to reshape the arrays to tensors.

    :param x: numpy array of shape `(n, )` with the first flattened tensor.
    :param y: numpy array of shape `(n, )` with the second flattened tensor.
    :param shape: tuple of three values specifying the shape of the tensors. This is a required argument.

    :return: distance value which should be in the range [0, 1].
    """
    xt = x.reshape(shape)
    yt = y.reshape(shape)
    s = 0.
    for i in range(shape[0]):
        val1 = np.sum(xt[i, :, :] * yt[i, :, :])
        val2 = np.sum(xt[i, :, :] * xt[i, :, :]) ** 0.5
        val3 = np.sum(yt[i, :, :] * yt[i, :, :]) ** 0.5
        if val2 > 0. and val3 > 0.:
            s += (val1 / (val2 * val3))

    # Angular distance is the cosine-inverse of the average cosine similarity, divided by `pi` to normalize
    # the distance to the range `[0, 1]`
    s = max(-1., min(1., s / shape[0]))

    return np.arccos(s) / np.pi

I have tested it independently and it works as expected.
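For reference, one way such an independent check could be run is to compare the JIT-compiled metric against a plain NumPy version. The sketch below is only an assumption about what that test might look like; `angular_3tensors_reference` is a hypothetical name, not part of pynndescent or of the code above.

```python
import numpy as np

def angular_3tensors_reference(x, y, shape):
    """Plain NumPy version of the same distance, used only for cross-checking."""
    xt = x.reshape(shape)
    yt = y.reshape(shape)
    # Inner products and norms of each slice along the first axis
    dots = np.sum(xt * yt, axis=(1, 2))
    nx = np.sqrt(np.sum(xt * xt, axis=(1, 2)))
    ny = np.sqrt(np.sum(yt * yt, axis=(1, 2)))
    # Slices with a zero norm contribute 0 to the average, as in the original
    valid = (nx > 0.) & (ny > 0.)
    cos = np.zeros(shape[0])
    cos[valid] = dots[valid] / (nx[valid] * ny[valid])
    s = np.clip(cos.mean(), -1., 1.)
    return np.arccos(s) / np.pi

shape = (3, 4, 4)
rng = np.random.RandomState(0)
x = rng.randn(48)
y = rng.randn(48)

d_xy = angular_3tensors_reference(x, y, shape)    # generic pair
d_xx = angular_3tensors_reference(x, x, shape)    # identical tensors
d_xneg = angular_3tensors_reference(x, -x, shape)  # opposite tensors
```

Comparing `distance_angular_3tensors(x, y, shape)` against `d_xy` with `np.isclose` would then be a reasonable sanity check; small differences are possible because the compiled version uses `fastmath=True`.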

I used pynndescent.NNDescent to build a k-NN graph with this distance metric and there is no error while building the index. Here is some minimal code that shows these steps:

from pynndescent import NNDescent

# Generate data (`generate_data` is a helper that is not shown here)
# tensor shape
shape = (3, 4, 4)
dim = shape[0] * shape[1] * shape[2]
N = 1000
N_test = 100
k = 5
data, labels = generate_data(N, dim)
data_test, labels_test = generate_data(N_test, dim)
# `data` and `data_test` are numpy arrays of shape `(N, dim)` and `(N_test, dim)` respectively.

# Distance metric and its kwargs
metric = distance_angular_3tensors
metric_kwds = {'shape': shape}

# Construct the ANN index
params = {
    'metric': metric,
    'metric_kwds': metric_kwds,
    'n_neighbors': 20,
    'rho': 0.5,
    'random_state': 123, 
    'n_jobs': -1, 
    'verbose': True
}
index = NNDescent(data, **params)

The above code runs successfully and builds the k-NN index. But when I call the query method (as follows) I run into an error.

# Query the ANN index on the test data
nn_indices, _ = index.query(data_test, k=k)

Here is the error log as a text file: pynndescent_issue_error_log.txt

To provide some additional information:

  • I am running this in a Conda environment. I have tried a couple of lower versions of numba (0.43.1 and 0.40.1) and they all ran into this same error.
  • I have checked the existing issues on this library and also did some searching on Google about this error, without much success.
  • I walked through the code along the traceback to see if I could spot the cause of the error, but did not get anywhere.
  • I changed the keyword argument shape of the distance metric function distance_angular_3tensors from a tuple to a string, thinking numba might have an issue with tuple arguments. However, I ran into the same error.
  • The error occurs with user-defined distance metrics only when the metric takes keyword argument(s), that is, when the NNDescent class is given a metric_kwds argument. There is no error if my custom distance metric does not take any keyword arguments.

Let me know if I can provide any additional information. Thanks in advance.

jayaram-r avatar Nov 18 '19 17:11 jayaram-r

Pasting the error below for convenience:

TypeError Traceback (most recent call last) /anaconda3/envs/knn_expts/lib/python3.7/site-packages/numba/errors.py in new_error_context(fmt_, *args, **kwargs) 716 try: --> 717 yield 718 except NumbaError as e:

/anaconda3/envs/knn_expts/lib/python3.7/site-packages/numba/lowering.py in lower_block(self, block) 259 loc=self.loc, errcls_=defaulterrcls): --> 260 self.lower_inst(inst) 261

/anaconda3/envs/knn_expts/lib/python3.7/site-packages/numba/lowering.py in lower_inst(self, inst) 413 if isinstance(inst, _class): --> 414 func(self, inst) 415 return

/anaconda3/envs/knn_expts/lib/python3.7/site-packages/numba/npyufunc/parfor.py in _lower_parfor_parallel(lowerer, parfor) 282 index_var_typ, --> 283 parfor.races) 284 if config.DEBUG_ARRAY_OPT:

/anaconda3/envs/knn_expts/lib/python3.7/site-packages/numba/npyufunc/parfor.py in call_parallel_gufunc(lowerer, cres, gu_signature, outer_sig, expr_args, expr_arg_types, loop_ranges, redvars, reddict, redarrdict, init_block, index_var_typ, races) 1198 info = build_gufunc_wrapper(llvm_func, cres, sin, sout, -> 1199 cache=False, is_parfors=True) 1200 wrapper_name = info.name

/anaconda3/envs/knn_expts/lib/python3.7/site-packages/numba/npyufunc/parallel.py in build_gufunc_wrapper(py_func, cres, sin, sout, cache, is_parfors) 249 innerinfo = ufuncbuilder.build_gufunc_wrapper( --> 250 py_func, cres, sin, sout, cache=cache, is_parfors=is_parfors, 251 )

/anaconda3/envs/knn_expts/lib/python3.7/site-packages/numba/npyufunc/wrappers.py in build_gufunc_wrapper(py_func, cres, sin, sout, cache, is_parfors) 502 return wrapcls( --> 503 py_func, cres, sin, sout, cache, is_parfors=is_parfors, 504 ).build()

/anaconda3/envs/knn_expts/lib/python3.7/site-packages/numba/compiler_lock.py in _acquire_compile_lock(*args, **kwargs) 31 with self: ---> 32 return func(*args, **kwargs) 33 return _acquire_compile_lock

/anaconda3/envs/knn_expts/lib/python3.7/site-packages/numba/npyufunc/wrappers.py in build(self) 455 wrapper_name = "gufunc." + self.fndesc.mangled_name --> 456 wrapperlib = self._compile_wrapper(wrapper_name) 457 return _wrapper_info(

/anaconda3/envs/knn_expts/lib/python3.7/site-packages/numba/npyufunc/wrappers.py in _compile_wrapper(self, wrapper_name) 434 # Build wrapper --> 435 self._build_wrapper(wrapperlib, wrapper_name) 436 # Non-parfors?

/anaconda3/envs/knn_expts/lib/python3.7/site-packages/numba/npyufunc/wrappers.py in _build_wrapper(self, library, name) 400 ary = GUArrayArg(self.context, builder, arg_args, --> 401 arg_steps, i, step_offset, typ, sym, sym_dim) 402 step_offset += len(sym)

/anaconda3/envs/knn_expts/lib/python3.7/site-packages/numba/npyufunc/wrappers.py in __init__(self, context, builder, args, steps, i, step_offset, typ, syms, sym_dim) 657 raise TypeError("scalar type {0} given for non scalar " --> 658 "argument #{1}".format(typ, i + 1)) 659 self._loader = _ScalarArgLoader(dtype=typ, stride=core_step)

TypeError: scalar type tuple(tuple(int64 x 3) x 1) given for non scalar argument #4

During handling of the above exception, another exception occurred:

LoweringError Traceback (most recent call last) in 1 # Query the ANN index on the test data ----> 2 nn_indices, _ = index.query(data_test, k=k)

/anaconda3/envs/knn_expts/lib/python3.7/site-packages/pynndescent/pynndescent_.py in query(self, query_data, k, queue_size) 901 query_data, 902 self._distance_func, --> 903 self._dist_args, 904 ) 905 else:

/anaconda3/envs/knn_expts/lib/python3.7/site-packages/numba/dispatcher.py in _compile_for_args(self, *args, **kws) 418 e.patch_message('\n'.join((str(e).rstrip(), help_msg))) 419 # ignore the FULL_TRACEBACKS config, this needs reporting! --> 420 raise e 421 422 def inspect_llvm(self, signature=None):

/anaconda3/envs/knn_expts/lib/python3.7/site-packages/numba/dispatcher.py in _compile_for_args(self, *args, **kws) 351 argtypes.append(self.typeof_pyval(a)) 352 try: --> 353 return self.compile(tuple(argtypes)) 354 except errors.ForceLiteralArg as e: 355 # Received request for compiler re-entry with the list of arguments

/anaconda3/envs/knn_expts/lib/python3.7/site-packages/numba/compiler_lock.py in _acquire_compile_lock(*args, **kwargs) 30 def _acquire_compile_lock(*args, **kwargs): 31 with self: ---> 32 return func(*args, **kwargs) 33 return _acquire_compile_lock 34

/anaconda3/envs/knn_expts/lib/python3.7/site-packages/numba/dispatcher.py in compile(self, sig) 766 self._cache_misses[sig] += 1 767 try: --> 768 cres = self._compiler.compile(args, return_type) 769 except errors.ForceLiteralArg as e: 770 def folded(args, kws):

/anaconda3/envs/knn_expts/lib/python3.7/site-packages/numba/dispatcher.py in compile(self, args, return_type) 75 76 def compile(self, args, return_type): ---> 77 status, retval = self._compile_cached(args, return_type) 78 if status: 79 return retval

/anaconda3/envs/knn_expts/lib/python3.7/site-packages/numba/dispatcher.py in _compile_cached(self, args, return_type) 89 90 try: ---> 91 retval = self._compile_core(args, return_type) 92 except errors.TypingError as e: 93 self._failed_cache[key] = e

/anaconda3/envs/knn_expts/lib/python3.7/site-packages/numba/dispatcher.py in _compile_core(self, args, return_type) 107 args=args, return_type=return_type, 108 flags=flags, locals=self.locals, --> 109 pipeline_class=self.pipeline_class) 110 # Check typing error if object mode is used 111 if cres.typing_error is not None and not flags.enable_pyobject:

/anaconda3/envs/knn_expts/lib/python3.7/site-packages/numba/compiler.py in compile_extra(typingctx, targetctx, func, args, return_type, flags, locals, library, pipeline_class) 526 pipeline = pipeline_class(typingctx, targetctx, library, 527 args, return_type, flags, locals) --> 528 return pipeline.compile_extra(func) 529 530

/anaconda3/envs/knn_expts/lib/python3.7/site-packages/numba/compiler.py in compile_extra(self, func) 324 self.state.lifted = () 325 self.state.lifted_from = None --> 326 return self._compile_bytecode() 327 328 def compile_ir(self, func_ir, lifted=(), lifted_from=None):

/anaconda3/envs/knn_expts/lib/python3.7/site-packages/numba/compiler.py in _compile_bytecode(self) 383 """ 384 assert self.state.func_ir is None --> 385 return self._compile_core() 386 387 def _compile_ir(self):

/anaconda3/envs/knn_expts/lib/python3.7/site-packages/numba/compiler.py in _compile_core(self) 363 self.state.status.fail_reason = e 364 if is_final_pipeline: --> 365 raise e 366 else: 367 raise CompilerError("All available pipelines exhausted")

/anaconda3/envs/knn_expts/lib/python3.7/site-packages/numba/compiler.py in _compile_core(self) 354 res = None 355 try: --> 356 pm.run(self.state) 357 if self.state.cr is not None: 358 break

/anaconda3/envs/knn_expts/lib/python3.7/site-packages/numba/compiler_machinery.py in run(self, state) 326 (self.pipeline_name, pass_desc) 327 patched_exception = self._patch_error(msg, e) --> 328 raise patched_exception 329 330 def dependency_analysis(self):

/anaconda3/envs/knn_expts/lib/python3.7/site-packages/numba/compiler_machinery.py in run(self, state) 317 pass_inst = _pass_registry.get(pss).pass_inst 318 if isinstance(pass_inst, CompilerPass): --> 319 self._runPass(idx, pass_inst, state) 320 else: 321 raise BaseException("Legacy pass in use")

/anaconda3/envs/knn_expts/lib/python3.7/site-packages/numba/compiler_lock.py in _acquire_compile_lock(*args, **kwargs) 30 def _acquire_compile_lock(*args, **kwargs): 31 with self: ---> 32 return func(*args, **kwargs) 33 return _acquire_compile_lock 34

/anaconda3/envs/knn_expts/lib/python3.7/site-packages/numba/compiler_machinery.py in _runPass(self, index, pss, internal_state) 279 mutated |= check(pss.run_initialization, internal_state) 280 with SimpleTimer() as pass_time: --> 281 mutated |= check(pss.run_pass, internal_state) 282 with SimpleTimer() as finalize_time: 283 mutated |= check(pss.run_finalizer, internal_state)

/anaconda3/envs/knn_expts/lib/python3.7/site-packages/numba/compiler_machinery.py in check(func, compiler_state) 266 267 def check(func, compiler_state): --> 268 mangled = func(compiler_state) 269 if mangled not in (True, False): 270 msg = ("CompilerPass implementations should return True/False. "

/anaconda3/envs/knn_expts/lib/python3.7/site-packages/numba/typed_passes.py in run_pass(self, state) 378 state.library.enable_object_caching() 379 --> 380 NativeLowering().run_pass(state) # TODO: Pull this out into the pipeline 381 lowered = state['cr'] 382 signature = typing.signature(state.return_type, *state.args)

/anaconda3/envs/knn_expts/lib/python3.7/site-packages/numba/typed_passes.py in run_pass(self, state) 323 lower = lowering.Lower(targetctx, library, fndesc, interp, 324 metadata=metadata) --> 325 lower.lower() 326 if not flags.no_cpython_wrapper: 327 lower.create_cpython_wrapper(flags.release_gil)

/anaconda3/envs/knn_expts/lib/python3.7/site-packages/numba/lowering.py in lower(self) 177 if self.generator_info is None: 178 self.genlower = None --> 179 self.lower_normal_function(self.fndesc) 180 else: 181 self.genlower = self.GeneratorLower(self)

/anaconda3/envs/knn_expts/lib/python3.7/site-packages/numba/lowering.py in lower_normal_function(self, fndesc) 218 # Init argument values 219 self.extract_function_arguments() --> 220 entry_block_tail = self.lower_function_body() 221 222 # Close tail of entry block

/anaconda3/envs/knn_expts/lib/python3.7/site-packages/numba/lowering.py in lower_function_body(self) 243 bb = self.blkmap[offset] 244 self.builder.position_at_end(bb) --> 245 self.lower_block(block) 246 247 self.post_lower()

/anaconda3/envs/knn_expts/lib/python3.7/site-packages/numba/lowering.py in lower_block(self, block) 258 with new_error_context('lowering "{inst}" at {loc}', inst=inst, 259 loc=self.loc, errcls_=defaulterrcls): --> 260 self.lower_inst(inst) 261 262 def create_cpython_wrapper(self, release_gil=False):

/anaconda3/envs/knn_expts/lib/python3.7/contextlib.py in __exit__(self, type, value, traceback) 128 value = type() 129 try: --> 130 self.gen.throw(type, value, traceback) 131 except StopIteration as exc: 132 # Suppress StopIteration unless it's the same exception that

/anaconda3/envs/knn_expts/lib/python3.7/site-packages/numba/errors.py in new_error_context(fmt_, *args, **kwargs) 723 from numba import config 724 tb = sys.exc_info()[2] if config.FULL_TRACEBACKS else None --> 725 six.reraise(type(newerr), newerr, tb) 726 727

/anaconda3/envs/knn_expts/lib/python3.7/site-packages/numba/six.py in reraise(tp, value, tb) 667 if value.__traceback__ is not tb: 668 raise value.with_traceback(tb) --> 669 raise value 670 671 else:

LoweringError: Failed in nopython mode pipeline (step: nopython mode backend)
scalar type tuple(tuple(int64 x 3) x 1) given for non scalar argument #4

File "../../../../../../anaconda3/envs/knn_expts/lib/python3.7/site-packages/pynndescent/pynndescent_.py", line 93: def initialized_nnd_search(

for i in numba.prange(query_points.shape[0]):
^

[1] During: lowering "id=1[LoopNest(index_variable = parfor_index.280, range = (0, $2.6, 1))]{116: <ir.Block at /anaconda3/envs/knn_expts/lib/python3.7/site-packages/pynndescent/pynndescent_.py (108)>, 60: <ir.Block at /anaconda3/envs/knn_expts/lib/python3.7/site-packages/pynndescent/pynndescent_.py (105)>, 100: <ir.Block at /anaconda3/envs/knn_expts/lib/python3.7/site-packages/pynndescent/pynndescent_.py (105)>, 40: <ir.Block at /anaconda3/envs/knn_expts/lib/python3.7/site-packages/pynndescent/pynndescent_.py (97)>, 128: <ir.Block at /anaconda3/envs/knn_expts/lib/python3.7/site-packages/pynndescent/pynndescent_.py (109)>, 142: <ir.Block at /anaconda3/envs/knn_expts/lib/python3.7/site-packages/pynndescent/pynndescent_.py (105)>, 140: <ir.Block at /anaconda3/envs/knn_expts/lib/python3.7/site-packages/pynndescent/pynndescent_.py (111)>, 102: <ir.Block at /anaconda3/envs/knn_expts/lib/python3.7/site-packages/pynndescent/pynndescent_.py (107)>, 20: <ir.Block at /anaconda3/envs/knn_expts/lib/python3.7/site-packages/pynndescent/pynndescent_.py (93)>, 206: <ir.Block at /anaconda3/envs/knn_expts/lib/python3.7/site-packages/pynndescent/pynndescent_.py (114)>, 214: <ir.Block at /anaconda3/envs/knn_expts/lib/python3.7/site-packages/pynndescent/pynndescent_.py (102)>, 58: <ir.Block at /anaconda3/envs/knn_expts/lib/python3.7/site-packages/pynndescent/pynndescent_.py (103)>}Var(parfor_index.280, /anaconda3/envs/knn_expts/lib/python3.7/site-packages/pynndescent/pynndescent_.py (93))" at /anaconda3/envs/knn_expts/lib/python3.7/site-packages/pynndescent/pynndescent_.py (93)


This should not have happened, a problem has occurred in Numba's internals. You are currently using Numba version 0.46.0.

Please report the error message and traceback, along with a minimal reproducer at: https://github.com/numba/numba/issues/new

If more help is needed please feel free to speak to the Numba core developers directly at: https://gitter.im/numba/numba

Thanks in advance for your help in improving Numba!

jayaram-r avatar Nov 18 '19 17:11 jayaram-r

It looks as if numba is having some issues. One option would be to try the new_search branch and see if the problem is still present there (hopefully it isn't).

lmcinnes avatar Nov 18 '19 21:11 lmcinnes

Let me try that. Thanks.

jayaram-r avatar Nov 18 '19 22:11 jayaram-r

No luck. I am running into the same error using the branch new_search.

jayaram-r avatar Nov 19 '19 00:11 jayaram-r

That's more disconcerting. I don't quite know what the issue could be, and unfortunately I am a little busy and don't have time to look at it right now. Can I get back to you in a week or two?

lmcinnes avatar Nov 20 '19 16:11 lmcinnes

No worries. I'll dig into it when I get some time as well. Thanks.

jayaram-r avatar Nov 20 '19 17:11 jayaram-r

Additional clarification: The error occurs with user-defined distance metrics only when the metric takes keyword argument(s), that is, when the NNDescent class is given a metric_kwds argument.

There is no error if my custom distance metric does not take any keyword arguments. I added this to the main issue description.

jayaram-r avatar Nov 20 '19 22:11 jayaram-r

I think the issue here is possibly with numba inferring the type of the keyword arguments to your distance function -- I have certainly had some issues with that in the past. Instead of defaulting to None you may want to instead default to a dummy argument of the type you want to pass.

lmcinnes avatar Nov 26 '19 04:11 lmcinnes

That's a good suggestion. I modified the default value for the keyword argument shape from None to the tuple (1, 1, 1). That leads to the same error unfortunately.

jayaram-r avatar Nov 26 '19 17:11 jayaram-r

I have given it some thought, but I admit I am at a bit of a loss. What sort of error does the new_search branch give? It at least won't fail on parallelisation and prange.

lmcinnes avatar Nov 28 '19 14:11 lmcinnes

I just gave the latest version 0.4.2 a try. Unfortunately, I am running into the same (or a very similar) error when I use a custom distance metric with a keyword argument. This is the key error message:

LoweringError: Failed in nopython mode pipeline (step: nopython mode backend)
scalar type tuple(tuple(int64 x 3) x 1) given for non scalar argument #4

I understand that this is very unusual; it seems like a deeper numba-related error.

jayaram-r avatar Jan 03 '20 20:01 jayaram-r

Some unrelated questions:

  1. Is it alright to continue using version 0.3.3 (i.e. no known bugs)?
  2. I noticed that the NNDescent class API has changed a bit in version 0.4.2. Would you suggest tweaking the parameters pruning_degree_multiplier and diversify_epsilon? What about the parameter epsilon used by the query method? Thanks.

jayaram-r avatar Jan 03 '20 21:01 jayaram-r

0.3.3 is fine if it works for you. There are some significant performance advantages to 0.4.2, and in particular the querying is a lot faster. And yes, the API changed quite a bit. In terms of what to tweak -- I would look to the epsilon parameter first. Values in the range 0.0 to 0.3 are probably good. If you aren't getting the accuracy you want then you can tweak n_neighbors, pruning_degree_multiplier and diversify_epsilon.

lmcinnes avatar Jan 04 '20 19:01 lmcinnes

Got it. Thanks.

jayaram-r avatar Jan 04 '20 23:01 jayaram-r