numba-scipy icon indicating copy to clipboard operation
numba-scipy copied to clipboard

numba-scipy is now accepting PRs, discuss what to focus on first!

Open stuartarchibald opened this issue 5 years ago • 35 comments

As title... place holder ticket for discussion over what to focus on first. scipy.special and scipy.sparse were suggested here: https://github.com/numba/numba/issues/4292#issuecomment-510257368

stuartarchibald avatar Aug 09 '19 14:08 stuartarchibald

I don't think I can help much with scipy.special (C integration is not my strong suit) but once scipy.special is available I will look into bringing in the statistical distributions

luk-f-a avatar Aug 09 '19 14:08 luk-f-a

I would love to see scipy.integrate :)

astrojuanlu avatar Aug 09 '19 15:08 astrojuanlu

I can do scipy.special. @Juanlu001 what parts of integrate are you most interested in seeing? Something like quad would be a royal pain, but if we’re talking e.g. Gaussian quadrature that’s much more approachable.

person142 avatar Aug 09 '19 16:08 person142

(When it comes to adaptive quadrature it would probably be good to do some work on the SciPy side first vs. reproducing quadpack here.)

person142 avatar Aug 09 '19 16:08 person142

@person142 I agree, unless there's going to be some incredible performance (or other) benefit gained by translating core numerical libraries to Numba then creating a clean API on the callee side to make it easy for Numba make make the call is preferable.

stuartarchibald avatar Aug 09 '19 16:08 stuartarchibald

@person142 I was thinking more of accelerating the new Initial Value Problem solvers introduced in SciPy 1.0, which are much more feature-rich than odeint and ode but have a performance overhead because they're written in pure Python. I acknowledge though that it's not a trivial problem, but the question was open so... :innocent:

astrojuanlu avatar Aug 09 '19 16:08 astrojuanlu

@person142 if you are happy to do scipy.special that'd be great, thanks. There's a tiny module in numba-scipy already for this, I was using it for CI and to get something uploaded to PyPI, feel free to wipe it with actual contents!

stuartarchibald avatar Aug 09 '19 16:08 stuartarchibald

I was thinking more of accelerating the new Initial Value Problem solvers introduced in SciPy 1.0, which are much more feature-rich than odeint and ode but have a performance overhead because they're written in pure Python

Hm that is a quite interesting case. Could be that we should work on moving some stuff into Cython on the SciPy side to reduce overhead and then make sure it integrates well with Numba jitted functions...

person142 avatar Aug 09 '19 16:08 person142

@luk-f-a "Basic" support for scipy.special is added in the now merged https://github.com/numba/numba-scipy/pull/13 if you are interested in writing statistical distribution implementations? Thanks.

stuartarchibald avatar Aug 26 '19 13:08 stuartarchibald

thanks for letting me know @stuartarchibald . I started looking at it and I have a general question: should we aim to preserve the structure of the Scipy code?

For example, for distributions I would need one auxiliary function from scipy/_lib/_util.py. Should I create a numba-scipy/_lib/_util.py file and put the jit-version in there?

luk-f-a avatar Aug 28 '19 10:08 luk-f-a

@luk-f-a good point, perhaps we try with this approach and see how it goes ? There may be issues tracking upstream refactoring/places where it's just not reasonable to do this, however, it's probably worth a try as a predefined order might help.

stuartarchibald avatar Aug 28 '19 16:08 stuartarchibald

Hello I am following the discussion with great interest and would like to ask how does it work exactly.

  1. Do I need to import the package like import numba-scipy?
  2. Does it work with scipy.stats? for example scipyp.stats.chi2.rvs() scipy.stats.t.cdf() sp.stats.t.ppf()
  3. If not do you suggest any other method to speed up scipy functions (especially inversion of student t is very slow)
  4. Where can I find information of how to help for integration between numba and scipy? I would like to contribute

many thanks

gioxc88 avatar Sep 13 '19 09:09 gioxc88

hi @gioxc88 , it's great to hear that you're interested and even considering contributing. I started some work on stats, building norm.rvs as an example to test out the approach. But this has not been merged yet, so we're in a very early stage!!

to your questions:

  1. the idea is that you won't need to, Numba will recognize the calls to Scipy and call numba-scipy in the same way that it does with Numpy.
  2. it will eventually, but which distribution works will depend on the community interest and contributions. There are 120 distributions in total, it will take some time. We're looking into how adding new distributions could be simplified.
  3. Is your problem the time that t.ppf takes for each calculation, or that you might be calling it many times?
  4. Stay tuned on this issue for now, I assume that over time there'll be contributing guidelines. In particular, for stats I'd like to have a nice tutorial on adding new distributions.

ping @stuartarchibald

luk-f-a avatar Sep 13 '19 10:09 luk-f-a

hello @luk-f-a thank you for your answer. I relied on the community for too long and now I think it's time for me to contribute.

I will certainly follow this issue.

About the t.ppf I need to call it in a loop 5 million times, for an a input array that has shape [65, 13] This is because if I call it for a 3d array of shape [65, 13, 5000000] if won't fit my RAM. But I tried with a 64Gb RAM pc to call it one single using the big 3d array and it's still extremely slow compared do t.cdf. And when I say extremely slow I am talking about t.cdf being executed in like 1 minute while the t.ppf takes more than 30 mins.

gioxc88 avatar Sep 13 '19 10:09 gioxc88

I relied on the community for too long and now I think it's time for me to contribute.

same here!! :smile:

I'm not surprised ppf is slower than cdf. I'm not sure about t.ppf in particular but it probably relies on numerical root finding of cdf. That means cdf might be run many times for each ppf call. The Scipy code seems to be in Cython (maybe we need a Scipy dev to confirm) so it's possible Numba might not improve performance. That array is pretty big (4bn numbers) anyway. For comparison, I ran the equivalent of t.ppf in Julia, and it took 1sec per million. That would mean 70 minutes for your big array. You might need a special algorithm (maybe gaussian quadrature?) if you really need to speed it up. Otherwise, create the numbers once, and read them from disk every time you need them.

luk-f-a avatar Sep 13 '19 13:09 luk-f-a

Hi, what about the Bessel functions, can I use it with numba?

kkingstoun avatar Sep 13 '19 17:09 kkingstoun

hi @kkingstoun , I did a quick test with jvand it worked.

from scipy import special as sp
import numba

@numba.njit
def test(v, z):
    return sp.jv(v, z)

print(sp.jv(3, 3), test(3., 3.))

There's a catch: it seems that you have to always pass the arguments as floats.

luk-f-a avatar Sep 14 '19 13:09 luk-f-a

Hi @luk-f-a, thank you for your response, but on my computer it doesn't work:

TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Unknown attribute 'jv' of type Module(<module 'scipy.special' from '/home/kking/anaconda3/lib/python3.7/site-packages/scipy/special/__init__.py'>)

File "<ipython-input-332-f5add8b419a3>", line 6:
def test(v, z):
    return sp.jv(v, z)
    ^

[1] During: typing of get attribute at <ipython-input-332-f5add8b419a3> (6)

File "<ipython-input-332-f5add8b419a3>", line 6:
def test(v, z):
    return sp.jv(v, z)
    ^

Numba version: 0.45.1 Scipy: 1.3.0

Edit, the same on windows:


TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Unknown attribute 'jv' of type Module(<module 'scipy.special' from 'C:\\Users\\Mateusz\\Anaconda3.7.4\\lib\\site-packages\\scipy\\special\\__init__.py'>)

File "<ipython-input-6-423efa62ec3d>", line 6:
def test(v, z):
    return sp.jv(v, z)
    ^

[1] During: typing of get attribute at <ipython-input-6-423efa62ec3d> (6)

File "<ipython-input-6-423efa62ec3d>", line 6:
def test(v, z):
    return sp.jv(v, z)

kkingstoun avatar Sep 14 '19 14:09 kkingstoun

what happens if you run from numba_scipy import special as nsp_sp?

luk-f-a avatar Sep 14 '19 14:09 luk-f-a

@luk-f-a

This line works, but:


---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-15-ce32addfc6b4> in <module>
      1 from numba_scipy import special as nsp_sp
----> 2 nsp_sp.jv(10)

AttributeError: module 'numba_scipy.special' has no attribute 'jv'


kkingstoun avatar Sep 14 '19 14:09 kkingstoun

do you have the latest numba-scipy master? Is special a .py file or a folder?

luk-f-a avatar Sep 14 '19 14:09 luk-f-a

@luk-f-a Yes, I see now, that there is a problem with installation. I dont'see the special directory, even after reinstallation using pip straight from the git.

How should I install correctly?

To test it I just copied whole folder to the my project folder, and

import numba_special2.special as nps2
nps2.__dict__
{'__name__': 'numba_special2.special',
 '__doc__': None,
 '__package__': 'numba_special2.special',
 '__loader__': <_frozen_importlib_external.SourceFileLoader at 0x7f95b430fe48>,
 '__spec__': ModuleSpec(name='numba_special2.special', loader=<_frozen_importlib_external.SourceFileLoader object at 0x7f95b430fe48>, origin='/home/users/kkingstoun/Projekty/Python/MMPP/Bi-stable/numba_special2/special/__init__.py', submodule_search_locations=['/home/users/kkingstoun/Projekty/Python/MMPP/Bi-stable/numba_special2/special']),
 '__path__': ['/home/users/kkingstoun/Projekty/Python/MMPP/Bi-stable/numba_special2/special'],
 '__file__': '/home/users/kkingstoun/Projekty/Python/MMPP/Bi-stable/numba_special2/special/__init__.py',
 '__cached__': '/home/users/kkingstoun/Projekty/Python/MMPP/Bi-stable/numba_special2/special/__pycache__/__init__.cpython-37.pyc',
 '__builtins__': {'__name__': 'builtins',
  '__doc__': "Built-in functions, exceptions, and other objects.\n\nNoteworthy: None is the `nil' object; Ellipsis represents `...' in slices.",
  '__package__': '',
  '__loader__': _frozen_importlib.BuiltinImporter,
  '__spec__': ModuleSpec(name='builtins', loader=<class '_frozen_importlib.BuiltinImporter'>),
  '__build_class__': <function __build_class__>,
  '__import__': <function __import__>,
  'abs': <function abs(x, /)>,
  'all': <function all(iterable, /)>,
  'any': <function any(iterable, /)>,
  'ascii': <function ascii(obj, /)>,
  'bin': <function bin(number, /)>,
  'breakpoint': <function breakpoint>,
  'callable': <function callable(obj, /)>,
  'chr': <function chr(i, /)>,
  'compile': <function compile(source, filename, mode, flags=0, dont_inherit=False, optimize=-1)>,
  'delattr': <function delattr(obj, name, /)>,
  'dir': <function dir>,
  'divmod': <function divmod(x, y, /)>,
  'eval': <function eval(source, globals=None, locals=None, /)>,
  'exec': <function exec(source, globals=None, locals=None, /)>,
  'format': <function format(value, format_spec='', /)>,
  'getattr': <function getattr>,
  'globals': <function globals()>,
  'hasattr': <function hasattr(obj, name, /)>,
  'hash': <function hash(obj, /)>,
  'hex': <function hex(number, /)>,
  'id': <function id(obj, /)>,
  'input': <bound method Kernel.raw_input of <ipykernel.ipkernel.IPythonKernel object at 0x7f95b7c3aa90>>,
  'isinstance': <function isinstance(obj, class_or_tuple, /)>,
  'issubclass': <function issubclass(cls, class_or_tuple, /)>,
  'iter': <function iter>,
  'len': <function len(obj, /)>,
  'locals': <function locals()>,
  'max': <function max>,
  'min': <function min>,
  'next': <function next>,
  'oct': <function oct(number, /)>,
  'ord': <function ord(c, /)>,
  'pow': <function pow(x, y, z=None, /)>,
  'print': <function print>,
  'repr': <function repr(obj, /)>,
  'round': <function round(number, ndigits=None)>,
  'setattr': <function setattr(obj, name, value, /)>,
  'sorted': <function sorted(iterable, /, *, key=None, reverse=False)>,
  'sum': <function sum(iterable, start=0, /)>,
  'vars': <function vars>,
  'None': None,
  'Ellipsis': Ellipsis,
  'NotImplemented': NotImplemented,
  'False': False,
  'True': True,
  'bool': bool,
  'memoryview': memoryview,
  'bytearray': bytearray,
  'bytes': bytes,
  'classmethod': classmethod,
  'complex': complex,
  'dict': dict,
  'enumerate': enumerate,
  'filter': filter,
  'float': float,
  'frozenset': frozenset,
  'property': property,
  'int': int,
  'list': list,
  'map': map,
  'object': object,
  'range': range,
  'reversed': reversed,
  'set': set,
  'slice': slice,
  'staticmethod': staticmethod,
  'str': str,
  'super': super,
  'tuple': tuple,
  'type': type,
  'zip': zip,
  '__debug__': True,
  'BaseException': BaseException,
  'Exception': Exception,
  'TypeError': TypeError,
  'StopAsyncIteration': StopAsyncIteration,
  'StopIteration': StopIteration,
  'GeneratorExit': GeneratorExit,
  'SystemExit': SystemExit,
  'KeyboardInterrupt': KeyboardInterrupt,
  'ImportError': ImportError,
  'ModuleNotFoundError': ModuleNotFoundError,
  'OSError': OSError,
  'EnvironmentError': OSError,
  'IOError': OSError,
  'EOFError': EOFError,
  'RuntimeError': RuntimeError,
  'RecursionError': RecursionError,
  'NotImplementedError': NotImplementedError,
  'NameError': NameError,
  'UnboundLocalError': UnboundLocalError,
  'AttributeError': AttributeError,
  'SyntaxError': SyntaxError,
  'IndentationError': IndentationError,
  'TabError': TabError,
  'LookupError': LookupError,
  'IndexError': IndexError,
  'KeyError': KeyError,
  'ValueError': ValueError,
  'UnicodeError': UnicodeError,
  'UnicodeEncodeError': UnicodeEncodeError,
  'UnicodeDecodeError': UnicodeDecodeError,
  'UnicodeTranslateError': UnicodeTranslateError,
  'AssertionError': AssertionError,
  'ArithmeticError': ArithmeticError,
  'FloatingPointError': FloatingPointError,
  'OverflowError': OverflowError,
  'ZeroDivisionError': ZeroDivisionError,
  'SystemError': SystemError,
  'ReferenceError': ReferenceError,
  'MemoryError': MemoryError,
  'BufferError': BufferError,
  'Warning': Warning,
  'UserWarning': UserWarning,
  'DeprecationWarning': DeprecationWarning,
  'PendingDeprecationWarning': PendingDeprecationWarning,
  'SyntaxWarning': SyntaxWarning,
  'RuntimeWarning': RuntimeWarning,
  'FutureWarning': FutureWarning,
  'ImportWarning': ImportWarning,
  'UnicodeWarning': UnicodeWarning,
  'BytesWarning': BytesWarning,
  'ResourceWarning': ResourceWarning,
  'ConnectionError': ConnectionError,
  'BlockingIOError': BlockingIOError,
  'BrokenPipeError': BrokenPipeError,
  'ChildProcessError': ChildProcessError,
  'ConnectionAbortedError': ConnectionAbortedError,
  'ConnectionRefusedError': ConnectionRefusedError,
  'ConnectionResetError': ConnectionResetError,
  'FileExistsError': FileExistsError,
  'FileNotFoundError': FileNotFoundError,
  'IsADirectoryError': IsADirectoryError,
  'NotADirectoryError': NotADirectoryError,
  'InterruptedError': InterruptedError,
  'PermissionError': PermissionError,
  'ProcessLookupError': ProcessLookupError,
  'TimeoutError': TimeoutError,
  'open': <function io.open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)>,
  'copyright': Copyright (c) 2001-2019 Python Software Foundation.
  All Rights Reserved.
  
  Copyright (c) 2000 BeOpen.com.
  All Rights Reserved.
  
  Copyright (c) 1995-2001 Corporation for National Research Initiatives.
  All Rights Reserved.
  
  Copyright (c) 1991-1995 Stichting Mathematisch Centrum, Amsterdam.
  All Rights Reserved.,
  'credits':     Thanks to CWI, CNRI, BeOpen.com, Zope Corporation and a cast of thousands
      for supporting Python development.  See www.python.org for more information.,
  'license': Type license() to see the full license text,
  'help': Type help() for interactive help, or help(object) for help about object.,
  '__IPYTHON__': True,
  'display': <function IPython.core.display.display(*objs, include=None, exclude=None, metadata=None, transient=None, display_id=None, **kwargs)>,
  'get_ipython': <bound method InteractiveShell.get_ipython of <ipykernel.zmqshell.ZMQInteractiveShell object at 0x7f95b7c3f6d8>>},
 'signatures': <module 'numba_special2.special.signatures' from '/home/users/kkingstoun/Projekty/Python/MMPP/Bi-stable/numba_special2/special/signatures.py'>,
 'overloads': <module 'numba_special2.special.overloads' from '/home/users/kkingstoun/Projekty/Python/MMPP/Bi-stable/numba_special2/special/overloads.py'>,
 '_overloads': <module 'numba_special2.special.overloads' from '/home/users/kkingstoun/Projekty/Python/MMPP/Bi-stable/numba_special2/special/overloads.py'>}

Do you know what I'm doing wrong?

kkingstoun avatar Sep 14 '19 14:09 kkingstoun

sorry, I don't know.

luk-f-a avatar Sep 14 '19 18:09 luk-f-a

@kkingstoun please could you open an issue describing the problem you have? This issue is intended for discussion of features. Many thanks for your cooperation.

stuartarchibald avatar Sep 16 '19 10:09 stuartarchibald

@luk-f-a @gioxc88 Great to hear you would like to contribute. I think the stats support is waiting on a transparent way to ship the e.g. rv_continuous class into compiled code, hope to get to this soon based on https://github.com/numba/numba-scipy/pull/16.

stuartarchibald avatar Sep 16 '19 10:09 stuartarchibald

My primary usage of scipy at the moment is the scipy.spatial.transform rotation stuff as I mostly work with computational geometry/computer vision code. Things like converting rotation matrix <-> quaternion <-> euler angles are my main scipy piece. It's not a high priority desire though just because the formulas to convert between them are simple enough to code directly and then let numba compile it.

Kd trees I've also considered using from scipy and would be helpful for low dimensional nearest neighbors problems.

hmc-cs-mdrissi avatar Nov 08 '19 03:11 hmc-cs-mdrissi

When stats pull request will be integrated I will try to contribute. I'm working in a economic institut and we use Numba and scipy so this project is of great interest!

francoislauger avatar Nov 22 '19 23:11 francoislauger

1+ for scipy.stats. I am also using scipy.interpolate a fair bit and would love to see that implemented.

stnatter avatar Feb 04 '20 07:02 stnatter

I would really like to see support for scipy.special.erf for complex values.

mberglundmx avatar Feb 18 '20 13:02 mberglundmx

Hi, thanks a lot for this awesome project. I saw heavy performance improvements (while still being user readable) on code using scipy.stats by rewriting it to scipy.special functions.

But I agree with @Juanlu001 on the integrate functions. Currently, the bottleneck is in functions such as quad which do not accept vectors and force the use of numpy.vectorize

Also, do you have docs on how the JIT compilation can be developped? including for FORTRAN routines?

remidebette avatar Mar 01 '20 08:03 remidebette