numba-scipy
numba-scipy copied to clipboard
numba-scipy is now accepting PRs, discuss what to focus on first!
As title... place holder ticket for discussion over what to focus on first. scipy.special
and scipy.sparse
were suggested here: https://github.com/numba/numba/issues/4292#issuecomment-510257368
I don't think I can help much with scipy.special
(C integration is not my strong suit) but once scipy.special
is available I will look into bringing in the statistical distributions
I would love to see scipy.integrate
:)
I can do scipy.special
. @Juanlu001 what parts of integrate are you most interested in seeing? Something like quad
would be a royal pain, but if we’re talking e.g. Gaussian quadrature that’s much more approachable.
(When it comes to adaptive quadrature it would probably be good to do some work on the SciPy side first vs. reproducing quadpack here.)
@person142 I agree, unless there's going to be some incredible performance (or other) benefit gained by translating core numerical libraries to Numba then creating a clean API on the callee side to make it easy for Numba make make the call is preferable.
@person142 I was thinking more of accelerating the new Initial Value Problem solvers introduced in SciPy 1.0, which are much more feature-rich than odeint
and ode
but have a performance overhead because they're written in pure Python. I acknowledge though that it's not a trivial problem, but the question was open so... :innocent:
@person142 if you are happy to do scipy.special
that'd be great, thanks. There's a tiny module in numba-scipy
already for this, I was using it for CI and to get something uploaded to PyPI, feel free to wipe it with actual contents!
I was thinking more of accelerating the new Initial Value Problem solvers introduced in SciPy 1.0, which are much more feature-rich than odeint and ode but have a performance overhead because they're written in pure Python
Hm that is a quite interesting case. Could be that we should work on moving some stuff into Cython on the SciPy side to reduce overhead and then make sure it integrates well with Numba jitted functions...
@luk-f-a "Basic" support for scipy.special
is added in the now merged https://github.com/numba/numba-scipy/pull/13 if you are interested in writing statistical distribution implementations? Thanks.
thanks for letting me know @stuartarchibald . I started looking at it and I have a general question: should we aim to preserve the structure of the Scipy code?
For example, for distributions I would need one auxiliary function from scipy/_lib/_util.py
. Should I create a numba-scipy/_lib/_util.py
file and put the jit-version in there?
@luk-f-a good point, perhaps we try with this approach and see how it goes ? There may be issues tracking upstream refactoring/places where it's just not reasonable to do this, however, it's probably worth a try as a predefined order might help.
Hello I am following the discussion with great interest and would like to ask how does it work exactly.
- Do I need to import the package like
import numba-scipy
? - Does it work with scipy.stats? for example
scipyp.stats.chi2.rvs() scipy.stats.t.cdf() sp.stats.t.ppf()
- If not do you suggest any other method to speed up scipy functions (especially inversion of student t is very slow)
- Where can I find information of how to help for integration between numba and scipy? I would like to contribute
many thanks
hi @gioxc88 , it's great to hear that you're interested and even considering contributing. I started some work on stats
, building norm.rvs
as an example to test out the approach. But this has not been merged yet, so we're in a very early stage!!
to your questions:
- the idea is that you won't need to, Numba will recognize the calls to Scipy and call
numba-scipy
in the same way that it does with Numpy. - it will eventually, but which distribution works will depend on the community interest and contributions. There are 120 distributions in total, it will take some time. We're looking into how adding new distributions could be simplified.
- Is your problem the time that
t.ppf
takes for each calculation, or that you might be calling it many times? - Stay tuned on this issue for now, I assume that over time there'll be contributing guidelines. In particular, for
stats
I'd like to have a nice tutorial on adding new distributions.
ping @stuartarchibald
hello @luk-f-a thank you for your answer. I relied on the community for too long and now I think it's time for me to contribute.
I will certainly follow this issue.
About the t.ppf
I need to call it in a loop 5 million times, for an a input array that has shape [65, 13]
This is because if I call it for a 3d array of shape [65, 13, 5000000]
if won't fit my RAM.
But I tried with a 64Gb RAM pc to call it one single using the big 3d array and it's still extremely slow compared do t.cdf
. And when I say extremely slow I am talking about t.cdf
being executed in like 1 minute while the t.ppf
takes more than 30 mins.
I relied on the community for too long and now I think it's time for me to contribute.
same here!! :smile:
I'm not surprised ppf
is slower than cdf
. I'm not sure about t.ppf
in particular but it probably relies on numerical root finding of cdf
. That means cdf
might be run many times for each ppf
call. The Scipy code seems to be in Cython (maybe we need a Scipy dev to confirm) so it's possible Numba might not improve performance. That array is pretty big (4bn numbers) anyway. For comparison, I ran the equivalent of t.ppf
in Julia, and it took 1sec per million. That would mean 70 minutes for your big array. You might need a special algorithm (maybe gaussian quadrature?) if you really need to speed it up. Otherwise, create the numbers once, and read them from disk every time you need them.
Hi, what about the Bessel functions, can I use it with numba?
hi @kkingstoun , I did a quick test with jv
and it worked.
from scipy import special as sp
import numba
@numba.njit
def test(v, z):
return sp.jv(v, z)
print(sp.jv(3, 3), test(3., 3.))
There's a catch: it seems that you have to always pass the arguments as floats.
Hi @luk-f-a, thank you for your response, but on my computer it doesn't work:
TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Unknown attribute 'jv' of type Module(<module 'scipy.special' from '/home/kking/anaconda3/lib/python3.7/site-packages/scipy/special/__init__.py'>)
File "<ipython-input-332-f5add8b419a3>", line 6:
def test(v, z):
return sp.jv(v, z)
^
[1] During: typing of get attribute at <ipython-input-332-f5add8b419a3> (6)
File "<ipython-input-332-f5add8b419a3>", line 6:
def test(v, z):
return sp.jv(v, z)
^
Numba version: 0.45.1 Scipy: 1.3.0
Edit, the same on windows:
TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Unknown attribute 'jv' of type Module(<module 'scipy.special' from 'C:\\Users\\Mateusz\\Anaconda3.7.4\\lib\\site-packages\\scipy\\special\\__init__.py'>)
File "<ipython-input-6-423efa62ec3d>", line 6:
def test(v, z):
return sp.jv(v, z)
^
[1] During: typing of get attribute at <ipython-input-6-423efa62ec3d> (6)
File "<ipython-input-6-423efa62ec3d>", line 6:
def test(v, z):
return sp.jv(v, z)
what happens if you run from numba_scipy import special as nsp_sp
?
@luk-f-a
This line works, but:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-15-ce32addfc6b4> in <module>
1 from numba_scipy import special as nsp_sp
----> 2 nsp_sp.jv(10)
AttributeError: module 'numba_scipy.special' has no attribute 'jv'
do you have the latest numba-scipy master? Is special
a .py file or a folder?
@luk-f-a Yes, I see now, that there is a problem with installation. I dont'see the special directory, even after reinstallation using pip straight from the git.
How should I install correctly?
To test it I just copied whole folder to the my project folder, and
import numba_special2.special as nps2
nps2.__dict__
{'__name__': 'numba_special2.special',
'__doc__': None,
'__package__': 'numba_special2.special',
'__loader__': <_frozen_importlib_external.SourceFileLoader at 0x7f95b430fe48>,
'__spec__': ModuleSpec(name='numba_special2.special', loader=<_frozen_importlib_external.SourceFileLoader object at 0x7f95b430fe48>, origin='/home/users/kkingstoun/Projekty/Python/MMPP/Bi-stable/numba_special2/special/__init__.py', submodule_search_locations=['/home/users/kkingstoun/Projekty/Python/MMPP/Bi-stable/numba_special2/special']),
'__path__': ['/home/users/kkingstoun/Projekty/Python/MMPP/Bi-stable/numba_special2/special'],
'__file__': '/home/users/kkingstoun/Projekty/Python/MMPP/Bi-stable/numba_special2/special/__init__.py',
'__cached__': '/home/users/kkingstoun/Projekty/Python/MMPP/Bi-stable/numba_special2/special/__pycache__/__init__.cpython-37.pyc',
'__builtins__': {'__name__': 'builtins',
'__doc__': "Built-in functions, exceptions, and other objects.\n\nNoteworthy: None is the `nil' object; Ellipsis represents `...' in slices.",
'__package__': '',
'__loader__': _frozen_importlib.BuiltinImporter,
'__spec__': ModuleSpec(name='builtins', loader=<class '_frozen_importlib.BuiltinImporter'>),
'__build_class__': <function __build_class__>,
'__import__': <function __import__>,
'abs': <function abs(x, /)>,
'all': <function all(iterable, /)>,
'any': <function any(iterable, /)>,
'ascii': <function ascii(obj, /)>,
'bin': <function bin(number, /)>,
'breakpoint': <function breakpoint>,
'callable': <function callable(obj, /)>,
'chr': <function chr(i, /)>,
'compile': <function compile(source, filename, mode, flags=0, dont_inherit=False, optimize=-1)>,
'delattr': <function delattr(obj, name, /)>,
'dir': <function dir>,
'divmod': <function divmod(x, y, /)>,
'eval': <function eval(source, globals=None, locals=None, /)>,
'exec': <function exec(source, globals=None, locals=None, /)>,
'format': <function format(value, format_spec='', /)>,
'getattr': <function getattr>,
'globals': <function globals()>,
'hasattr': <function hasattr(obj, name, /)>,
'hash': <function hash(obj, /)>,
'hex': <function hex(number, /)>,
'id': <function id(obj, /)>,
'input': <bound method Kernel.raw_input of <ipykernel.ipkernel.IPythonKernel object at 0x7f95b7c3aa90>>,
'isinstance': <function isinstance(obj, class_or_tuple, /)>,
'issubclass': <function issubclass(cls, class_or_tuple, /)>,
'iter': <function iter>,
'len': <function len(obj, /)>,
'locals': <function locals()>,
'max': <function max>,
'min': <function min>,
'next': <function next>,
'oct': <function oct(number, /)>,
'ord': <function ord(c, /)>,
'pow': <function pow(x, y, z=None, /)>,
'print': <function print>,
'repr': <function repr(obj, /)>,
'round': <function round(number, ndigits=None)>,
'setattr': <function setattr(obj, name, value, /)>,
'sorted': <function sorted(iterable, /, *, key=None, reverse=False)>,
'sum': <function sum(iterable, start=0, /)>,
'vars': <function vars>,
'None': None,
'Ellipsis': Ellipsis,
'NotImplemented': NotImplemented,
'False': False,
'True': True,
'bool': bool,
'memoryview': memoryview,
'bytearray': bytearray,
'bytes': bytes,
'classmethod': classmethod,
'complex': complex,
'dict': dict,
'enumerate': enumerate,
'filter': filter,
'float': float,
'frozenset': frozenset,
'property': property,
'int': int,
'list': list,
'map': map,
'object': object,
'range': range,
'reversed': reversed,
'set': set,
'slice': slice,
'staticmethod': staticmethod,
'str': str,
'super': super,
'tuple': tuple,
'type': type,
'zip': zip,
'__debug__': True,
'BaseException': BaseException,
'Exception': Exception,
'TypeError': TypeError,
'StopAsyncIteration': StopAsyncIteration,
'StopIteration': StopIteration,
'GeneratorExit': GeneratorExit,
'SystemExit': SystemExit,
'KeyboardInterrupt': KeyboardInterrupt,
'ImportError': ImportError,
'ModuleNotFoundError': ModuleNotFoundError,
'OSError': OSError,
'EnvironmentError': OSError,
'IOError': OSError,
'EOFError': EOFError,
'RuntimeError': RuntimeError,
'RecursionError': RecursionError,
'NotImplementedError': NotImplementedError,
'NameError': NameError,
'UnboundLocalError': UnboundLocalError,
'AttributeError': AttributeError,
'SyntaxError': SyntaxError,
'IndentationError': IndentationError,
'TabError': TabError,
'LookupError': LookupError,
'IndexError': IndexError,
'KeyError': KeyError,
'ValueError': ValueError,
'UnicodeError': UnicodeError,
'UnicodeEncodeError': UnicodeEncodeError,
'UnicodeDecodeError': UnicodeDecodeError,
'UnicodeTranslateError': UnicodeTranslateError,
'AssertionError': AssertionError,
'ArithmeticError': ArithmeticError,
'FloatingPointError': FloatingPointError,
'OverflowError': OverflowError,
'ZeroDivisionError': ZeroDivisionError,
'SystemError': SystemError,
'ReferenceError': ReferenceError,
'MemoryError': MemoryError,
'BufferError': BufferError,
'Warning': Warning,
'UserWarning': UserWarning,
'DeprecationWarning': DeprecationWarning,
'PendingDeprecationWarning': PendingDeprecationWarning,
'SyntaxWarning': SyntaxWarning,
'RuntimeWarning': RuntimeWarning,
'FutureWarning': FutureWarning,
'ImportWarning': ImportWarning,
'UnicodeWarning': UnicodeWarning,
'BytesWarning': BytesWarning,
'ResourceWarning': ResourceWarning,
'ConnectionError': ConnectionError,
'BlockingIOError': BlockingIOError,
'BrokenPipeError': BrokenPipeError,
'ChildProcessError': ChildProcessError,
'ConnectionAbortedError': ConnectionAbortedError,
'ConnectionRefusedError': ConnectionRefusedError,
'ConnectionResetError': ConnectionResetError,
'FileExistsError': FileExistsError,
'FileNotFoundError': FileNotFoundError,
'IsADirectoryError': IsADirectoryError,
'NotADirectoryError': NotADirectoryError,
'InterruptedError': InterruptedError,
'PermissionError': PermissionError,
'ProcessLookupError': ProcessLookupError,
'TimeoutError': TimeoutError,
'open': <function io.open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)>,
'copyright': Copyright (c) 2001-2019 Python Software Foundation.
All Rights Reserved.
Copyright (c) 2000 BeOpen.com.
All Rights Reserved.
Copyright (c) 1995-2001 Corporation for National Research Initiatives.
All Rights Reserved.
Copyright (c) 1991-1995 Stichting Mathematisch Centrum, Amsterdam.
All Rights Reserved.,
'credits': Thanks to CWI, CNRI, BeOpen.com, Zope Corporation and a cast of thousands
for supporting Python development. See www.python.org for more information.,
'license': Type license() to see the full license text,
'help': Type help() for interactive help, or help(object) for help about object.,
'__IPYTHON__': True,
'display': <function IPython.core.display.display(*objs, include=None, exclude=None, metadata=None, transient=None, display_id=None, **kwargs)>,
'get_ipython': <bound method InteractiveShell.get_ipython of <ipykernel.zmqshell.ZMQInteractiveShell object at 0x7f95b7c3f6d8>>},
'signatures': <module 'numba_special2.special.signatures' from '/home/users/kkingstoun/Projekty/Python/MMPP/Bi-stable/numba_special2/special/signatures.py'>,
'overloads': <module 'numba_special2.special.overloads' from '/home/users/kkingstoun/Projekty/Python/MMPP/Bi-stable/numba_special2/special/overloads.py'>,
'_overloads': <module 'numba_special2.special.overloads' from '/home/users/kkingstoun/Projekty/Python/MMPP/Bi-stable/numba_special2/special/overloads.py'>}
Do you know what I'm doing wrong?
sorry, I don't know.
@kkingstoun please could you open an issue describing the problem you have? This issue is intended for discussion of features. Many thanks for your cooperation.
@luk-f-a @gioxc88 Great to hear you would like to contribute. I think the stats
support is waiting on a transparent way to ship the e.g. rv_continuous
class into compiled code, hope to get to this soon based on https://github.com/numba/numba-scipy/pull/16.
My primary usage of scipy at the moment is the scipy.spatial.transform rotation stuff as I mostly work with computational geometry/computer vision code. Things like converting rotation matrix <-> quaternion <-> euler angles are my main scipy piece. It's not a high priority desire though just because the formulas to convert between them are simple enough to code directly and then let numba compile it.
Kd trees I've also considered using from scipy and would be helpful for low dimensional nearest neighbors problems.
When stats
pull request will be integrated I will try to contribute. I'm working in a economic institut and we use Numba and scipy so this project is of great interest!
1+ for scipy.stats
. I am also using scipy.interpolate
a fair bit and would love to see that implemented.
I would really like to see support for scipy.special.erf for complex values.
Hi, thanks a lot for this awesome project. I saw heavy performance improvements (while still being user readable) on code using scipy.stats by rewriting it to scipy.special functions.
But I agree with @Juanlu001 on the integrate functions. Currently, the bottleneck is in functions such as quad which do not accept vectors and force the use of numpy.vectorize
Also, do you have docs on how the JIT compilation can be developped? including for FORTRAN routines?