segmentation fault when trying to fetch dataset

Open mika-data opened this issue 2 years ago • 17 comments

root@server:~/Downloads# python3 -c "import sling; sling.which()"
SLING API version 3.0.0 (Sat Feb  4 11:59:11 2023) in /usr/local/lib/python3.9/dist-packages/sling


root@server:~/Downloads# sling fetch --dataset caspar
[2023-02-04 11:59:34.807193: I sling/pyapi/pytask.cc:525] Start HTTP server on port 6767
[2023-02-04 11:59:34.813720: I run.py:341] Execute command fetch
*** Signal 11 (Segmentation fault) at 0x0000020b26f0 for 0x0000020b26f0
  @ 0x0000020b26f0 (unknown)

**Segmentation fault**

root@server:~/Downloads# cat /etc/*release
PRETTY_NAME="**Debian GNU/Linux 11** (bullseye)"
NAME="Debian GNU/Linux"
VERSION_ID="11"
VERSION="11 (bullseye)"
VERSION_CODENAME=bullseye
ID=debian

mika-data avatar Feb 04 '23 11:02 mika-data

My fault: so far I have only downloaded the Python API via pip.

The command line interpreter will probably work only for a small subset of commands.

mika-data avatar Feb 04 '23 11:02 mika-data

@mika-data: Did you try to build the Python API yourself on your Debian machine? I normally build on Ubuntu, but I would think the differences are minor.

ringgaard avatar Feb 04 '23 12:02 ringgaard

No, I have downloaded the Python API as a whl as recommended in your installation documentation.

Previously I only had Python 3.9 on my machine. After I built Python 3.6 and then built SLING from source, everything seems to work for me.

root@server:/usr# python3 -c "import sling; sling.which()"
SLING API version 3.0.0 (Sat Feb  4 13:50:30 2023) in /usr/local/python-3.6.15/lib/python3.6/site-packages/sling
root@cgnvision:/usr# sling fetch --dataset caspar
[2023-02-04 13:51:48.997469: I sling/pyapi/pytask.cc:525] Start HTTP server on port 6767
[2023-02-04 13:51:49.007196: I run.py:341] Execute command fetch
[2023-02-04 13:51:49.008710: I sling/task/job.cc:349] All systems GO
[2023-02-04 13:51:49.008867: I sling/task/job.cc:62] Starting stage #0
[2023-02-04 13:51:49.008945: I sling/task/job.cc:66] Start url-download
[2023-02-04 13:51:49.009773: I download.py:51] Download caspar from https://ringgaard.com/data/caspar/caspar.flow
[2023-02-04 13:51:49.009979: I download.py:78] Start download of ./data/e/caspar/caspar.flow
[2023-02-04 13:51:49.741937: I download.py:94] caspar downloaded
[2023-02-04 13:51:49.742027: I sling/task/job.cc:402] Task url-download completed
[2023-02-04 13:51:49.742188: I sling/task/job.cc:407] Task url-download done
[2023-02-04 13:51:49.742255: I sling/task/job.cc:419] Stage #0 done
[2023-02-04 13:51:49.743633: I workflow.py:821] sending final status to monitor
[2023-02-04 13:51:49.743902: I run.py:351] Done

mika-data avatar Feb 04 '23 12:02 mika-data

Hmm, maybe I should test this on Python 3.9. My Ubuntu only has 3.8.

ringgaard avatar Feb 04 '23 12:02 ringgaard

I have tested it on another Debian machine. It worked fine there:

(wikidata) mika@server:~/Programming/wikidata$ pip3 install https://ringgaard.com/data/dist/sling-3.0.0-py3-none-linux_x86_64.whl
Collecting sling==3.0.0
  Downloading https://ringgaard.com/data/dist/sling-3.0.0-py3-none-linux_x86_64.whl (7.4 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7.4/7.4 MB 3.9 MB/s eta 0:00:00
Installing collected packages: sling
Successfully installed sling-3.0.0
(wikidata) mika@server:~/Programming/wikidata$ python3 -c "import sling; sling.which()"
SLING API version 3.0.0 (Sat Feb  4 20:58:48 2023) in /home/mika/anaconda3/envs/wikidata/lib/python3.8/site-packages/sling
(wikidata) mika@server:~/Programming/wikidata$ sling fetch --dataset caspar
[2023-02-04 20:59:03.350186: I sling/pyapi/pytask.cc:525] Start HTTP server on port 6767
[2023-02-04 20:59:03.354802: I run.py:341] Execute command fetch
[2023-02-04 20:59:03.355687: I sling/task/job.cc:349] All systems GO
[2023-02-04 20:59:03.355815: I sling/task/job.cc:62] Starting stage #0
[2023-02-04 20:59:03.355821: I sling/task/job.cc:66] Start url-download
[2023-02-04 20:59:03.356144: I download.py:51] Download caspar from https://ringgaard.com/data/caspar/caspar.flow
[2023-02-04 20:59:03.356218: I download.py:78] Start download of ./data/e/caspar/caspar.flow
[2023-02-04 20:59:05.813746: I download.py:94] caspar downloaded
[2023-02-04 20:59:05.813851: I sling/task/job.cc:402] Task url-download completed
[2023-02-04 20:59:05.814257: I sling/task/job.cc:407] Task url-download done
[2023-02-04 20:59:05.814305: I sling/task/job.cc:419] Stage #0 done
[2023-02-04 20:59:05.816389: I workflow.py:821] sending final status to monitor
[2023-02-04 20:59:05.816896: I run.py:351] Done
(wikidata) mika@blackbrain:~/Programming/wikidata$ cat /etc/*release
PRETTY_NAME="Debian GNU/Linux 11 (bullseye)"
NAME="Debian GNU/Linux"
VERSION_ID="11"
VERSION="11 (bullseye)"
VERSION_CODENAME=bullseye
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
(wikidata) mika@server:~/Programming/wikidata$ python -V
Python 3.8.16

mika-data avatar Feb 04 '23 20:02 mika-data

So it seems to work on Python 3.8, but fails on Python 3.9, right?

ringgaard avatar Feb 04 '23 20:02 ringgaard

Yes, I can confirm the bug on a second machine:

mika@server:Downloads$ pip3 install https://ringgaard.com/data/dist/sling-3.0.0-py3-none-linux_x86_64.whl
Collecting sling==3.0.0
  Using cached https://ringgaard.com/data/dist/sling-3.0.0-py3-none-linux_x86_64.whl (7.4 MB)
Installing collected packages: sling
Successfully installed sling-3.0.0
mika@server:Downloads$ python3 -c "import sling; sling.which()"
SLING API version 3.0.0 (Sat Feb  4 21:48:31 2023) in /home/mika/.local/lib/python3.9/site-packages/sling
mika@server:Downloads$ sling fetch --dataset caspar
[2023-02-04 21:48:43.254199: I sling/pyapi/pytask.cc:525] Start HTTP server on port 6767
[2023-02-04 21:48:43.256684: I run.py:341] Execute command fetch
*** Signal 11 (Segmentation fault) at 0x0000019d0660 for 0x0000019d0660
  @ 0x0000019d0660 (unknown)
**Speicherzugriffsfehler** <---- segmentation fault
mika@server:Downloads$ py -V
Python 3.9.2

mika-data avatar Feb 04 '23 20:02 mika-data

Let me try to see if I can reproduce this on one of my own machines.

ringgaard avatar Feb 04 '23 20:02 ringgaard

I can now reproduce the crash. It seems to have something to do with Python type registration in the pysling C extension when running in Python 3.9.
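For anyone who wants to check whether their own interpreter is affected, here is a minimal sketch that runs a command and reports whether it was killed by a signal. It assumes a POSIX system, where `subprocess` reports death by signal as a negative return code. `run_and_report` is a hypothetical helper name, not part of SLING.

```python
import signal
import subprocess


def run_and_report(cmd):
    """Run cmd (a list of arguments) and describe how it exited."""
    proc = subprocess.run(cmd)
    if proc.returncode < 0:
        # On POSIX, a negative return code means the child was killed
        # by the signal with that number (11 is SIGSEGV on Linux).
        return "killed by " + signal.Signals(-proc.returncode).name
    return "exited with status %d" % proc.returncode
```

Calling `run_and_report(["sling", "fetch", "--dataset", "caspar"])` under each Python installation should then show whether that setup hits the segmentation fault.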

ringgaard avatar Feb 07 '23 10:02 ringgaard

It seems like you need to build pysling.so using python3.9-dev for it to work with Python 3.9, so I have added support for building pysling.so for Python 3.9. You change DPYVER=36 to DPYVER=39 and rebuild using tools/buildall.sh. It seems like the 3.9 version can be used with earlier versions of Python, but I haven't changed the default yet because I don't have Python 3.9 on all the machines that build the code.
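As a rough aid, the DPYVER values mentioned in this thread (36, 39, 310) look like the major and minor version numbers joined together. The sketch below derives that value from the running interpreter. The pattern is inferred from those three values only, and `dpyver` is a hypothetical helper that is not part of the SLING build scripts.

```python
import sys


def dpyver(version_info=None):
    """Return e.g. "39" for Python 3.9 or "310" for Python 3.10."""
    if version_info is None:
        version_info = sys.version_info
    # Assumption: DPYVER is the major and minor version concatenated.
    major, minor = version_info[0], version_info[1]
    return "%d%d" % (major, minor)


print("DPYVER=" + dpyver())
```

Running it with the interpreter you intend to build against prints the value to compare with the DPYVER setting before rebuilding.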

ringgaard avatar Feb 07 '23 17:02 ringgaard

When I compile from source against Python 3.10 and don't dockerize (pip/venv/...), I get:

Compiling sling/pyapi/pyapi.cc failed: (Exit 1): gcc failed: error executing command (from target //sling/pyapi:pyapi) /usr/bin/gcc -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG -ffunction-sections ... (remaining 25 arguments skipped)
In file included from ./sling/pyapi/pyarray.h:19,
                 from sling/pyapi/pyapi.cc:17:
./sling/pyapi/pybase.h: In static member function 'static sling::Text sling::PyBase::GetText(PyObject*)':
./sling/pyapi/pybase.h:130:37: error: invalid conversion from 'const char*' to 'char*' [-fpermissive]
  130 |       data = PyUnicode_AsUTF8AndSize(obj, &length);
      |              ~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~
      |                                     |
      |                                     const char*

Sadly I don't know much C++, but isn't this just a matter of permitting the type conversion?

meerfrau avatar Apr 20 '23 16:04 meerfrau

Are you using the newest version of the code? Line 130 of pybase.h does not match your error message.

Is there any reason that you cannot use the pre-built wheel?

ringgaard avatar Apr 20 '23 16:04 ringgaard

I've changed the includes in pybase.h to:

#include <python3.10/Python.h>
#include <python3.10/structmember.h>

> Is there any reason that you cannot use the pre-built wheel?

To see the error ;)

meerfrau avatar Apr 20 '23 16:04 meerfrau

I'm sorry, the current sources work perfectly against Python 3.10!

PS: I installed it via sudo ln -s ./sling/python /usr/lib/python3.10/site-packages/sling. Could you please add a setup.py for people like me?

meerfrau avatar Apr 20 '23 17:04 meerfrau

@meerfrau: I use wheels instead of setuptools, so you can install SLING with the following command:

sudo pip3 install https://ringgaard.com/data/dist/sling-3.0.0-py3-none-linux_x86_64.whl

I have updated the code to support Python 3.10 by changing DPYVER=36 to DPYVER=310.

ringgaard avatar Apr 20 '23 17:04 ringgaard