sentencepiece
Segmentation fault on Ubuntu with basic python test
This problem occurs with version 0.1.96. I recently upgraded from 0.1.91, which was working fine. When running a basic test using Ubuntu 20.04 on GitHub, a segmentation fault occurs. Here is the basic test that is being run.
Here are the contents of the file being tokenized.
The traceback is:
sentencepiece_trainer.cc(77) LOG(INFO) Starts training with :
trainer_spec {
input: /home/runner/work/gretel-synthetics/gretel-synthetics/tests/data/smol.txt
input_format:
model_prefix: m
model_type: UNIGRAM
vocab_size: 20000
self_test_sample_size: 0
character_coverage: 1
input_sentence_size: 1000000
shuffle_input_sentence: 1
seed_sentencepiece_size: 1000000
shrinking_factor: 0.75
max_sentence_length: 2048
num_threads: 16
num_sub_iterations: 2
max_sentencepiece_length: 16
split_by_unicode_script: 1
split_by_number: 1
split_by_whitespace: 1
split_digits: 0
treat_whitespace_as_suffix: 0
allow_whitespace_only_pieces: 0
user_defined_symbols: <n>
user_defined_symbols: <d>
required_chars:
byte_fallback: 0
vocabulary_output_piece_score: 1
train_extremely_large_corpus: 0
hard_vocab_limit: 0
use_all_vocab: 0
unk_id: 0
bos_id: 1
eos_id: 2
pad_id: -1
unk_piece: <unk>
bos_piece: <s>
eos_piece: </s>
pad_piece: <pad>
unk_surface: ⁇
}
normalizer_spec {
name: nmt_nfkc
add_dummy_prefix: 1
remove_extra_whitespaces: 1
escape_whitespaces: 1
normalization_rule_tsv:
}
denormalizer_spec {}
Fatal Python error: Segmentation fault
Thread 0x00007fbd121cb700 (most recent call first):
File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/threading.py", line 299 in wait
File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/threading.py", line 551 in wait
File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/tqdm/_monitor.py", line 60 in run
File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/threading.py", line 916 in _bootstrap_inner
File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/threading.py", line 884 in _bootstrap
Current thread 0x00007fbd6528b740 (most recent call first):
File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/sentencepiece/__init__.py", line 389 in _TrainFromMap
File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/sentencepiece/__init__.py", line 444 in Train
File "/home/runner/work/gretel-synthetics/gretel-synthetics/tests/test_tokenizers.py", line 112 in test_raw_sp
File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/_pytest/python.py", line 183 in pytest_pyfunc_call
File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/pluggy/_callers.py", line 39 in _multicall
File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/pluggy/_manager.py", line 80 in _hookexec
File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/pluggy/_hooks.py", line 265 in __call__
File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/_pytest/python.py", line 1641 in runtest
File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/_pytest/runner.py", line 162 in pytest_runtest_call
File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/pluggy/_callers.py", line 39 in _multicall
File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/pluggy/_manager.py", line 80 in _hookexec
File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/pluggy/_hooks.py", line 265 in __call__
File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/_pytest/runner.py", line 255 in <lambda>
File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/_pytest/runner.py", line 311 in from_call
File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/_pytest/runner.py", line 255 in call_runtest_hook
File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/_pytest/runner.py", line 215 in call_and_report
File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/_pytest/runner.py", line 126 in runtestprotocol
File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/_pytest/runner.py", line 109 in pytest_runtest_protocol
File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/pluggy/_callers.py", line 39 in _multicall
File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/pluggy/_manager.py", line 80 in _hookexec
File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/pluggy/_hooks.py", line 265 in __call__
File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/_pytest/main.py", line 348 in pytest_runtestloop
File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/pluggy/_callers.py", line 39 in _multicall
File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/pluggy/_manager.py", line 80 in _hookexec
File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/pluggy/_hooks.py", line 265 in __call__
File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/_pytest/main.py", line 323 in _main
File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/_pytest/main.py", line 269 in wrap_session
File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/_pytest/main.py", line 316 in pytest_cmdline_main
File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/pluggy/_callers.py", line 39 in _multicall
File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/pluggy/_manager.py", line 80 in _hookexec
File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/pluggy/_hooks.py", line 265 in __call__
File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/_pytest/config/__init__.py", line 163 in main
File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/_pytest/config/__init__.py", line 185 in console_main
File "/opt/hostedtoolcache/Python/3.6.15/x64/bin/pytest", line 8 in <module>
/home/runner/work/_temp/72156858-d525-4a73-b08a-6c86bbbd676c.sh: line 1: 2067 Segmentation fault (core dumped) pytest -s -vv --cov src --cov-report term-missing tests/
tests/test_tokenizers.py::test_raw_sp
Error: Process completed with exit code 139.
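The exit code 139 reported above follows the usual shell convention of 128 plus the signal number, so it corresponds to signal 11 (SIGSEGV on Linux). A minimal sketch of that decoding follows; `describe_exit_code` is a hypothetical helper name used only for illustration and is not part of sentencepiece or pytest.

```python
import signal


def describe_exit_code(code):
    """Return the signal name encoded in a shell exit status, or None.

    Shells such as bash report 128 + N when a child is killed by signal N.
    """
    if code > 128:
        try:
            return signal.Signals(code - 128).name
        except ValueError:
            # Not a signal number known on this platform.
            return None
    return None


print(describe_exit_code(139))  # SIGSEGV on Linux
```

An ordinary test failure would show up as a small pytest exit code rather than a value above 128, so a status like 139 suggests the crash is in native code.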
Hi johntmyers, I wanted to reproduce your problem, so I built a Python 3.6 environment:
conda create -n py36 python=3.6
conda activate py36
git clone https://github.com/gretelai/gretel-synthetics.git
git reset --hard 7e73a311
pip install -r test-requirements.txt
pip install -r requirements.txt
export PYTHONPATH=/root/gretel-synthetics/src:$PYTHONPATH
In this initial environment, the sentencepiece version is 0.1.91 and the test passes. I then upgraded sentencepiece to v0.1.96, but I can't reproduce your problem. Here is my test result:
tests/test_tokenizers.py::test_raw_sp sentencepiece_trainer.cc(77) LOG(INFO) Starts training with :
trainer_spec {
input: /root/gretel-synthetics/tests/data/smol.txt
input_format:
model_prefix: m
model_type: UNIGRAM
vocab_size: 20000
self_test_sample_size: 0
character_coverage: 1
input_sentence_size: 1000000
shuffle_input_sentence: 1
seed_sentencepiece_size: 1000000
shrinking_factor: 0.75
max_sentence_length: 2048
num_threads: 16
num_sub_iterations: 2
max_sentencepiece_length: 16
split_by_unicode_script: 1
split_by_number: 1
split_by_whitespace: 1
split_digits: 0
treat_whitespace_as_suffix: 0
allow_whitespace_only_pieces: 0
user_defined_symbols:
eos_piece:
pad_piece:
trainer_interface.cc(400) LOG(INFO) Adding meta_piece:
trainer_interface.cc(400) LOG(INFO) Adding meta_piece:
Hi thanks for looking at this. What OS did you try on?
Ubuntu 18.04.2 LTS. I don't have an Ubuntu 20.04 environment.
Could you try the latest version v0.1.97?
If there is no update, this issue will be closed at the end of Aug.
Hi @taku910 I was getting a similar segfault with sentencepiece 0.1.97, but no segfault in 0.1.91.
I originally got the segfault using DNABERT, but I was also able to reproduce it with @johntmyers's and @xiefangqi's example above (NB: after the clone, you have to cd into gretel-synthetics, then reset, then pip install .). I am on Ubuntu 20.04, using Python 3.6 because that allows you to try both sentencepiece 0.1.91 and 0.1.97. Also, this is with tensorflow 2.4.0 (not 2.4.0rc1 as originally in the gretel-synthetics requirements.txt).
"Aborted (core dumped)" with 0.1.97:
(test) mep@evodeep:~/gretel-synthetics$ pip install sentencepiece==0.1.97
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Requirement already satisfied: sentencepiece==0.1.97 in /home/mep/anaconda3/envs/test/lib/python3.6/site-packages (0.1.97)
(test) mep@evodeep:~/gretel-synthetics$ pytest tests/test_tokenizers.py
================================================== test session starts ===================================================
platform linux -- Python 3.6.13, pytest-7.0.1, pluggy-1.0.0
rootdir: /home/mep/gretel-synthetics
plugins: cov-4.0.0
collected 4 items
tests/test_tokenizers.py ..Fatal Python error: Aborted
Current thread 0x00007f2621dde340 (most recent call first):
File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/sentencepiece/__init__.py", line 927 in _TrainFromMap
File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/sentencepiece/__init__.py", line 982 in _Train
File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/sentencepiece/__init__.py", line 989 in Train
File "/home/mep/gretel-synthetics/tests/test_tokenizers.py", line 112 in test_raw_sp
File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/_pytest/python.py", line 192 in pytest_pyfunc_call
File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/pluggy/_callers.py", line 39 in _multicall
File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/pluggy/_manager.py", line 80 in _hookexec
File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/pluggy/_hooks.py", line 265 in __call__
File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/_pytest/python.py", line 1718 in runtest
File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/_pytest/runner.py", line 168 in pytest_runtest_call
File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/pluggy/_callers.py", line 39 in _multicall
File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/pluggy/_manager.py", line 80 in _hookexec
File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/pluggy/_hooks.py", line 265 in __call__
File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/_pytest/runner.py", line 261 in <lambda>
File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/_pytest/runner.py", line 340 in from_call
File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/_pytest/runner.py", line 261 in call_runtest_hook
File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/_pytest/runner.py", line 221 in call_and_report
File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/_pytest/runner.py", line 132 in runtestprotocol
File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/_pytest/runner.py", line 113 in pytest_runtest_protocol
File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/pluggy/_callers.py", line 39 in _multicall
File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/pluggy/_manager.py", line 80 in _hookexec
File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/pluggy/_hooks.py", line 265 in __call__
File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/_pytest/main.py", line 347 in pytest_runtestloop
File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/pluggy/_callers.py", line 39 in _multicall
File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/pluggy/_manager.py", line 80 in _hookexec
File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/pluggy/_hooks.py", line 265 in __call__
File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/_pytest/main.py", line 322 in _main
File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/_pytest/main.py", line 268 in wrap_session
File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/_pytest/main.py", line 315 in pytest_cmdline_main
File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/pluggy/_callers.py", line 39 in _multicall
File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/pluggy/_manager.py", line 80 in _hookexec
File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/pluggy/_hooks.py", line 265 in __call__
File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/_pytest/config/__init__.py", line 166 in main
File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/_pytest/config/__init__.py", line 188 in console_main
File "/home/mep/anaconda3/envs/test/bin/pytest", line 8 in <module>
Aborted (core dumped)
Next try 0.1.91 - no other commands run between these two tries. Passes.
(test) mep@evodeep:~/gretel-synthetics$ pip install sentencepiece==0.1.91
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Collecting sentencepiece==0.1.91
Downloading sentencepiece-0.1.91-cp36-cp36m-manylinux1_x86_64.whl (1.1 MB)
|████████████████████████████████| 1.1 MB 5.0 MB/s
Installing collected packages: sentencepiece
Attempting uninstall: sentencepiece
Found existing installation: sentencepiece 0.1.97
Uninstalling sentencepiece-0.1.97:
Successfully uninstalled sentencepiece-0.1.97
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
gretel-synthetics 0.16.13.dev3+g7e73a31.d20230215 requires sentencepiece==0.1.96, but you have sentencepiece 0.1.91 which is incompatible.
Successfully installed sentencepiece-0.1.91
(test) mep@evodeep:~/gretel-synthetics$ pytest tests/test_tokenizers.py
================================================== test session starts ===================================================
platform linux -- Python 3.6.13, pytest-7.0.1, pluggy-1.0.0
rootdir: /home/mep/gretel-synthetics
plugins: cov-4.0.0
collected 4 items
tests/test_tokenizers.py .... [100%]
==================================================== warnings summary ====================================================
../anaconda3/envs/test/lib/python3.6/site-packages/tensorflow/python/autograph/impl/api.py:22
/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/tensorflow/python/autograph/impl/api.py:22: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
import imp
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
============================================== 4 passed, 1 warning in 0.05s ==============================================
Next try 0.1.92 - also passes.
(test) mep@evodeep:~/gretel-synthetics$ pip install sentencepiece==0.1.92
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Collecting sentencepiece==0.1.92
Downloading sentencepiece-0.1.92-cp36-cp36m-manylinux1_x86_64.whl (1.2 MB)
|████████████████████████████████| 1.2 MB 4.2 MB/s
WARNING: The candidate selected for download or install is a yanked version: 'sentencepiece' candidate (version 0.1.92 at https://files.pythonhosted.org/packages/68/e5/0366f50a00db181f4b7f3bdc408fc7c4177657f5bf45cb799b79fb4ce15c/sentencepiece-0.1.92-cp36-cp36m-manylinux1_x86_64.whl#sha256=7fd16c761339f593596b63e50810a2d2eff964d428ab79a49674c7371c055561 (from https://pypi.org/simple/sentencepiece/))
Reason for being yanked: Crash bug is reported (confirming)
Installing collected packages: sentencepiece
Attempting uninstall: sentencepiece
Found existing installation: sentencepiece 0.1.91
Uninstalling sentencepiece-0.1.91:
Successfully uninstalled sentencepiece-0.1.91
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
gretel-synthetics 0.16.13.dev3+g7e73a31.d20230215 requires sentencepiece==0.1.96, but you have sentencepiece 0.1.92 which is incompatible.
Successfully installed sentencepiece-0.1.92
(test) mep@evodeep:~/gretel-synthetics$ pytest tests/test_tokenizers.py
================================================== test session starts ===================================================
platform linux -- Python 3.6.13, pytest-7.0.1, pluggy-1.0.0
rootdir: /home/mep/gretel-synthetics
plugins: cov-4.0.0
collected 4 items
tests/test_tokenizers.py .... [100%]
==================================================== warnings summary ====================================================
../anaconda3/envs/test/lib/python3.6/site-packages/tensorflow/python/autograph/impl/api.py:22
/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/tensorflow/python/autograph/impl/api.py:22: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
import imp
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
============================================== 4 passed, 1 warning in 0.04s ==============================================
Next try 0.1.94 (there is no 0.1.93) - "Segmentation fault (core dumped)":
(test) mep@evodeep:~/gretel-synthetics$ pip install sentencepiece==0.1.93
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
ERROR: Could not find a version that satisfies the requirement sentencepiece==0.1.93 (from versions: 0.0.0, 0.0.2, 0.0.3, 0.0.4, 0.0.5, 0.0.6, 0.0.7, 0.0.9, 0.1.0, 0.1.1, 0.1.2, 0.1.3, 0.1.4, 0.1.5, 0.1.6, 0.1.7, 0.1.8, 0.1.81, 0.1.82, 0.1.83, 0.1.85, 0.1.86, 0.1.90, 0.1.91, 0.1.92, 0.1.94, 0.1.95, 0.1.96, 0.1.97)
ERROR: No matching distribution found for sentencepiece==0.1.93
(test) mep@evodeep:~/gretel-synthetics$ pip install sentencepiece==0.1.94
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Collecting sentencepiece==0.1.94
Downloading sentencepiece-0.1.94-cp36-cp36m-manylinux2014_x86_64.whl (1.1 MB)
|████████████████████████████████| 1.1 MB 5.4 MB/s
Installing collected packages: sentencepiece
Attempting uninstall: sentencepiece
Found existing installation: sentencepiece 0.1.92
Uninstalling sentencepiece-0.1.92:
Successfully uninstalled sentencepiece-0.1.92
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
gretel-synthetics 0.16.13.dev3+g7e73a31.d20230215 requires sentencepiece==0.1.96, but you have sentencepiece 0.1.94 which is incompatible.
Successfully installed sentencepiece-0.1.94
(test) mep@evodeep:~/gretel-synthetics$ pytest tests/test_tokenizers.py
================================================== test session starts ===================================================
platform linux -- Python 3.6.13, pytest-7.0.1, pluggy-1.0.0
rootdir: /home/mep/gretel-synthetics
plugins: cov-4.0.0
collected 4 items
tests/test_tokenizers.py ..Fatal Python error: Segmentation fault
Current thread 0x00007f0814bde340 (most recent call first):
File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/sentencepiece/__init__.py", line 389 in _TrainFromMap
File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/sentencepiece/__init__.py", line 444 in Train
File "/home/mep/gretel-synthetics/tests/test_tokenizers.py", line 112 in test_raw_sp
File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/_pytest/python.py", line 192 in pytest_pyfunc_call
File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/pluggy/_callers.py", line 39 in _multicall
File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/pluggy/_manager.py", line 80 in _hookexec
File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/pluggy/_hooks.py", line 265 in __call__
File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/_pytest/python.py", line 1718 in runtest
File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/_pytest/runner.py", line 168 in pytest_runtest_call
File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/pluggy/_callers.py", line 39 in _multicall
File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/pluggy/_manager.py", line 80 in _hookexec
File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/pluggy/_hooks.py", line 265 in __call__
File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/_pytest/runner.py", line 261 in <lambda>
File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/_pytest/runner.py", line 340 in from_call
File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/_pytest/runner.py", line 261 in call_runtest_hook
File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/_pytest/runner.py", line 221 in call_and_report
File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/_pytest/runner.py", line 132 in runtestprotocol
File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/_pytest/runner.py", line 113 in pytest_runtest_protocol
File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/pluggy/_callers.py", line 39 in _multicall
File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/pluggy/_manager.py", line 80 in _hookexec
File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/pluggy/_hooks.py", line 265 in __call__
File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/_pytest/main.py", line 347 in pytest_runtestloop
File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/pluggy/_callers.py", line 39 in _multicall
File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/pluggy/_manager.py", line 80 in _hookexec
File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/pluggy/_hooks.py", line 265 in __call__
File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/_pytest/main.py", line 322 in _main
File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/_pytest/main.py", line 268 in wrap_session
File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/_pytest/main.py", line 315 in pytest_cmdline_main
File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/pluggy/_callers.py", line 39 in _multicall
File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/pluggy/_manager.py", line 80 in _hookexec
File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/pluggy/_hooks.py", line 265 in __call__
File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/_pytest/config/__init__.py", line 166 in main
File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/_pytest/config/__init__.py", line 188 in console_main
File "/home/mep/anaconda3/envs/test/bin/pytest", line 8 in <module>
Segmentation fault (core dumped)
Back to 0.1.91 - passes
(test) mep@evodeep:~/gretel-synthetics$ pip install sentencepiece==0.1.91
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Collecting sentencepiece==0.1.91
Downloading sentencepiece-0.1.91-cp36-cp36m-manylinux1_x86_64.whl (1.1 MB)
|████████████████████████████████| 1.1 MB 3.1 MB/s
Installing collected packages: sentencepiece
Attempting uninstall: sentencepiece
Found existing installation: sentencepiece 0.1.94
Uninstalling sentencepiece-0.1.94:
Successfully uninstalled sentencepiece-0.1.94
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
gretel-synthetics 0.16.13.dev3+g7e73a31.d20230215 requires sentencepiece==0.1.96, but you have sentencepiece 0.1.91 which is incompatible.
Successfully installed sentencepiece-0.1.91
(test) mep@evodeep:~/gretel-synthetics$ pytest tests/test_tokenizers.py
================================================== test session starts ===================================================
platform linux -- Python 3.6.13, pytest-7.0.1, pluggy-1.0.0
rootdir: /home/mep/gretel-synthetics
plugins: cov-4.0.0
collected 4 items
tests/test_tokenizers.py .... [100%]
==================================================== warnings summary ====================================================
../anaconda3/envs/test/lib/python3.6/site-packages/tensorflow/python/autograph/impl/api.py:22
/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/tensorflow/python/autograph/impl/api.py:22: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
import imp
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
============================================== 4 passed, 1 warning in 0.04s ==============================================
Finally, here is my pip freeze:
(test) mep@evodeep:~/gretel-synthetics$ pip freeze
absl-py==0.15.0
astroid==2.11.7
astunparse==1.6.3
attrs==22.2.0
boto3==1.23.10
botocore==1.26.10
cached-property==1.5.2
cachetools==4.2.4
certifi==2021.5.30
charset-normalizer==2.0.12
clang==5.0
cloudpickle==2.2.1
coverage==6.2
dataclasses==0.7
dill==0.3.4
dm-tree==0.1.8
flake8==5.0.4
flatbuffers==1.12
gast==0.3.3
google-auth==1.35.0
google-auth-oauthlib==0.4.6
google-pasta==0.2.0
-e git+https://github.com/gretelai/gretel-synthetics.git@7e73a3119e1f3cfb926918b81761c013e0d0d0d1#egg=gretel_synthetics
grpcio==1.32.0
h5py==2.10.0
idna==3.4
importlib-metadata==4.8.3
importlib-resources==5.4.0
iniconfig==1.1.1
isort==5.10.1
jmespath==0.10.0
keras==2.6.0
Keras-Preprocessing==1.1.2
lazy-object-proxy==1.7.1
loky==2.8.0
Markdown==3.3.7
mccabe==0.7.0
mpmath==1.2.1
numpy==1.19.5
oauthlib==3.2.2
opt-einsum==3.3.0
packaging==21.3
pandas==1.1.5
platformdirs==2.4.0
pluggy==1.0.0
protobuf==3.19.6
py==1.11.0
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycodestyle==2.9.1
pyflakes==2.5.0
pylint==2.13.9
pyparsing==3.0.9
pytest==7.0.1
pytest-cov==4.0.0
python-dateutil==2.8.2
pytz==2022.7.1
requests==2.27.1
requests-oauthlib==1.3.1
rsa==4.9
s3transfer==0.5.2
scipy==1.5.4
sentencepiece==0.1.91
six==1.15.0
smart-open==2.2.1
tensorboard==2.6.0
tensorboard-data-server==0.6.1
tensorboard-plugin-wit==1.8.1
tensorflow==2.4.0
tensorflow-estimator==2.4.0
tensorflow-privacy==0.5.1
termcolor==1.1.0
tomli==1.2.3
tqdm==4.64.1
typed-ast==1.5.4
typing-extensions==3.7.4.3
urllib3==1.26.14
Werkzeug==2.0.3
wrapt==1.12.1
zipp==3.6.0
So something broke between 0.1.92 and 0.1.94.
Hope that is useful, thanks for making sentencepiece!
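The version-by-version check above could be automated roughly as follows. This is a sketch under assumptions, not a tested harness: `passes_with_version` and `first_failing` are hypothetical helper names, and the test path is the one used in this thread. The last lines only replay the outcomes reported above, so nothing is installed when the snippet is run as written.

```python
import subprocess
import sys


def passes_with_version(version, test_path="tests/test_tokenizers.py"):
    """Install one sentencepiece release and run the test file in a child process.

    Running pytest in a subprocess keeps a crash in the native extension from
    taking down this script; a child killed by a signal has a negative
    return code, so only a clean run (return code 0) counts as a pass.
    """
    subprocess.run(
        [sys.executable, "-m", "pip", "install", f"sentencepiece=={version}"],
        check=True,
    )
    result = subprocess.run([sys.executable, "-m", "pytest", test_path])
    return result.returncode == 0


def first_failing(versions, passes=passes_with_version):
    """Return the first version, in the given order, for which passes() is false."""
    for version in versions:
        if not passes(version):
            return version
    return None


# Outcomes reported in this thread, replayed without installing anything.
observed = {"0.1.91": True, "0.1.92": True, "0.1.94": False, "0.1.97": False}
print(first_failing(list(observed), observed.get))  # 0.1.94
```

To run it for real, call `first_failing(["0.1.91", "0.1.92", "0.1.94", "0.1.97"])` from the repository root inside the test environment.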