
Segmentation fault on Ubuntu with basic python test

Open johntmyers opened this issue 4 years ago • 4 comments

This problem is happening with version 0.1.96; I recently upgraded from 0.1.91, which was working fine. When running a basic test on Ubuntu 20.04 in GitHub Actions, a segmentation fault occurs. Here is the basic test that is being run.

Here are the contents of what is being tokenized.

The traceback is:

sentencepiece_trainer.cc(77) LOG(INFO) Starts training with : 
trainer_spec {
  input: /home/runner/work/gretel-synthetics/gretel-synthetics/tests/data/smol.txt
  input_format: 
  model_prefix: m
  model_type: UNIGRAM
  vocab_size: 20000
  self_test_sample_size: 0
  character_coverage: 1
  input_sentence_size: 1000000
  shuffle_input_sentence: 1
  seed_sentencepiece_size: 1000000
  shrinking_factor: 0.75
  max_sentence_length: 2048
  num_threads: 16
  num_sub_iterations: 2
  max_sentencepiece_length: 16
  split_by_unicode_script: 1
  split_by_number: 1
  split_by_whitespace: 1
  split_digits: 0
  treat_whitespace_as_suffix: 0
  allow_whitespace_only_pieces: 0
  user_defined_symbols: <n>
  user_defined_symbols: <d>
  required_chars: 
  byte_fallback: 0
  vocabulary_output_piece_score: 1
  train_extremely_large_corpus: 0
  hard_vocab_limit: 0
  use_all_vocab: 0
  unk_id: 0
  bos_id: 1
  eos_id: 2
  pad_id: -1
  unk_piece: <unk>
  bos_piece: <s>
  eos_piece: </s>
  pad_piece: <pad>
  unk_surface:  ⁇ 
}
normalizer_spec {
  name: nmt_nfkc
  add_dummy_prefix: 1
  remove_extra_whitespaces: 1
  escape_whitespaces: 1
  normalization_rule_tsv: 
}
denormalizer_spec {}
Fatal Python error: Segmentation fault

Thread 0x00007fbd121cb700 (most recent call first):
  File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/threading.py", line 299 in wait
  File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/threading.py", line 551 in wait
  File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/tqdm/_monitor.py", line 60 in run
  File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/threading.py", line 916 in _bootstrap_inner
  File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/threading.py", line 884 in _bootstrap

Current thread 0x00007fbd6528b740 (most recent call first):
  File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/sentencepiece/__init__.py", line 389 in _TrainFromMap
  File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/sentencepiece/__init__.py", line 444 in Train
  File "/home/runner/work/gretel-synthetics/gretel-synthetics/tests/test_tokenizers.py", line 112 in test_raw_sp
  File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/_pytest/python.py", line 183 in pytest_pyfunc_call
  File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/pluggy/_callers.py", line 39 in _multicall
  File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/pluggy/_manager.py", line 80 in _hookexec
  File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/pluggy/_hooks.py", line 265 in __call__
  File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/_pytest/python.py", line 1641 in runtest
  File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/_pytest/runner.py", line 162 in pytest_runtest_call
  File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/pluggy/_callers.py", line 39 in _multicall
  File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/pluggy/_manager.py", line 80 in _hookexec
  File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/pluggy/_hooks.py", line 265 in __call__
  File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/_pytest/runner.py", line 255 in <lambda>
  File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/_pytest/runner.py", line 311 in from_call
  File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/_pytest/runner.py", line 255 in call_runtest_hook
  File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/_pytest/runner.py", line 215 in call_and_report
  File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/_pytest/runner.py", line 126 in runtestprotocol
  File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/_pytest/runner.py", line 109 in pytest_runtest_protocol
  File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/pluggy/_callers.py", line 39 in _multicall
  File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/pluggy/_manager.py", line 80 in _hookexec
  File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/pluggy/_hooks.py", line 265 in __call__
  File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/_pytest/main.py", line 348 in pytest_runtestloop
  File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/pluggy/_callers.py", line 39 in _multicall
  File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/pluggy/_manager.py", line 80 in _hookexec
  File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/pluggy/_hooks.py", line 265 in __call__
  File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/_pytest/main.py", line 323 in _main
  File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/_pytest/main.py", line 269 in wrap_session
  File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/_pytest/main.py", line 316 in pytest_cmdline_main
  File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/pluggy/_callers.py", line 39 in _multicall
  File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/pluggy/_manager.py", line 80 in _hookexec
  File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/pluggy/_hooks.py", line 265 in __call__
  File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/_pytest/config/__init__.py", line 163 in main
  File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/_pytest/config/__init__.py", line 185 in console_main
  File "/opt/hostedtoolcache/Python/3.6.15/x64/bin/pytest", line 8 in <module>
/home/runner/work/_temp/72156858-d525-4a73-b08a-6c86bbbd676c.sh: line 1:  2067 Segmentation fault      (core dumped) pytest -s -vv --cov src --cov-report term-missing tests/
tests/test_tokenizers.py::test_raw_sp 
Error: Process completed with exit code 139.

johntmyers avatar Oct 25 '21 19:10 johntmyers

Hi johntmyers, I wanted to reproduce your problem, so I built a Python 3.6 environment:

conda create -n py36 python=3.6
conda activate py36

git clone https://github.com/gretelai/gretel-synthetics.git
git reset --hard 7e73a311

pip install -r test-requirements.txt
pip install -r requirements.txt

export PYTHONPATH=/root/gretel-synthetics/src:$PYTHONPATH

In this initial environment the sentencepiece version is 0.1.91 and the test passes. I then upgraded sentencepiece to v0.1.96, and I still can't reproduce your problem. Here is my test result:

tests/test_tokenizers.py::test_raw_sp sentencepiece_trainer.cc(77) LOG(INFO) Starts training with : 
trainer_spec {
  input: /root/gretel-synthetics/tests/data/smol.txt
  input_format: 
  model_prefix: m
  model_type: UNIGRAM
  vocab_size: 20000
  self_test_sample_size: 0
  character_coverage: 1
  input_sentence_size: 1000000
  shuffle_input_sentence: 1
  seed_sentencepiece_size: 1000000
  shrinking_factor: 0.75
  max_sentence_length: 2048
  num_threads: 16
  num_sub_iterations: 2
  max_sentencepiece_length: 16
  split_by_unicode_script: 1
  split_by_number: 1
  split_by_whitespace: 1
  split_digits: 0
  treat_whitespace_as_suffix: 0
  allow_whitespace_only_pieces: 0
  user_defined_symbols: <n>
  user_defined_symbols: <d>
  required_chars: 
  byte_fallback: 0
  vocabulary_output_piece_score: 1
  train_extremely_large_corpus: 0
  hard_vocab_limit: 0
  use_all_vocab: 0
  unk_id: 0
  bos_id: 1
  eos_id: 2
  pad_id: -1
  unk_piece: <unk>
  bos_piece: <s>
  eos_piece: </s>
  pad_piece: <pad>
  unk_surface:  ⁇ 
}
normalizer_spec {
  name: nmt_nfkc
  add_dummy_prefix: 1
  remove_extra_whitespaces: 1
  escape_whitespaces: 1
  normalization_rule_tsv: 
}
denormalizer_spec {}
trainer_interface.cc(329) LOG(INFO) SentenceIterator is not specified. Using MultiFileSentenceIterator.
trainer_interface.cc(178) LOG(INFO) Loading corpus: /root/gretel-synthetics/tests/data/smol.txt
trainer_interface.cc(385) LOG(INFO) Loaded all 6 sentences
trainer_interface.cc(400) LOG(INFO) Adding meta_piece: <unk>
trainer_interface.cc(400) LOG(INFO) Adding meta_piece: <s>
trainer_interface.cc(400) LOG(INFO) Adding meta_piece: </s>
trainer_interface.cc(400) LOG(INFO) Adding meta_piece: <n>
trainer_interface.cc(400) LOG(INFO) Adding meta_piece: <d>
trainer_interface.cc(405) LOG(INFO) Normalizing sentences...
trainer_interface.cc(466) LOG(INFO) all chars count=326
trainer_interface.cc(487) LOG(INFO) Alphabet size=31
trainer_interface.cc(488) LOG(INFO) Final character coverage=1
trainer_interface.cc(520) LOG(INFO) Done! preprocessed 6 sentences.
unigram_model_trainer.cc(139) LOG(INFO) Making suffix array...
unigram_model_trainer.cc(143) LOG(INFO) Extracting frequent sub strings...
unigram_model_trainer.cc(194) LOG(INFO) Initialized 87 seed sentencepieces
trainer_interface.cc(526) LOG(INFO) Tokenizing input sentences with whitespace: 6
trainer_interface.cc(537) LOG(INFO) Done! 46
unigram_model_trainer.cc(489) LOG(INFO) Using 46 sentences for EM training
unigram_model_trainer.cc(505) LOG(INFO) EM sub_iter=0 size=66 obj=14.4718 num_tokens=163 num_tokens/piece=2.4697
unigram_model_trainer.cc(505) LOG(INFO) EM sub_iter=1 size=65 obj=13.4768 num_tokens=163 num_tokens/piece=2.50769
trainer_interface.cc(615) LOG(INFO) Saving model: m.model
trainer_interface.cc(626) LOG(INFO) Saving vocabs: m.vocab
PASSED

xiefangqi avatar Dec 06 '21 02:12 xiefangqi

Hi thanks for looking at this. What OS did you try on?

johntmyers avatar Dec 06 '21 03:12 johntmyers

> Hi thanks for looking at this. What OS did you try on?

Ubuntu 18.04.2 LTS; I don't have an Ubuntu 20.04 environment.

xiefangqi avatar Dec 06 '21 04:12 xiefangqi

Could you try the latest version v0.1.97?

taku910 avatar Aug 09 '22 01:08 taku910

If there is no update, this issue will be closed at the end of Aug.

taku910 avatar Aug 15 '22 02:08 taku910

Hi @taku910, I was getting a similar segfault with sentencepiece 0.1.97, but no segfault with 0.1.91.

I originally hit the segfault using DNABERT, but I was also able to reproduce it with @johntmyers's and @xiefangqi's example above (nb: after the clone, you have to cd into gretel-synthetics, then reset, then pip install .). I am on Ubuntu 20.04. I used Python 3.6 because it allows trying both sentencepiece 0.1.91 and 0.1.97. It's also with tensorflow 2.4.0 (not 2.4.0rc1 as originally pinned in the gretel-synthetics requirements.txt).

"Aborted (core dumped)" with 0.1.97:

(test) mep@evodeep:~/gretel-synthetics$ pip install sentencepiece==0.1.97
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Requirement already satisfied: sentencepiece==0.1.97 in /home/mep/anaconda3/envs/test/lib/python3.6/site-packages (0.1.97)
(test) mep@evodeep:~/gretel-synthetics$ pytest tests/test_tokenizers.py 
================================================== test session starts ===================================================
platform linux -- Python 3.6.13, pytest-7.0.1, pluggy-1.0.0
rootdir: /home/mep/gretel-synthetics
plugins: cov-4.0.0
collected 4 items                                                                                                        

tests/test_tokenizers.py ..Fatal Python error: Aborted

Current thread 0x00007f2621dde340 (most recent call first):
  File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/sentencepiece/__init__.py", line 927 in _TrainFromMap
  File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/sentencepiece/__init__.py", line 982 in _Train
  File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/sentencepiece/__init__.py", line 989 in Train
  File "/home/mep/gretel-synthetics/tests/test_tokenizers.py", line 112 in test_raw_sp
  File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/_pytest/python.py", line 192 in pytest_pyfunc_call
  File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/pluggy/_callers.py", line 39 in _multicall
  File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/pluggy/_manager.py", line 80 in _hookexec
  File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/pluggy/_hooks.py", line 265 in __call__
  File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/_pytest/python.py", line 1718 in runtest
  File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/_pytest/runner.py", line 168 in pytest_runtest_call
  File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/pluggy/_callers.py", line 39 in _multicall
  File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/pluggy/_manager.py", line 80 in _hookexec
  File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/pluggy/_hooks.py", line 265 in __call__
  File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/_pytest/runner.py", line 261 in <lambda>
  File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/_pytest/runner.py", line 340 in from_call
  File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/_pytest/runner.py", line 261 in call_runtest_hook
  File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/_pytest/runner.py", line 221 in call_and_report
  File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/_pytest/runner.py", line 132 in runtestprotocol
  File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/_pytest/runner.py", line 113 in pytest_runtest_protocol
  File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/pluggy/_callers.py", line 39 in _multicall
  File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/pluggy/_manager.py", line 80 in _hookexec
  File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/pluggy/_hooks.py", line 265 in __call__
  File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/_pytest/main.py", line 347 in pytest_runtestloop
  File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/pluggy/_callers.py", line 39 in _multicall
  File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/pluggy/_manager.py", line 80 in _hookexec
  File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/pluggy/_hooks.py", line 265 in __call__
  File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/_pytest/main.py", line 322 in _main
  File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/_pytest/main.py", line 268 in wrap_session
  File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/_pytest/main.py", line 315 in pytest_cmdline_main
  File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/pluggy/_callers.py", line 39 in _multicall
  File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/pluggy/_manager.py", line 80 in _hookexec
  File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/pluggy/_hooks.py", line 265 in __call__
  File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/_pytest/config/__init__.py", line 166 in main
  File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/_pytest/config/__init__.py", line 188 in console_main
  File "/home/mep/anaconda3/envs/test/bin/pytest", line 8 in <module>
Aborted (core dumped)

Next, try 0.1.91, with no other commands run between these two tries. It passes.

(test) mep@evodeep:~/gretel-synthetics$ pip install sentencepiece==0.1.91
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Collecting sentencepiece==0.1.91
  Downloading sentencepiece-0.1.91-cp36-cp36m-manylinux1_x86_64.whl (1.1 MB)
     |████████████████████████████████| 1.1 MB 5.0 MB/s 
Installing collected packages: sentencepiece
  Attempting uninstall: sentencepiece
    Found existing installation: sentencepiece 0.1.97
    Uninstalling sentencepiece-0.1.97:
      Successfully uninstalled sentencepiece-0.1.97
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
gretel-synthetics 0.16.13.dev3+g7e73a31.d20230215 requires sentencepiece==0.1.96, but you have sentencepiece 0.1.91 which is incompatible.
Successfully installed sentencepiece-0.1.91
(test) mep@evodeep:~/gretel-synthetics$ pytest tests/test_tokenizers.py 
================================================== test session starts ===================================================
platform linux -- Python 3.6.13, pytest-7.0.1, pluggy-1.0.0
rootdir: /home/mep/gretel-synthetics
plugins: cov-4.0.0
collected 4 items                                                                                                        

tests/test_tokenizers.py ....                                                                                      [100%]

==================================================== warnings summary ====================================================
../anaconda3/envs/test/lib/python3.6/site-packages/tensorflow/python/autograph/impl/api.py:22
  /home/mep/anaconda3/envs/test/lib/python3.6/site-packages/tensorflow/python/autograph/impl/api.py:22: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
    import imp

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
============================================== 4 passed, 1 warning in 0.05s ==============================================

Next try 0.1.92 - also passes.

(test) mep@evodeep:~/gretel-synthetics$ pip install sentencepiece==0.1.92
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Collecting sentencepiece==0.1.92
  Downloading sentencepiece-0.1.92-cp36-cp36m-manylinux1_x86_64.whl (1.2 MB)
     |████████████████████████████████| 1.2 MB 4.2 MB/s 
WARNING: The candidate selected for download or install is a yanked version: 'sentencepiece' candidate (version 0.1.92 at https://files.pythonhosted.org/packages/68/e5/0366f50a00db181f4b7f3bdc408fc7c4177657f5bf45cb799b79fb4ce15c/sentencepiece-0.1.92-cp36-cp36m-manylinux1_x86_64.whl#sha256=7fd16c761339f593596b63e50810a2d2eff964d428ab79a49674c7371c055561 (from https://pypi.org/simple/sentencepiece/))
Reason for being yanked: Crash bug is reported (confirming)
Installing collected packages: sentencepiece
  Attempting uninstall: sentencepiece
    Found existing installation: sentencepiece 0.1.91
    Uninstalling sentencepiece-0.1.91:
      Successfully uninstalled sentencepiece-0.1.91
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
gretel-synthetics 0.16.13.dev3+g7e73a31.d20230215 requires sentencepiece==0.1.96, but you have sentencepiece 0.1.92 which is incompatible.
Successfully installed sentencepiece-0.1.92
(test) mep@evodeep:~/gretel-synthetics$ pytest tests/test_tokenizers.py 
================================================== test session starts ===================================================
platform linux -- Python 3.6.13, pytest-7.0.1, pluggy-1.0.0
rootdir: /home/mep/gretel-synthetics
plugins: cov-4.0.0
collected 4 items                                                                                                        

tests/test_tokenizers.py ....                                                                                      [100%]

==================================================== warnings summary ====================================================
../anaconda3/envs/test/lib/python3.6/site-packages/tensorflow/python/autograph/impl/api.py:22
  /home/mep/anaconda3/envs/test/lib/python3.6/site-packages/tensorflow/python/autograph/impl/api.py:22: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
    import imp

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
============================================== 4 passed, 1 warning in 0.04s ==============================================
 

Next try 0.1.94 (there is no 0.1.93) - "Segmentation fault (core dumped)":

(test) mep@evodeep:~/gretel-synthetics$ pip install sentencepiece==0.1.93
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
ERROR: Could not find a version that satisfies the requirement sentencepiece==0.1.93 (from versions: 0.0.0, 0.0.2, 0.0.3, 0.0.4, 0.0.5, 0.0.6, 0.0.7, 0.0.9, 0.1.0, 0.1.1, 0.1.2, 0.1.3, 0.1.4, 0.1.5, 0.1.6, 0.1.7, 0.1.8, 0.1.81, 0.1.82, 0.1.83, 0.1.85, 0.1.86, 0.1.90, 0.1.91, 0.1.92, 0.1.94, 0.1.95, 0.1.96, 0.1.97)
ERROR: No matching distribution found for sentencepiece==0.1.93
(test) mep@evodeep:~/gretel-synthetics$ pip install sentencepiece==0.1.94
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Collecting sentencepiece==0.1.94
  Downloading sentencepiece-0.1.94-cp36-cp36m-manylinux2014_x86_64.whl (1.1 MB)
     |████████████████████████████████| 1.1 MB 5.4 MB/s 
Installing collected packages: sentencepiece
  Attempting uninstall: sentencepiece
    Found existing installation: sentencepiece 0.1.92
    Uninstalling sentencepiece-0.1.92:
      Successfully uninstalled sentencepiece-0.1.92
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
gretel-synthetics 0.16.13.dev3+g7e73a31.d20230215 requires sentencepiece==0.1.96, but you have sentencepiece 0.1.94 which is incompatible.
Successfully installed sentencepiece-0.1.94
(test) mep@evodeep:~/gretel-synthetics$ pytest tests/test_tokenizers.py 
================================================== test session starts ===================================================
platform linux -- Python 3.6.13, pytest-7.0.1, pluggy-1.0.0
rootdir: /home/mep/gretel-synthetics
plugins: cov-4.0.0
collected 4 items                                                                                                        

tests/test_tokenizers.py ..Fatal Python error: Segmentation fault

Current thread 0x00007f0814bde340 (most recent call first):
  File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/sentencepiece/__init__.py", line 389 in _TrainFromMap
  File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/sentencepiece/__init__.py", line 444 in Train
  File "/home/mep/gretel-synthetics/tests/test_tokenizers.py", line 112 in test_raw_sp
  File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/_pytest/python.py", line 192 in pytest_pyfunc_call
  File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/pluggy/_callers.py", line 39 in _multicall
  File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/pluggy/_manager.py", line 80 in _hookexec
  File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/pluggy/_hooks.py", line 265 in __call__
  File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/_pytest/python.py", line 1718 in runtest
  File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/_pytest/runner.py", line 168 in pytest_runtest_call
  File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/pluggy/_callers.py", line 39 in _multicall
  File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/pluggy/_manager.py", line 80 in _hookexec
  File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/pluggy/_hooks.py", line 265 in __call__
  File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/_pytest/runner.py", line 261 in <lambda>
  File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/_pytest/runner.py", line 340 in from_call
  File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/_pytest/runner.py", line 261 in call_runtest_hook
  File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/_pytest/runner.py", line 221 in call_and_report
  File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/_pytest/runner.py", line 132 in runtestprotocol
  File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/_pytest/runner.py", line 113 in pytest_runtest_protocol
  File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/pluggy/_callers.py", line 39 in _multicall
  File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/pluggy/_manager.py", line 80 in _hookexec
  File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/pluggy/_hooks.py", line 265 in __call__
  File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/_pytest/main.py", line 347 in pytest_runtestloop
  File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/pluggy/_callers.py", line 39 in _multicall
  File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/pluggy/_manager.py", line 80 in _hookexec
  File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/pluggy/_hooks.py", line 265 in __call__
  File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/_pytest/main.py", line 322 in _main
  File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/_pytest/main.py", line 268 in wrap_session
  File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/_pytest/main.py", line 315 in pytest_cmdline_main
  File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/pluggy/_callers.py", line 39 in _multicall
  File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/pluggy/_manager.py", line 80 in _hookexec
  File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/pluggy/_hooks.py", line 265 in __call__
  File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/_pytest/config/__init__.py", line 166 in main
  File "/home/mep/anaconda3/envs/test/lib/python3.6/site-packages/_pytest/config/__init__.py", line 188 in console_main
  File "/home/mep/anaconda3/envs/test/bin/pytest", line 8 in <module>
Segmentation fault (core dumped)

Back to 0.1.91 - passes

(test) mep@evodeep:~/gretel-synthetics$ pip install sentencepiece==0.1.91
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Collecting sentencepiece==0.1.91
  Downloading sentencepiece-0.1.91-cp36-cp36m-manylinux1_x86_64.whl (1.1 MB)
     |████████████████████████████████| 1.1 MB 3.1 MB/s 
Installing collected packages: sentencepiece
  Attempting uninstall: sentencepiece
    Found existing installation: sentencepiece 0.1.94
    Uninstalling sentencepiece-0.1.94:
      Successfully uninstalled sentencepiece-0.1.94
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
gretel-synthetics 0.16.13.dev3+g7e73a31.d20230215 requires sentencepiece==0.1.96, but you have sentencepiece 0.1.91 which is incompatible.
Successfully installed sentencepiece-0.1.91
(test) mep@evodeep:~/gretel-synthetics$ pytest tests/test_tokenizers.py 
================================================== test session starts ===================================================
platform linux -- Python 3.6.13, pytest-7.0.1, pluggy-1.0.0
rootdir: /home/mep/gretel-synthetics
plugins: cov-4.0.0
collected 4 items                                                                                                        

tests/test_tokenizers.py ....                                                                                      [100%]

==================================================== warnings summary ====================================================
../anaconda3/envs/test/lib/python3.6/site-packages/tensorflow/python/autograph/impl/api.py:22
  /home/mep/anaconda3/envs/test/lib/python3.6/site-packages/tensorflow/python/autograph/impl/api.py:22: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
    import imp

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
============================================== 4 passed, 1 warning in 0.04s ==============================================

Finally, here is my pip freeze:

(test) mep@evodeep:~/gretel-synthetics$ pip freeze
absl-py==0.15.0
astroid==2.11.7
astunparse==1.6.3
attrs==22.2.0
boto3==1.23.10
botocore==1.26.10
cached-property==1.5.2
cachetools==4.2.4
certifi==2021.5.30
charset-normalizer==2.0.12
clang==5.0
cloudpickle==2.2.1
coverage==6.2
dataclasses==0.7
dill==0.3.4
dm-tree==0.1.8
flake8==5.0.4
flatbuffers==1.12
gast==0.3.3
google-auth==1.35.0
google-auth-oauthlib==0.4.6
google-pasta==0.2.0
-e git+https://github.com/gretelai/gretel-synthetics.git@7e73a3119e1f3cfb926918b81761c013e0d0d0d1#egg=gretel_synthetics
grpcio==1.32.0
h5py==2.10.0
idna==3.4
importlib-metadata==4.8.3
importlib-resources==5.4.0
iniconfig==1.1.1
isort==5.10.1
jmespath==0.10.0
keras==2.6.0
Keras-Preprocessing==1.1.2
lazy-object-proxy==1.7.1
loky==2.8.0
Markdown==3.3.7
mccabe==0.7.0
mpmath==1.2.1
numpy==1.19.5
oauthlib==3.2.2
opt-einsum==3.3.0
packaging==21.3
pandas==1.1.5
platformdirs==2.4.0
pluggy==1.0.0
protobuf==3.19.6
py==1.11.0
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycodestyle==2.9.1
pyflakes==2.5.0
pylint==2.13.9
pyparsing==3.0.9
pytest==7.0.1
pytest-cov==4.0.0
python-dateutil==2.8.2
pytz==2022.7.1
requests==2.27.1
requests-oauthlib==1.3.1
rsa==4.9
s3transfer==0.5.2
scipy==1.5.4
sentencepiece==0.1.91
six==1.15.0
smart-open==2.2.1
tensorboard==2.6.0
tensorboard-data-server==0.6.1
tensorboard-plugin-wit==1.8.1
tensorflow==2.4.0
tensorflow-estimator==2.4.0
tensorflow-privacy==0.5.1
termcolor==1.1.0
tomli==1.2.3
tqdm==4.64.1
typed-ast==1.5.4
typing-extensions==3.7.4.3
urllib3==1.26.14
Werkzeug==2.0.3
wrapt==1.12.1
zipp==3.6.0

So something broke between 0.1.92 and 0.1.94.
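Given that bisect (0.1.91 and 0.1.92 pass here, everything from 0.1.94 on crashes), a test suite could fail fast with a readable message instead of a bare segfault. A minimal sketch; the cutoff is only what this thread observed, not an official compatibility list:

```python
# Fail-fast version check sketch. The "first bad" version below is just the
# earliest release observed segfaulting in this environment (0.1.94).
def version_tuple(v):
    """Turn '0.1.94' into (0, 1, 94) so versions compare numerically."""
    return tuple(int(part) for part in v.strip().split("."))

def is_suspect_version(installed, first_bad="0.1.94"):
    """True for every version at or above the first one seen crashing."""
    return version_tuple(installed) >= version_tuple(first_bad)

# In a real suite you would pass sentencepiece.__version__ here and raise
# a RuntimeError (or pytest.skip) for suspect versions.
for v in ("0.1.91", "0.1.92", "0.1.94", "0.1.96", "0.1.97"):
    print(v, "suspect" if is_suspect_version(v) else "ok")
```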

Hope that is useful, thanks for making sentencepiece!

mepster avatar Feb 15 '23 03:02 mepster