ontology.py: saving unsanitized words
Describe the bug Ontology generation fails when given raw strings containing slashes, e.g. "functional/adaptive interpretation". The word is not stripped of slashes before being used as a file name in `save_embedding` (ontology.py, line 260), so the slash is interpreted as a path separator and the save fails.
To Reproduce Steps to reproduce the behavior:
- Have an ontology with words containing slashes.
- Run Generate.
Expected behavior Words are sanitized before saving their embeddings.
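One way to sanitize robustly (a sketch only, not the add-on's actual code; `safe_cache_path` is a hypothetical helper): hash the word and use the digest as the file name. This sidesteps slashes, other reserved characters, and path-length limits all at once, at the cost of human-readable cache files.

```python
import hashlib
import os


def safe_cache_path(cache_dir: str, word: str) -> str:
    """Map an arbitrary word to a filesystem-safe cache path.

    Hypothetical helper: hashing the word avoids any problem
    characters ('/', '\\', ':', ...) in the file name, so words
    like "functional/adaptive interpretation" no longer break
    np.save().
    """
    digest = hashlib.sha1(word.encode("utf-8")).hexdigest()
    return os.path.join(cache_dir, f"{digest}.npy")
```

`save_embedding` could then call `np.save(safe_cache_path(self.cache_dir, word), emb)` instead of interpolating the raw word into the path.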
Orange version: 3.33.dev
Text add-on version: 1.7.0
Additional context
------------------------- FileNotFoundError Exception -------------------------
Traceback (most recent call last):
File "/Users/ajda/orange/orange3/Orange/widgets/utils/concurrent.py", line 591, in _on_task_done
super()._on_task_done(future)
File "/Users/ajda/orange/orange3/Orange/widgets/utils/concurrent.py", line 547, in _on_task_done
self.on_exception(ex)
File "/Users/ajda/orange/orange3-text/orangecontrib/text/widgets/owontology.py", line 774, in on_exception
raise ex
File "/Users/ajda/.pyenv-x86/versions/3.9.10/lib/python3.9/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
File "/Users/ajda/orange/orange3-text/orangecontrib/text/widgets/owontology.py", line 40, in _run
return handler(*args, callback=callback)
File "/Users/ajda/orange/orange3-text/orangecontrib/text/ontology.py", line 289, in generate
self._get_embeddings(words, wrap_callback(callback, end=0.1)),
File "/Users/ajda/orange/orange3-text/orangecontrib/text/ontology.py", line 371, in _get_embeddings
self.storage.save_embedding(words[i], embeddings[i, :])
File "/Users/ajda/orange/orange3-text/orangecontrib/text/ontology.py", line 260, in save_embedding
np.save(os.path.join(self.cache_dir, f'{word}.npy'), emb)
File "<__array_function__ internals>", line 180, in save
File "/Users/ajda/.pyenv-x86/versions/py3.9/lib/python3.9/site-packages/numpy/lib/npyio.py", line 515, in save
file_ctx = open(file, "wb")
FileNotFoundError: [Errno 2] No such file or directory: '/Users/ajda/Library/Caches/Orange/3.33.0.dev/ontology/182 funkcionalna/adaptacijska interpretacija.npy'
-------------------------------------------------------------------------------
The current caching system could be changed to something other than creating a separate file for each word. Also, if it is not an LRU cache with a limited size, there should probably be a way to clear the cache if/when it grows large (currently it is cleared only when running tests?). @PrimozGodec said he can coordinate this with @djukicn. We can also discuss it together.
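If the per-word-file layout is kept, the size limit could be approximated by pruning the oldest files when the cache grows past a threshold. A minimal sketch (hypothetical `prune_cache` helper, not the add-on's implementation), using file modification time as a crude LRU order:

```python
import os


def prune_cache(cache_dir: str, max_files: int = 10_000) -> None:
    """Delete the oldest cached embeddings once the cache exceeds max_files.

    Hypothetical sketch: a rough LRU approximation that orders
    files by modification time and removes the oldest ones until
    only max_files remain.
    """
    files = [
        os.path.join(cache_dir, name)
        for name in os.listdir(cache_dir)
        if name.endswith(".npy")
    ]
    if len(files) <= max_files:
        return
    files.sort(key=os.path.getmtime)  # oldest first
    for path in files[: len(files) - max_files]:
        os.remove(path)
```

Calling this after each batch of `save_embedding` calls would keep the cache bounded without changing the on-disk format.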