ontology.py: saving unsanitized words
Describe the bug Ontology generation fails when given raw strings containing slashes, e.g. "functional/adaptive interpretation". The word is not stripped of slashes before being used as a file name in `save_embedding` (ontology.py, line 260), so the slash is interpreted as a path separator and the save fails.
To Reproduce Steps to reproduce the behavior:
- Have an ontology with words containing slashes.
- Run Generate.
Expected behavior Words are sanitized before saving their embeddings.
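One way to sanitize robustly (a sketch only, not the add-on's actual code; `safe_cache_path` is a hypothetical helper): hash the word and use the digest as the file name. This sidesteps slashes, other reserved characters, and path-length limits all at once, at the cost of human-readable cache files.

```python
import hashlib
import os


def safe_cache_path(cache_dir: str, word: str) -> str:
    """Map an arbitrary word to a filesystem-safe cache path.

    Hypothetical helper: hashing the word avoids any problem
    characters ('/', '\\', ':', ...) in the file name, so words
    like "functional/adaptive interpretation" no longer break
    np.save().
    """
    digest = hashlib.sha1(word.encode("utf-8")).hexdigest()
    return os.path.join(cache_dir, f"{digest}.npy")
```

`save_embedding` could then call `np.save(safe_cache_path(self.cache_dir, word), emb)` instead of interpolating the raw word into the path.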
Orange version: 3.33.dev
Text add-on version: 1.7.0
Additional context
------------------------- FileNotFoundError Exception -------------------------
Traceback (most recent call last):
File "/Users/ajda/orange/orange3/Orange/widgets/utils/concurrent.py", line 591, in _on_task_done
super()._on_task_done(future)
File "/Users/ajda/orange/orange3/Orange/widgets/utils/concurrent.py", line 547, in _on_task_done
self.on_exception(ex)
File "/Users/ajda/orange/orange3-text/orangecontrib/text/widgets/owontology.py", line 774, in on_exception
raise ex
File "/Users/ajda/.pyenv-x86/versions/3.9.10/lib/python3.9/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
File "/Users/ajda/orange/orange3-text/orangecontrib/text/widgets/owontology.py", line 40, in _run
return handler(*args, callback=callback)
File "/Users/ajda/orange/orange3-text/orangecontrib/text/ontology.py", line 289, in generate
self._get_embeddings(words, wrap_callback(callback, end=0.1)),
File "/Users/ajda/orange/orange3-text/orangecontrib/text/ontology.py", line 371, in _get_embeddings
self.storage.save_embedding(words[i], embeddings[i, :])
File "/Users/ajda/orange/orange3-text/orangecontrib/text/ontology.py", line 260, in save_embedding
np.save(os.path.join(self.cache_dir, f'{word}.npy'), emb)
File "<__array_function__ internals>", line 180, in save
File "/Users/ajda/.pyenv-x86/versions/py3.9/lib/python3.9/site-packages/numpy/lib/npyio.py", line 515, in save
file_ctx = open(file, "wb")
FileNotFoundError: [Errno 2] No such file or directory: '/Users/ajda/Library/Caches/Orange/3.33.0.dev/ontology/182 funkcionalna/adaptacijska interpretacija.npy'
-------------------------------------------------------------------------------
The current caching system could be changed to something other than creating a separate file for each word. Also, if it is not an LRU cache with a limited size, there should probably be a way to clear the cache if/when it grows large (currently it is cleared only when running tests?). @PrimozGodec said he can coordinate this with @djukicn. We can also discuss it together.
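If the per-word-file layout is kept, the size limit could be approximated by pruning the oldest files when the cache grows past a threshold. A minimal sketch (hypothetical `prune_cache` helper, not the add-on's implementation), using file modification time as a crude LRU order:

```python
import os


def prune_cache(cache_dir: str, max_files: int = 10_000) -> None:
    """Delete the oldest cached embeddings once the cache exceeds max_files.

    Hypothetical sketch: a rough LRU approximation that orders
    files by modification time and removes the oldest ones until
    only max_files remain.
    """
    files = [
        os.path.join(cache_dir, name)
        for name in os.listdir(cache_dir)
        if name.endswith(".npy")
    ]
    if len(files) <= max_files:
        return
    files.sort(key=os.path.getmtime)  # oldest first
    for path in files[: len(files) - max_files]:
        os.remove(path)
```

Calling this after each batch of `save_embedding` calls would keep the cache bounded without changing the on-disk format.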