Nebullvm model optimization integration
Hello, this PR introduces an integration of nebullvm, a model optimization library that can significantly accelerate model inference. The library has been integrated for the transformer embeddings (TransformerDocumentEmbeddings and TransformerWordEmbeddings), following the code style of the existing ONNX export support.
The PR also includes a tutorial on how to leverage the optimization.
Moreover, this PR fixes a bug (#2930) that prevented the model from being correctly exported to ONNX.
Example Usage
from flair.data import Sentence
from flair.models import SequenceTagger
# Load model
model = SequenceTagger.load("ner-large")
# Define some example sentences
sentences = [
Sentence("Mars is the fourth planet from the Sun and the second-smallest planet in the Solar System."),
Sentence("In the fourth century BCE, Aristotle noted that Mars disappeared behind the Moon during an occultation."),
Sentence("Liquid water cannot exist on the surface of Mars due to low atmospheric pressure."),
Sentence("In 2004, Opportunity detected the mineral jarosite."),
]
# Optimize with nebullvm
model.embeddings = model.embeddings.optimize_nebullvm(sentences)
# Inference
sentence = Sentence('George Washington went to Washington.')
model.predict(sentence)
Results
With nebullvm, the inference speed of the model can be improved significantly. With the model used in the example above, we found the following results:
| Machine Type | Baseline (s) | Nebullvm-optimized (s) | Speedup |
|---|---|---|---|
| M1 | 0.181 | 0.0358 | 5.1x |
| Intel CPU | 0.206 | 0.0953 | 2.2x |
| GPU (Tesla T4) | 0.0266 | 0.0129 | 2.1x |
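For reference, a comparison of this kind can be reproduced roughly as follows. This is a minimal sketch, not the exact script used for the table above; the number of runs and the warm-up step are assumptions.

```python
import time

from flair.data import Sentence
from flair.models import SequenceTagger

model = SequenceTagger.load("ner-large")
opt_sentences = [Sentence("Mars is the fourth planet from the Sun.")]
test_sentence = Sentence("George Washington went to Washington.")

def timed_predict(tagger, sentence, runs=10):
    # one warm-up call, then average wall-clock time over several runs
    tagger.predict(sentence)
    start = time.perf_counter()
    for _ in range(runs):
        tagger.predict(sentence)
    return (time.perf_counter() - start) / runs

baseline = timed_predict(model, test_sentence)
model.embeddings = model.embeddings.optimize_nebullvm(opt_sentences)
optimized = timed_predict(model, test_sentence)
print(f"baseline: {baseline:.4f}s  optimized: {optimized:.4f}s  speedup: {baseline / optimized:.1f}x")
```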
Hello @valeriosofi, thanks a lot for adding this, many users will surely find it useful! The unit tests are failing though; it looks like a deprecated method call. Can you take a look?
Hello @alanakbik, thanks, I managed to fix the problem! The unit tests should pass now.
Hi @alanakbik, I tested nebullvm again and it works quite well; on my local branch it passes all the tests. As soon as you manage to check it, let me know if you need anything else from our side.
I'm getting this error when running import nebullvm:
/opt/conda/envs/tech_ner/lib/python3.9/site-packages/nebullvm/inference_learners/deepsparse.py:32: UserWarning: No deepsparse installation found. Trying to install it...
warnings.warn(
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
File /opt/conda/envs/tech_ner/lib/python3.9/site-packages/nebullvm/inference_learners/deepsparse.py:22
21 try:
---> 22 from deepsparse import compile_model, cpu
23 except ImportError:
ModuleNotFoundError: No module named 'deepsparse'
During handling of the above exception, another exception occurred:
FileNotFoundError Traceback (most recent call last)
Cell In [8], line 2
1 # %%
----> 2 import nebullvm
File /opt/conda/envs/tech_ner/lib/python3.9/site-packages/nebullvm/__init__.py:1
----> 1 from nebullvm.api.frontend.torch import optimize_torch_model # noqa F401
2 from nebullvm.api.frontend.tf import optimize_tf_model # noqa F401
3 from nebullvm.api.frontend.onnx import optimize_onnx_model # noqa F401
File /opt/conda/envs/tech_ner/lib/python3.9/site-packages/nebullvm/api/frontend/torch.py:27
19 from nebullvm.base import (
20 DeepLearningFramework,
21 ModelParams,
(...)
24 QuantizationType,
25 )
26 from nebullvm.converters import ONNXConverter
---> 27 from nebullvm.optimizers.pytorch import PytorchBackendOptimizer
28 from nebullvm.transformations.base import MultiStageTransformation
29 from nebullvm.utils.data import DataManager
File /opt/conda/envs/tech_ner/lib/python3.9/site-packages/nebullvm/optimizers/__init__.py:6
4 from nebullvm.optimizers.base import BaseOptimizer # noqa F401
5 from nebullvm.optimizers.blade_disc import BladeDISCOptimizer # noqa F401
----> 6 from nebullvm.optimizers.deepsparse import DeepSparseOptimizer # noqa F401
7 from nebullvm.optimizers.neural_compressor import (
8 NeuralCompressorOptimizer,
9 ) # noqa F401
10 from nebullvm.optimizers.onnx import ONNXOptimizer # noqa F401
File /opt/conda/envs/tech_ner/lib/python3.9/site-packages/nebullvm/optimizers/deepsparse.py:11
9 from nebullvm.config import CONSTRAINED_METRIC_DROP_THS
10 from nebullvm.converters import ONNXConverter
---> 11 from nebullvm.inference_learners.deepsparse import (
12 DEEPSPARSE_INFERENCE_LEARNERS,
13 DeepSparseInferenceLearner,
14 )
15 from nebullvm.measure import compute_relative_difference
16 from nebullvm.optimizers import BaseOptimizer
File /opt/conda/envs/tech_ner/lib/python3.9/site-packages/nebullvm/inference_learners/deepsparse.py:35
27 if (
28 os_ != "Darwin"
29 and get_cpu_arch() != "arm"
30 and not NO_COMPILER_INSTALLATION
31 ):
32 warnings.warn(
33 "No deepsparse installation found. Trying to install it..."
34 )
---> 35 install_deepsparse()
36 from deepsparse import compile_model, cpu
37 else:
File /opt/conda/envs/tech_ner/lib/python3.9/site-packages/nebullvm/installers/installers.py:188, in install_deepsparse()
185 python_minor_version = sys.version_info.minor
187 cmd = ["apt-get", "install", f"python3.{python_minor_version}-venv"]
--> 188 subprocess.run(cmd)
190 cmd = ["pip3", "install", "deepsparse"]
191 subprocess.run(cmd)
File /opt/conda/envs/tech_ner/lib/python3.9/subprocess.py:505, in run(input, capture_output, timeout, check, *popenargs, **kwargs)
502 kwargs['stdout'] = PIPE
503 kwargs['stderr'] = PIPE
--> 505 with Popen(*popenargs, **kwargs) as process:
506 try:
507 stdout, stderr = process.communicate(input, timeout=timeout)
File /opt/conda/envs/tech_ner/lib/python3.9/subprocess.py:951, in Popen.__init__(self, args, bufsize, executable, stdin, stdout, stderr, preexec_fn, close_fds, shell, cwd, env, universal_newlines, startupinfo, creationflags, restore_signals, start_new_session, pass_fds, user, group, extra_groups, encoding, errors, text, umask)
947 if self.text_mode:
948 self.stderr = io.TextIOWrapper(self.stderr,
949 encoding=encoding, errors=errors)
--> 951 self._execute_child(args, executable, preexec_fn, close_fds,
952 pass_fds, cwd, env,
953 startupinfo, creationflags, shell,
954 p2cread, p2cwrite,
955 c2pread, c2pwrite,
956 errread, errwrite,
957 restore_signals,
958 gid, gids, uid, umask,
959 start_new_session)
960 except:
961 # Cleanup if the child failed starting.
962 for f in filter(None, (self.stdin, self.stdout, self.stderr)):
File /opt/conda/envs/tech_ner/lib/python3.9/subprocess.py:1821, in Popen._execute_child(self, args, executable, preexec_fn, close_fds, pass_fds, cwd, env, startupinfo, creationflags, shell, p2cread, p2cwrite, c2pread, c2pwrite, errread, errwrite, restore_signals, gid, gids, uid, umask, start_new_session)
1819 if errno_num != 0:
1820 err_msg = os.strerror(errno_num)
-> 1821 raise child_exception_type(errno_num, err_msg, err_filename)
1822 raise child_exception_type(err_msg)
FileNotFoundError: [Errno 2] No such file or directory: 'apt-get'
Hi @klimentij, it looks like you don't have apt-get on your machine! What OS are you using? Does it work if you run apt-get update in the terminal?
@valeriosofi Amazon Linux 2, so no apt-get. I'd expect it to use a compatible package manager automatically...
@klimentij Yep, we will definitely fix this in the next nebullvm release, thanks for the report ;)
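In the meantime, a possible workaround is to pre-install deepsparse yourself so that nebullvm's auto-installer (the apt-get call in the traceback above) is never triggered. A minimal sketch, assuming deepsparse can be installed with pip on your platform:

```python
import importlib.util
import subprocess
import sys

# Install deepsparse up front so the import inside nebullvm succeeds and the
# apt-get based auto-installer is never reached on non-Debian systems.
if importlib.util.find_spec("deepsparse") is None:
    subprocess.run([sys.executable, "-m", "pip", "install", "deepsparse"], check=True)

import nebullvm  # noqa: E402
```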
Hello @valeriosofi thanks for submitting this PR.
I ran the above code and it proceeded to immediately install a bunch of libraries on my system, without asking for permission! It installed (at least) ONNX Runtime, OpenVINO, TensorRT, TVM, DeepSparse, and Neural Compressor. Some of the libraries did not install correctly, I think. You're using subprocess to run these install commands.
I find this very problematic from a security perspective.
After installing all these libraries for 30 minutes, the code failed with:
[ WARNING ] No optimized model has been created. This is likely due to a bug in Nebullvm. Please open an issue and report in details your use case.
Traceback (most recent call last):
File "/home/alan/PycharmProjects/flair/local_nebulum.py", line 25, in <module>
model.predict(sentence)
File "/home/alan/PycharmProjects/flair/flair/models/sequence_tagger_model.py", line 480, in predict
sentence_tensor, lengths = self._prepare_tensors(batch)
File "/home/alan/PycharmProjects/flair/flair/models/sequence_tagger_model.py", line 284, in _prepare_tensors
self.embeddings.embed(sentences)
File "/home/alan/PycharmProjects/flair/flair/embeddings/base.py", line 47, in embed
self._add_embeddings_internal(data_points)
File "/home/alan/PycharmProjects/flair/flair/embeddings/transformer.py", line 543, in _add_embeddings_internal
embeddings = self._forward_tensors(tensors)
File "/home/alan/PycharmProjects/flair/flair/embeddings/transformer.py", line 778, in _forward_tensors
return {"token_embeddings": self.model(*tensors.values())[0]}
TypeError: 'NoneType' object is not callable
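The crash happens because no optimized model was created (see the warning above), leaving the embeddings without a model to call. A guard along these lines in flair/embeddings/transformer.py (a sketch only, not part of this PR) would at least surface an explicit error instead of the NoneType crash:

```python
# Hypothetical defensive check, based only on the traceback above: fail loudly
# when nebullvm optimization did not produce a model, instead of leaving
# self.model as None and crashing later with "'NoneType' object is not callable".
def _forward_tensors(self, tensors):
    if self.model is None:
        raise RuntimeError(
            "nebullvm optimization did not produce a model; "
            "re-run optimize_nebullvm() or fall back to the unoptimized embeddings."
        )
    return {"token_embeddings": self.model(*tensors.values())[0]}
```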
Unfortunately, I cannot merge this PR: it does not seem to be working, and it runs a bunch of auto-installers that completely bloated my system without asking for permission. Happy to discuss here or via mail.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.