Performance Issue Generating Document Embedding Vectors with universal-sentence-encoder_4
What happened?
I am using Top2Vec to summarize document corpora. Top2Vec can use several different document embedding methods. With universal-sentence-encoder_4, generating the document vectors is very slow. Is this due to generating the embedding vectors in TensorFlow and then converting them back to Python/NumPy?
Here is the relevant code:
document_vectors.append(self.embed(train_corpus[current:current + batch_size]))
I posted the question on StackOverflow but there hasn't been any response.
https://stackoverflow.com/questions/71809942/performance-issue-with-tensoflow-hub-and-universal-sentence-encoder
Relevant code
import cProfile, pstats, io
from pstats import SortKey

pr = cProfile.Profile()
pr.enable()
for ind in range(0, batches):
    document_vectors.append(self.embed(train_corpus[current:current + batch_size]))
    current += batch_size
if extra > 0:
    document_vectors.append(self.embed(train_corpus[current:current + extra]))
document_vectors = self._l2_normalize(np.array(np.vstack(document_vectors)))
pr.disable()

s = io.StringIO()
sortby = SortKey.CUMULATIVE
ps = pstats.Stats(pr, stream=s).sort_stats(sortby)
ps.print_stats()
print(s.getvalue())
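One way to check whether the time goes into the model call or into the conversion back to NumPy is to time the two steps separately. The following is only a minimal sketch under stated assumptions: `time_embedding` and `fake_embed` are hypothetical names, not part of Top2Vec or TensorFlow Hub, and `fake_embed` stands in for the real encoder so the snippet runs without downloading a model. If the device executes work asynchronously, some compute time may show up in the conversion step instead of the call itself.

```python
import time
import numpy as np


def time_embedding(embed, corpus, batch_size):
    """Embed `corpus` in batches, timing the model call and the
    NumPy conversion separately."""
    embed_seconds = 0.0
    convert_seconds = 0.0
    chunks = []
    # Stepping by batch_size also covers the final partial batch.
    for start in range(0, len(corpus), batch_size):
        batch = corpus[start:start + batch_size]

        t0 = time.perf_counter()
        out = embed(batch)               # model call
        t1 = time.perf_counter()
        chunks.append(np.asarray(out))   # conversion to a NumPy array
        t2 = time.perf_counter()

        embed_seconds += t1 - t0
        convert_seconds += t2 - t1

    vectors = np.vstack(chunks) if chunks else np.empty((0, 0))
    return vectors, embed_seconds, convert_seconds


def fake_embed(batch):
    # Hypothetical stand-in for the real encoder:
    # returns one 4-dimensional vector per document.
    return np.ones((len(batch), 4), dtype=np.float32)


vectors, embed_s, convert_s = time_embedding(
    fake_embed, ["a", "b", "c", "d", "e"], batch_size=2
)
print(vectors.shape, embed_s, convert_s)
```

To try it against the real model, pass the loaded encoder (for example `self.embed`) in place of `fake_embed`.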
Relevant log output
3638257 function calls (3568425 primitive calls) in 154.779 seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
6983 0.015 0.000 152.706 0.022 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/saved_model/load.py:743(_call_attribute)
6983 0.019 0.000 152.692 0.022 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/util/traceback_utils.py:138(error_handler)
6983 0.129 0.000 152.643 0.022 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py:887(__call__)
6983 0.086 0.000 152.281 0.022 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py:934(_call)
6983 0.068 0.000 152.188 0.022 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/function.py:2951(__call__)
6983 0.081 0.000 149.721 0.021 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/function.py:1782(_call_flat)
6983 0.205 0.000 149.416 0.021 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/function.py:454(call)
6983 0.034 0.000 149.041 0.021 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/execute.py:29(quick_execute)
6983 148.998 0.021 148.998 0.021 {built-in method tensorflow.python._pywrap_tfe.TFE_Py_Execute}
6983 0.048 0.000 1.928 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/function.py:3219(_maybe_define_function)
6/4 0.148 0.025 1.925 0.481 {built-in method numpy.core._multiarray_umath.implement_array_function}
1 0.000 0.000 1.882 1.882 <__array_function__ internals>:2(vstack)
1 0.000 0.000 1.877 1.877 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/numpy/core/shape_base.py:223(vstack)
6987 1.833 0.000 1.833 0.000 {built-in method numpy.array}
1 0.000 0.000 1.736 1.736 <__array_function__ internals>:2(atleast_2d)
1 0.015 0.015 1.734 1.734 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/numpy/core/shape_base.py:82(atleast_2d)
6984 0.004 0.000 1.718 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/numpy/core/_asarray.py:110(asanyarray)
6983 0.046 0.000 1.381 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/function.py:2648(canonicalize_function_inputs)
6983 0.080 0.000 1.311 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/function.py:2817(_convert_inputs_to_signature)
6983 0.009 0.000 0.652 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/profiler/trace.py:178(wrapped)
6983 0.076 0.000 0.643 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/framework/ops.py:1636(convert_to_tensor)
6983 0.015 0.000 0.472 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/function.py:2051(captured_inputs)
6983 0.005 0.000 0.405 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/framework/constant_op.py:340(_constant_tensor_conversion_function)
6983 0.007 0.000 0.400 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/framework/constant_op.py:170(constant)
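As a rough reading of the profile above, the figures can be turned into shares of the total run time. This sketch only does arithmetic on numbers copied from the log (total time, the `TFE_Py_Execute` row, and the `numpy.array` row); the variable names are illustrative. It suggests that about 96% of the time is spent inside `TFE_Py_Execute`, at roughly 21 ms per call, while `numpy.array` accounts for about 1%.

```python
# Figures copied from the cProfile output above.
total_seconds = 154.779        # whole profiled run
tfe_execute_seconds = 148.998  # tottime of TFE_Py_Execute
numpy_array_seconds = 1.833    # tottime of numpy.array
calls = 6983                   # number of TFE_Py_Execute calls

execute_share = tfe_execute_seconds / total_seconds
numpy_share = numpy_array_seconds / total_seconds
ms_per_call = 1000 * tfe_execute_seconds / calls

print(f"TFE_Py_Execute share of run time: {execute_share:.1%}")
print(f"numpy.array share of run time:    {numpy_share:.1%}")
print(f"Time per TFE_Py_Execute call:     {ms_per_call:.1f} ms")
```

On these numbers, the NumPy conversion looks like a minor cost compared with executing the model itself.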
tensorflow_hub Version
0.12.0 (latest stable release)
TensorFlow Version
2.8 (latest stable release)
Other libraries
% conda list tensorflow
packages in environment at /Users/davidlaxer/anaconda3:
Name                          Version   Build   Channel
mesh-tensorflow               0.1.19    pypi_0  pypi
tensorflow                    2.8.0     pypi_0  pypi
tensorflow-datasets           4.3.0     pypi_0  pypi
tensorflow-estimator          2.6.0     pypi_0  pypi
tensorflow-hub                0.12.0    pypi_0  pypi
tensorflow-io-gcs-filesystem  0.21.0    pypi_0  pypi
tensorflow-metadata           1.0.0     pypi_0  pypi
tensorflow-text               2.5.0rc0  pypi_0  pypi

% pip show tensorflow_metal
WARNING: Ignoring invalid distribution -umpy (/Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages)
Name: tensorflow-metal
Version: 0.4.0
Summary: TensorFlow acceleration for Mac GPUs.
Home-page: https://developer.apple.com/metal/tensorflow-plugin/
Author:
Author-email:
License: MIT License. Copyright © 2020-2021 Apple Inc. All rights reserved.
Location: /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages
Requires: six, wheel
Required-by:

(tensorflow-metal) (base) davidlaxer@x86_64-apple-darwin13 top2vec % pip show tensorflow_macos
WARNING: Ignoring invalid distribution -umpy (/Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages)
Name: tensorflow-macos
Version: 2.8.0
Summary: TensorFlow is an open source machine learning framework for everyone.
Home-page: https://www.tensorflow.org/
Author: Google Inc.
Author-email: [email protected]
License: Apache 2.0
Location: /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages
Requires: absl-py, astunparse, flatbuffers, gast, google-pasta, grpcio, h5py, keras, keras-preprocessing, libclang, numpy, opt-einsum, protobuf, setuptools, six, tensorboard, termcolor, tf-estimator-nightly, typing-extensions, wrapt
Required-by:

(tensorflow-metal) (base) davidlaxer@x86_64-apple-darwin13 top2vec % pip show tensorflow_hub
WARNING: Ignoring invalid distribution -umpy (/Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages)
Name: tensorflow-hub
Version: 0.12.0
Summary: TensorFlow Hub is a library to foster the publication, discovery, and consumption of reusable parts of machine learning models.
Home-page: https://github.com/tensorflow/hub
Author: Google LLC
Author-email: [email protected]
License: Apache 2.0
Location: /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages
Requires: numpy, protobuf
Required-by: tensorflow-gan, tensorflow-text

(tensorflow-metal) (base) davidlaxer@x86_64-apple-darwin13 top2vec % pip show tensorflow_text
WARNING: Ignoring invalid distribution -umpy (/Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages)
Name: tensorflow-text
Version: 2.7.3
Summary: TF.Text is a TensorFlow library of text related ops, modules, and subgraphs.
Home-page: http://github.com/tensorflow/text
Author: Google Inc.
Author-email: [email protected]
License: Apache 2.0
Location: /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages
Requires: tensorflow-hub
Required-by:
Python Version
3.x
OS
macOS
Hey mate,
I see the same loss in performance, caused by the same built-in function. Have you found any way to solve this issue?
Thanks
I haven’t focused on this issue lately. The cProfile output suggests the time is spent in TensorFlow ‘eager’ execution, possibly while generating the embedding vectors.
I posted the question on StackOverflow and nobody responded.
https://stackoverflow.com/questions/71809942/performance-issue-with-tensoflow-hub-and-universal-sentence-encoder
(In reply to doric35's comment of Aug 4, 2022, 1:01 PM: https://github.com/tensorflow/hub/issues/848#issuecomment-1205711405)
Please forward the issue to https://github.com/tensorflow/tensorflow if eager execution might have something to do with the problem.