Performance Issue Generating Document Embedding Vectors with universal-sentence-encoder_4

Open dbl001 opened this issue 3 years ago • 2 comments

What happened?

I am using Top2Vec to summarize document corpora. Top2Vec can use a variety of document embedding methods. With universal-sentence-encoder_4, generating the document vectors is very slow. Is this due to generating the embedding vectors in TensorFlow and then converting them back to Python/NumPy?

Here is the relevant code:

                document_vectors.append(self.embed(train_corpus[current:current + batch_size]))

I posted the question on StackOverflow but there hasn't been any response.

https://stackoverflow.com/questions/71809942/performance-issue-with-tensoflow-hub-and-universal-sentence-encoder

Relevant code

import cProfile, pstats, io
from pstats import SortKey

# Profile only the embedding loop and the final stacking/normalization.
pr = cProfile.Profile()
pr.enable()

# Embed the corpus one full batch at a time.
for ind in range(0, batches):
    document_vectors.append(self.embed(train_corpus[current:current + batch_size]))
    current += batch_size

# Embed the leftover documents that do not fill a whole batch.
if extra > 0:
    document_vectors.append(self.embed(train_corpus[current:current + extra]))

document_vectors = self._l2_normalize(np.array(np.vstack(document_vectors)))

pr.disable()

# Print the collected stats, sorted by cumulative time.
s = io.StringIO()
sortby = SortKey.CUMULATIVE
ps = pstats.Stats(pr, stream=s).sort_stats(sortby)
ps.print_stats()
print(s.getvalue())
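
The loop above relies on `batches`, `extra`, and `current`, which are not defined in the snippet. Below is a minimal, self-contained sketch of the same batching pattern. It assumes those variables come from integer division and remainder of the corpus length, which the quoted code does not confirm. `fake_embed` and `embed_in_batches` are hypothetical names; `fake_embed` stands in for `self.embed` so the sketch runs without TensorFlow.

```python
def fake_embed(texts):
    """Hypothetical stand-in for self.embed: returns one small vector per text."""
    return [[float(len(t)), 1.0] for t in texts]


def embed_in_batches(train_corpus, batch_size, embed=fake_embed):
    # Assumed definitions of the variables used by the loop in the issue.
    batches = len(train_corpus) // batch_size   # number of full batches
    extra = len(train_corpus) % batch_size      # leftover documents
    current = 0
    document_vectors = []

    for _ in range(batches):
        document_vectors.extend(embed(train_corpus[current:current + batch_size]))
        current += batch_size

    if extra > 0:
        document_vectors.extend(embed(train_corpus[current:current + extra]))

    # Also report how many times the embedding function was called.
    return document_vectors, batches + (1 if extra > 0 else 0)


corpus = ["doc %d" % i for i in range(10)]
vectors, calls = embed_in_batches(corpus, batch_size=4)
print(len(vectors), calls)
```

If each call to the embedding function maps to one model invocation, the batch size directly determines how many times the model is invoked for a given corpus.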

Relevant log output

3638257 function calls (3568425 primitive calls) in 154.779 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     6983    0.015    0.000  152.706    0.022 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/saved_model/load.py:743(_call_attribute)
     6983    0.019    0.000  152.692    0.022 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/util/traceback_utils.py:138(error_handler)
     6983    0.129    0.000  152.643    0.022 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py:887(__call__)
     6983    0.086    0.000  152.281    0.022 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py:934(_call)
     6983    0.068    0.000  152.188    0.022 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/function.py:2951(__call__)
     6983    0.081    0.000  149.721    0.021 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/function.py:1782(_call_flat)
     6983    0.205    0.000  149.416    0.021 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/function.py:454(call)
     6983    0.034    0.000  149.041    0.021 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/execute.py:29(quick_execute)
     6983  148.998    0.021  148.998    0.021 {built-in method tensorflow.python._pywrap_tfe.TFE_Py_Execute}
     6983    0.048    0.000    1.928    0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/function.py:3219(_maybe_define_function)
      6/4    0.148    0.025    1.925    0.481 {built-in method numpy.core._multiarray_umath.implement_array_function}
        1    0.000    0.000    1.882    1.882 <__array_function__ internals>:2(vstack)
        1    0.000    0.000    1.877    1.877 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/numpy/core/shape_base.py:223(vstack)
     6987    1.833    0.000    1.833    0.000 {built-in method numpy.array}
        1    0.000    0.000    1.736    1.736 <__array_function__ internals>:2(atleast_2d)
        1    0.015    0.015    1.734    1.734 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/numpy/core/shape_base.py:82(atleast_2d)
     6984    0.004    0.000    1.718    0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/numpy/core/_asarray.py:110(asanyarray)
     6983    0.046    0.000    1.381    0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/function.py:2648(canonicalize_function_inputs)
     6983    0.080    0.000    1.311    0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/function.py:2817(_convert_inputs_to_signature)
     6983    0.009    0.000    0.652    0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/profiler/trace.py:178(wrapped)
     6983    0.076    0.000    0.643    0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/framework/ops.py:1636(convert_to_tensor)
     6983    0.015    0.000    0.472    0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/function.py:2051(captured_inputs)
     6983    0.005    0.000    0.405    0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/framework/constant_op.py:340(_constant_tensor_conversion_function)
     6983    0.007    0.000    0.400    0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/framework/constant_op.py:170(constant)
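
Whether the conversion back to NumPy is the bottleneck can be checked against the table above with a little arithmetic. The sketch below uses only figures copied from the log. Treating the `numpy.array` row as the cost of the tensor-to-array conversion is an assumption about what that row represents.

```python
# Figures copied from the cProfile output above.
total_seconds = 154.779
tfe_execute_seconds = 148.998   # tottime of TFE_Py_Execute
numpy_array_seconds = 1.833     # tottime of numpy.array
model_calls = 6983              # ncalls of TFE_Py_Execute

tfe_share = tfe_execute_seconds / total_seconds
numpy_share = numpy_array_seconds / total_seconds
ms_per_call = 1000 * tfe_execute_seconds / model_calls

print(f"TFE_Py_Execute: {tfe_share:.1%} of total")
print(f"numpy.array:    {numpy_share:.1%} of total")
print(f"per model call: {ms_per_call:.1f} ms")
```

On these numbers, roughly 96% of the wall time is spent inside TensorFlow's execution call and only about 1% in `numpy.array`, so the NumPy conversion looks like a minor contributor in this run.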

tensorflow_hub Version

0.12.0 (latest stable release)

TensorFlow Version

2.8 (latest stable release)

Other libraries

% conda list tensorflow

packages in environment at /Users/davidlaxer/anaconda3:

Name                          Version    Build   Channel
mesh-tensorflow               0.1.19     pypi_0  pypi
tensorflow                    2.8.0      pypi_0  pypi
tensorflow-datasets           4.3.0      pypi_0  pypi
tensorflow-estimator          2.6.0      pypi_0  pypi
tensorflow-hub                0.12.0     pypi_0  pypi
tensorflow-io-gcs-filesystem  0.21.0     pypi_0  pypi
tensorflow-metadata           1.0.0      pypi_0  pypi
tensorflow-text               2.5.0rc0   pypi_0  pypi

% pip show tensorflow_metal
WARNING: Ignoring invalid distribution -umpy (/Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages)
Name: tensorflow-metal
Version: 0.4.0
Summary: TensorFlow acceleration for Mac GPUs.
Home-page: https://developer.apple.com/metal/tensorflow-plugin/
Author:
Author-email:
License: MIT License. Copyright © 2020-2021 Apple Inc. All rights reserved.
Location: /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages
Requires: six, wheel
Required-by:

(tensorflow-metal) (base) davidlaxer@x86_64-apple-darwin13 top2vec % pip show tensorflow_macos
WARNING: Ignoring invalid distribution -umpy (/Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages)
Name: tensorflow-macos
Version: 2.8.0
Summary: TensorFlow is an open source machine learning framework for everyone.
Home-page: https://www.tensorflow.org/
Author: Google Inc.
Author-email: [email protected]
License: Apache 2.0
Location: /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages
Requires: absl-py, astunparse, flatbuffers, gast, google-pasta, grpcio, h5py, keras, keras-preprocessing, libclang, numpy, opt-einsum, protobuf, setuptools, six, tensorboard, termcolor, tf-estimator-nightly, typing-extensions, wrapt
Required-by:

(tensorflow-metal) (base) davidlaxer@x86_64-apple-darwin13 top2vec % pip show tensorflow_hub
WARNING: Ignoring invalid distribution -umpy (/Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages)
Name: tensorflow-hub
Version: 0.12.0
Summary: TensorFlow Hub is a library to foster the publication, discovery, and consumption of reusable parts of machine learning models.
Home-page: https://github.com/tensorflow/hub
Author: Google LLC
Author-email: [email protected]
License: Apache 2.0
Location: /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages
Requires: numpy, protobuf
Required-by: tensorflow-gan, tensorflow-text

(tensorflow-metal) (base) davidlaxer@x86_64-apple-darwin13 top2vec % pip show tensorflow_text
WARNING: Ignoring invalid distribution -umpy (/Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages)
Name: tensorflow-text
Version: 2.7.3
Summary: TF.Text is a TensorFlow library of text related ops, modules, and subgraphs.
Home-page: http://github.com/tensorflow/text
Author: Google Inc.
Author-email: [email protected]
License: Apache 2.0
Location: /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages
Requires: tensorflow-hub
Required-by:

Python Version

3.x

OS

macOS

dbl001, Apr 17 '22 17:04

Hey mate,

I'm seeing the same loss in performance due to the same built-in function. Have you found any way to solve this issue?

Thanks

doric35, Aug 04 '22 20:08

I haven’t focused on this issue lately. The cProfile output suggests the time is spent in TensorFlow ‘eager’ execution, possibly while generating the embedding vectors.

I posted the question on StackOverflow and nobody responded.

https://stackoverflow.com/questions/71809942/performance-issue-with-tensoflow-hub-and-universal-sentence-encoder
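
One way to narrow this down further is to time a full pass over the same corpus at several batch sizes, to see whether the cost is mostly a fixed per-call overhead or scales with the number of documents. The sketch below is only an illustration of that measurement: `fake_embed` is a hypothetical stand-in with a made-up cost model (not a measurement of the real encoder), and `time_batches` is a hypothetical helper. The real `embed` callable would replace `fake_embed`.

```python
import time


def fake_embed(texts):
    """Hypothetical stand-in: fixed per-call cost plus a small per-text cost."""
    time.sleep(0.002 + 0.0001 * len(texts))
    return [[0.0] for _ in texts]


def time_batches(corpus, batch_size, embed):
    """Return (number of embed calls, elapsed seconds) for one pass over corpus."""
    calls = 0
    start = time.perf_counter()
    for i in range(0, len(corpus), batch_size):
        embed(corpus[i:i + batch_size])
        calls += 1
    return calls, time.perf_counter() - start


corpus = ["text"] * 200
results = {size: time_batches(corpus, size, fake_embed) for size in (10, 50, 200)}
for size, (calls, seconds) in results.items():
    print(f"batch_size={size:>3}  calls={calls:>2}  seconds={seconds:.3f}")
```

If the elapsed time with the real model drops sharply as the batch size grows, per-call overhead is the likelier culprit. If it stays roughly flat, the time is going into the per-document computation itself.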

dbl001, Aug 05 '22 19:08

Please forward the issue to https://github.com/tensorflow/tensorflow if eager execution might have something to do with the problem.

WGierke, Oct 24 '22 11:10