
Please bring back native Windows CUDA support!

Open GatGit12 opened this issue 1 year ago • 51 comments


Issue Type

Others

Have you reproduced the bug with TF nightly?

Yes

Source

binary

Tensorflow Version

2.11

Custom Code

Yes

OS Platform and Distribution

Windows 10

Mobile device

No response

Python version

3.9

Bazel version

No response

GCC/Compiler version

No response

CUDA/cuDNN version

No response

GPU model and memory

No response

Current Behaviour?

I am very disappointed and sad that native Windows CUDA support was simply dropped.

The WSL2 replacement is not sufficient for processes and systems that cannot use WSL2, for example because they are themselves Windows-native applications and a port would be too expensive. So we can only use version 2.10 (of both the Python and C APIs) and are stuck with it, which is a shame because it prevents us from benefiting from and participating in new developments. We also see a performance loss of about 5% in WSL2, which leads to higher power consumption and thus has a direct climate impact; in our already very compute-intensive business that makes a big difference. In addition, the Windows DirectML plugin is not a sufficient replacement, since its performance does not yet reach CUDA's and optimizations like XLA are not supported. You also lose all your highly optimized, expensively developed TF CUDA custom ops.
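
The ~5% figure above can be checked empirically by timing the same workload in both environments. A minimal sketch, assuming the real workload would be a TensorFlow training or inference step (the helper names here are hypothetical, not from any TF API):

```python
# Hedged sketch for measuring the native-vs-WSL2 overhead mentioned
# above: time the same workload in each environment and compare medians.
# The workload below is only a stand-in for the real TF step.
import time
import statistics

def time_workload(fn, repeats=5, warmup=1):
    """Return the median wall-clock time of fn() over several runs."""
    for _ in range(warmup):
        fn()                      # warm caches / JIT before measuring
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

def overhead_percent(native_s, wsl2_s):
    """Slowdown of WSL2 relative to native, in percent."""
    return 100.0 * (wsl2_s - native_s) / native_s
```

Running the same script natively and under WSL2 and feeding the two medians to overhead_percent gives the slowdown in percent (e.g. 0.200 s native vs 0.210 s in WSL2 is a 5% overhead).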

It is also clear that the native CUDA feature on Windows is much needed; see the following issues, where other people are asking for exactly this:

  • https://github.com/tensorflow/tensorflow/issues/58629
  • https://github.com/tensorflow/tensorflow/issues/58933
  • https://github.com/tensorflow/tensorflow/issues/59905
  • https://github.com/tensorflow/tensorflow/issues/59119
  • https://github.com/tensorflow/tensorflow/issues/59016
  • https://github.com/tensorflow/tensorflow/issues/58985
  • https://github.com/tensorflow/tensorflow/issues/58729

All this amounts to the exclusion, and effectively the discrimination, of a large part of the Tensorflow community that uses CUDA natively on Windows.

Why has support been dropped? You could at least keep support for CUDA Windows Native in custom builds. I hope and ask that you bring back Windows native CUDA support and let people decide for themselves if they want to use native CUDA or WSL2.

Thank you for the development of Tensorflow! My favourite DL framework! :)

Standalone code to reproduce the issue

import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))

Relevant log output

No response

GatGit12 avatar Mar 07 '23 09:03 GatGit12

Hello,

He is totally right: Windows is still used by many people, even if not every TensorFlow developer uses it as their main platform, so I find it really shortsighted to drop native Windows GPU support when we know a lot of people still use Windows from development through production deployment (and I'm not only talking about companies). I may be wrong, since I don't have actual statistics, but it feels unfair to just drop it like this: a lot of work will be lost, projects will need big or massive rebases, and for many of them that effort won't be worth it. People who want to update their project are instead forced to stay on an increasingly outdated version of TensorFlow.

Note: I'm sorry if I'm wrong on things I'm not a professional or even working in a company but programming is a hobby that I love and I would like to be able to continue it in good conditions and stay up to date for the longest time possible.

Regards.

TheHellTower avatar Mar 07 '23 13:03 TheHellTower

Exactly right: many researchers still use TensorFlow on Windows. Please reinstate native Windows GPU support.

WhiteByeBye avatar Mar 27 '23 08:03 WhiteByeBye

The low performance and I/O inconvenience of WSL2 make it hard to use TensorFlow with large-scale datasets.

HeloWong avatar Mar 29 '23 03:03 HeloWong

This change was made alongside the per-platform build changes introduced with the help of the official build collaborators. With WSL2 included, it is comparatively easy to maintain the framework for both Linux and Windows. Here is the link to the announcement blog, which describes the official build collaborators. Thanks!

sachinprasadhs avatar Apr 04 '23 23:04 sachinprasadhs

It seems that you just repeat the same answer as in all the other issues, without really caring about the question I asked. The problem is not that tensorflow doesn’t support gpus on windows anymore, but that there is no way to use cuda natively on windows. Maybe the framework is easier to build and maintain now, but before version 2.11 it worked fine…

With this change, tensorflow is almost impossible to use on native cuda windows (or be stuck at v2.10.1), for example if you need cuda custom ops or you can’t use wsl2 or you depend on the TF C-API…

Perhaps a native cuda windows custom tensorflow build could be enabled for the users who wish to retain the functionality of the previous versions before 2.11 while also benefiting from the latest updates. That would be a reasonable compromise and a minor step back to native cuda on windows.

I feel like you don’t understand me and you just want to ignore the problem like in the other issues I linked. That makes me very sad and disappointed, because tensorflow used to be my favorite framework. :(

GatGit12 avatar Apr 07 '23 18:04 GatGit12

> It seems that you just repeat the same answer as in all the other issues, without really caring about the question I asked. The problem is not that tensorflow doesn’t support gpus on windows anymore, but that there is no way to use cuda natively on windows. Maybe the framework is easier to build and maintain now, but before version 2.11 it worked fine…
>
> With this change, tensorflow is almost impossible to use on native cuda windows (or be stuck at v2.10.1), for example if you need cuda custom ops or you can’t use wsl2 or you depend on the TF C-API…
>
> Perhaps a native cuda windows custom tensorflow build could be enabled for the users who wish to retain the functionality of the previous versions before 2.11 while also benefiting from the latest updates. That would be a reasonable compromise and a minor step back to native cuda on windows.
>
> I feel like you don’t understand me and you just want to ignore the problem like in the other issues I linked. That makes me very sad and disappointed, because tensorflow used to be my favorite framework. :(

I used to love tensorflow too, but I feel like they kind of want to abandon users who work on Windows from development to production.

Okay, it made the framework easier to maintain, but at what cost? You are cutting the legs out from under many projects. That's really unfair in my opinion: if you want quality, you do what it takes to achieve it, no? So why not continue CUDA support for Windows when it was so useful to so many people? Not everything in life is easy, and this decision really hurts. Don't forget there are even companies that work with Windows from development to production (even if not many), so you are cutting off some companies too: not everyone wants to spend the money to have developers rewrite the whole project, which is neither cheap nor easy.

TheHellTower avatar Apr 07 '23 19:04 TheHellTower

In this TensorFlow blog, please see the section "Expanded GPU support on Windows". Also see the TensorFlow install page and this page for more info. Feel free to sign up for the mailing list [email protected] to be notified of the most recent updates.

learning-to-play avatar Apr 12 '23 00:04 learning-to-play

I have tried to build TensorFlow 2.12 with CUDA on Windows 10 and got the following errors:

C:\Users\tensorflow\Downloads\tensorflow-2.12.0>bazel build --config=opt --define=no_tensorflow_py_deps=true //tensorflow/tools/pip_package:build_pip_package
Starting local Bazel server and connecting to it...
INFO: Options provided by the client: Inherited 'common' options: --isatty=1 --terminal_columns=157
INFO: Reading rc options for 'build' from c:\users\tensorflow\downloads\tensorflow-2.12.0.bazelrc: Inherited 'common' options: --experimental_repo_remote_exec
INFO: Options provided by the client: 'build' options: --python_path=C:/Users/tensorflow/anaconda3/python.exe
INFO: Reading rc options for 'build' from c:\users\tensorflow\downloads\tensorflow-2.12.0.bazelrc: 'build' options: --define framework_shared_object=true --define tsl_protobuf_header_only=true --define=use_fast_cpp_protos=true --define=allow_oversize_protos=true --spawn_strategy=standalone -c opt --announce_rc --define=grpc_no_ares=true --noincompatible_remove_legacy_whole_archive --enable_platform_specific_config --define=with_xla_support=true --config=short_logs --config=v2 --define=no_aws_support=true --define=no_hdfs_support=true --experimental_cc_shared_library --experimental_link_static_libraries_once=false --incompatible_enforce_config_setting_visibility
INFO: Reading rc options for 'build' from c:\users\tensorflow\downloads\tensorflow-2.12.0.tf_configure.bazelrc: 'build' options: --action_env PYTHON_BIN_PATH=C:/Users/tensorflow/anaconda3/python.exe --action_env PYTHON_LIB_PATH=C:/Users/tensorflow/anaconda3/lib/site-packages --python_path=C:/Users/tensorflow/anaconda3/python.exe --config=tensorrt --action_env CUDA_TOOLKIT_PATH=C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.8 --action_env TF_CUDA_COMPUTE_CAPABILITIES=3.5,5.0,6.1,7.0 --config=cuda --copt=/d2ReducedOptimizeHugeFunctions --host_copt=/d2ReducedOptimizeHugeFunctions --define=override_eigen_strong_inline=true
INFO: Reading rc options for 'build' from c:\users\tensorflow\downloads\tensorflow-2.12.0.bazelrc: 'build' options: --deleted_packages=tensorflow/compiler/mlir/tfrt,tensorflow/compiler/mlir/tfrt/benchmarks,tensorflow/compiler/mlir/tfrt/jit/python_binding,tensorflow/compiler/mlir/tfrt/jit/transforms,tensorflow/compiler/mlir/tfrt/python_tests,tensorflow/compiler/mlir/tfrt/tests,tensorflow/compiler/mlir/tfrt/tests/ir,tensorflow/compiler/mlir/tfrt/tests/analysis,tensorflow/compiler/mlir/tfrt/tests/jit,tensorflow/compiler/mlir/tfrt/tests/lhlo_to_tfrt,tensorflow/compiler/mlir/tfrt/tests/lhlo_to_jitrt,tensorflow/compiler/mlir/tfrt/tests/tf_to_corert,tensorflow/compiler/mlir/tfrt/tests/tf_to_tfrt_data,tensorflow/compiler/mlir/tfrt/tests/saved_model,tensorflow/compiler/mlir/tfrt/transforms/lhlo_gpu_to_tfrt_gpu,tensorflow/core/runtime_fallback,tensorflow/core/runtime_fallback/conversion,tensorflow/core/runtime_fallback/kernel,tensorflow/core/runtime_fallback/opdefs,tensorflow/core/runtime_fallback/runtime,tensorflow/core/runtime_fallback/util,tensorflow/core/tfrt/eager,tensorflow/core/tfrt/eager/backends/cpu,tensorflow/core/tfrt/eager/backends/gpu,tensorflow/core/tfrt/eager/core_runtime,tensorflow/core/tfrt/eager/cpp_tests/core_runtime,tensorflow/core/tfrt/gpu,tensorflow/core/tfrt/run_handler_thread_pool,tensorflow/core/tfrt/runtime,tensorflow/core/tfrt/saved_model,tensorflow/core/tfrt/graph_executor,tensorflow/core/tfrt/saved_model/tests,tensorflow/core/tfrt/tpu,tensorflow/core/tfrt/utils
INFO: Found applicable config definition build:short_logs in file c:\users\tensorflow\downloads\tensorflow-2.12.0.bazelrc: --output_filter=DONT_MATCH_ANYTHING
INFO: Found applicable config definition build:v2 in file c:\users\tensorflow\downloads\tensorflow-2.12.0.bazelrc: --define=tf_api_version=2 --action_env=TF2_BEHAVIOR=1
INFO: Found applicable config definition build:tensorrt in file c:\users\tensorflow\downloads\tensorflow-2.12.0.bazelrc: --repo_env TF_NEED_TENSORRT=1
INFO: Found applicable config definition build:cuda in file c:\users\tensorflow\downloads\tensorflow-2.12.0.bazelrc: --repo_env TF_NEED_CUDA=1 --crosstool_top=@local_config_cuda//crosstool:toolchain --@local_config_cuda//:enable_cuda
INFO: Found applicable config definition build:opt in file c:\users\tensorflow\downloads\tensorflow-2.12.0.tf_configure.bazelrc: --copt=/arch:AVX2 --host_copt=/arch:AVX2
INFO: Found applicable config definition build:windows in file c:\users\tensorflow\downloads\tensorflow-2.12.0.bazelrc: --copt=/W0 --host_copt=/W0 --copt=/Zc:__cplusplus --host_copt=/Zc:__cplusplus --copt=/D_USE_MATH_DEFINES --host_copt=/D_USE_MATH_DEFINES --features=compiler_param_file --copt=/d2ReducedOptimizeHugeFunctions --host_copt=/d2ReducedOptimizeHugeFunctions --cxxopt=/std:c++17 --host_cxxopt=/std:c++17 --config=monolithic --copt=-DWIN32_LEAN_AND_MEAN --host_copt=-DWIN32_LEAN_AND_MEAN --copt=-DNOGDI --host_copt=-DNOGDI --copt=/Zc:preprocessor --host_copt=/Zc:preprocessor --linkopt=/DEBUG --host_linkopt=/DEBUG --linkopt=/OPT:REF --host_linkopt=/OPT:REF --linkopt=/OPT:ICF --host_linkopt=/OPT:ICF --verbose_failures --features=compiler_param_file --distinct_host_configuration=false
INFO: Found applicable config definition build:monolithic in file c:\users\tensorflow\downloads\tensorflow-2.12.0.bazelrc: --define framework_shared_object=false --define tsl_protobuf_header_only=false --experimental_link_static_libraries_once=false
ERROR: C:/users/tensorflow/downloads/tensorflow-2.12.0/tensorflow/compiler/xla/pjrt/BUILD:469:11: in cc_library rule //tensorflow/compiler/xla/pjrt:pjrt_future: target '@tf_runtime//:support' is not visible from target '//tensorflow/compiler/xla/pjrt:pjrt_future'. Check the visibility declaration of the former target if you think the dependency is legitimate
ERROR: C:/users/tensorflow/downloads/tensorflow-2.12.0/tensorflow/compiler/xla/pjrt/BUILD:469:11: Analysis of target '//tensorflow/compiler/xla/pjrt:pjrt_future' failed
INFO: Repository cudnn_frontend_archive instantiated at: C:/users/tensorflow/downloads/tensorflow-2.12.0/WORKSPACE:15:14: in C:/users/tensorflow/downloads/tensorflow-2.12.0/tensorflow/workspace2.bzl:967:21: in workspace C:/users/tensorflow/downloads/tensorflow-2.12.0/tensorflow/workspace2.bzl:171:20: in _tf_repositories C:/users/tensorflow/downloads/tensorflow-2.12.0/third_party/repo.bzl:136:21: in tf_http_archive Repository rule _tf_http_archive defined at: C:/users/tensorflow/downloads/tensorflow-2.12.0/third_party/repo.bzl:89:35: in
INFO: Repository org_sqlite instantiated at: C:/users/tensorflow/downloads/tensorflow-2.12.0/WORKSPACE:15:14: in C:/users/tensorflow/downloads/tensorflow-2.12.0/tensorflow/workspace2.bzl:967:21: in workspace C:/users/tensorflow/downloads/tensorflow-2.12.0/tensorflow/workspace2.bzl:310:20: in _tf_repositories C:/users/tensorflow/downloads/tensorflow-2.12.0/third_party/repo.bzl:136:21: in tf_http_archive Repository rule _tf_http_archive defined at: C:/users/tensorflow/downloads/tensorflow-2.12.0/third_party/repo.bzl:89:35: in
INFO: Repository mkl_dnn_v1 instantiated at: C:/users/tensorflow/downloads/tensorflow-2.12.0/WORKSPACE:15:14: in C:/users/tensorflow/downloads/tensorflow-2.12.0/tensorflow/workspace2.bzl:967:21: in workspace C:/users/tensorflow/downloads/tensorflow-2.12.0/tensorflow/workspace2.bzl:188:20: in _tf_repositories C:/users/tensorflow/downloads/tensorflow-2.12.0/third_party/repo.bzl:136:21: in tf_http_archive Repository rule _tf_http_archive defined at: C:/users/tensorflow/downloads/tensorflow-2.12.0/third_party/repo.bzl:89:35: in
ERROR: Analysis of target '//tensorflow/tools/pip_package:build_pip_package' failed; build aborted:
INFO: Elapsed time: 221.760s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (558 packages loaded, 28119 targets configured)
currently loading: @jsoncpp_git// ... (3 packages)
Fetching repository @local_config_git; starting 9s
Fetching https://storage.googleapis.com/mirror.tensorflow.org/github.com/oneapi-src/oneDNN/archive/refs/tags/v2.7.3.tar.gz; 1.2 MiB (1,220,608B)
Fetching https://storage.googleapis.com/mirror.tensorflow.org/www.sqlite.org/2022/sqlite-amalgamation-3400100.zip; 84.4 KiB (86,386B)

Is it solved in the latest 2.12 source code?

johnnkp avatar Apr 15 '23 12:04 johnnkp

I also think this is important. There are still many people using Windows as a development environment.

AsakusaRinne avatar Apr 16 '23 09:04 AsakusaRinne

After switching back to Bazel 5.3.0, I could run the toolchain build until my disk had 7 GB left and I stopped it. Don't expect the build to finish within a single bazel run. The server build script should rerun the bazel command when it fails (say, up to 10 times) to better handle errors like 404 Not Found and "XXX is not defined". There should also be an option to let people choose a prebuilt LLVM path, so we don't need to waste days of time and gigabytes of space building LLVM from source.

Build environment: Windows 10 1909, Visual Studio 2022 (build tools 14.35), msys2-x86_64-20230318, CUDA 11.8, cuDNN 8.6.0, TensorRT 8.5.3
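
The retry behaviour suggested above could be sketched as follows; run_with_retries is a hypothetical helper, not part of any TF build script. Bazel keeps its analysis cache between runs, so each retry resumes roughly where the previous one failed:

```python
# Hedged sketch of a build-retry wrapper: rerun a command until it
# succeeds or the retry budget is exhausted (e.g. to ride out transient
# 404 fetch errors during a long bazel build).
import subprocess

def run_with_retries(cmd, max_tries=10):
    """Run cmd repeatedly; return the number of attempts used.

    Raises RuntimeError if the command still fails after max_tries runs.
    """
    for attempt in range(1, max_tries + 1):
        result = subprocess.run(cmd)
        if result.returncode == 0:
            return attempt
    raise RuntimeError(f"command failed after {max_tries} attempts: {cmd}")
```

For the build in question, cmd would be the full bazel invocation, e.g. ["bazel", "build", "--config=opt", "//tensorflow/tools/pip_package:build_pip_package"].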

johnnkp avatar Apr 16 '23 12:04 johnnkp

While native GPU support on Windows would bring back the ~5% perf increase and support a few more users, please note that the total number of TF users on Windows is very small compared to the other use cases, and there is almost no Windows expertise at Google to maintain this build. So, more than a year ago, it was decided to drop Windows GPU support, as the maintenance burden did not justify the costs, especially given that alternatives (i.e., WSL and a Linux environment) exist.

I acknowledge that this might not be the desired answer (and that previous answers from the team did not treat the issue accordingly), but this is the most we can reliably do.

mihaimaruseac avatar Apr 18 '23 15:04 mihaimaruseac

@mihaimaruseac Is it still possible to build from source on Windows with cuda? I'm sad that I can't benefit from the most recent works of tensorflow and I'd like to build from source myself. However if that means I need to modify lots of cpp code, the price is too high.

AsakusaRinne avatar Apr 18 '23 15:04 AsakusaRinne

It should still be possible. There's also tensorflow/build repo and SIG Build where community can provide help for that, it's just that Google cannot effectively maintain the build itself.

mihaimaruseac avatar Apr 18 '23 15:04 mihaimaruseac

> It should still be possible.

If you run the bazel command again several times (2 times in my case) without cleaning away the build progress, the errors related to "XXX is not defined" should be resolved.

@mihaimaruseac My suggestion of a prebuilt LLVM path isn't related to CUDA. Is this improvement beyond the ability of the tensorflow team?

johnnkp avatar Apr 18 '23 16:04 johnnkp

I'd recommend opening a new issue for the LLVM stuff, though I don't think that would change. Google synchronizes the LLVM repository to the internal monorepo very frequently, and because TF does not have an interface to LLVM it needs to keep integrating LLVM at a similar frequency (see the "Integrate LLVM" commits at https://github.com/tensorflow/tensorflow/commits?author=tensorflower-gardener).

I have been suggesting that TF needs a stable interface to LLVM to reduce the number of breakages caused by these integrates, but to no avail.

mihaimaruseac avatar Apr 18 '23 16:04 mihaimaruseac

> While native GPU support on Windows will bring back the 5% perf increase and will support a few more users, please note that the total number of users of TF on Windows is very small compared to the other usecases and that there is almost no Windows expertise at Google to maintain this build. So, more than a year ago, it was decided to drop the Windows GPU support as the maintenance burden did not justify the costs, especially given that alternatives (i.e., WSL and using a Linux environment) exist.
>
> I acknowledge that this might not be desired answer (and that previous answers from team did not treat the issue accordingly), but this is the most we can reliably do.

Finally an answer that does not simply refer to some links or repeat the documentation, even if it is not really satisfying. If the Google tensorflow team is not able to support the Windows build, it is hardly possible for a private person to do so. Are there plans to reactivate native CUDA Windows support?

GatGit12 avatar May 08 '23 07:05 GatGit12

I agree with @GatGit12. Although it is clear that it's still possible to build from source on Windows with CUDA, I didn't find such a document or resource in SIG Build. WSL2 is a choice, but far from an alternative... We could accept support for only some versions, for example only v2.10, 2.13, 2.16 with Windows CUDA, but all of us hope the tensorflow team brings back native Windows support.

AsakusaRinne avatar May 08 '23 08:05 AsakusaRinne

We shouldn't expect much from the tensorflow team. Before they are willing to bring back a Windows GPU build server even for experimental builds, nobody from the team can answer what tsl::int32 actually is for a bug fix. They can only reassign these issues to someone else and hope the bot will close our issue soon.

johnnkp avatar May 10 '23 04:05 johnnkp

> I agree with @GatGit12. Although it is cleared that it's still possible to build from source on windows with CUDA, I didn't find such a document or resource on SIG build. WSL2 is a choice, but far from an alternative... I think we can accept the support of only partial versions, for example, only v2.10, 2.13, 2.16 are supported with windows cuda, However all of us hope the tensorflow team to bring back support of native windows.

Unfortunately, this doesn't really work. Having a release build entails also having presubmit and postsubmit builds; otherwise the release process -- which is already delayed more than 3x compared to what used to be the case up to a year ago -- will be severely impacted, as the release engineer then has to scramble to find people with Windows knowledge and fix all the breakages.

But having a Windows + CUDA presubmit is what caused the decision to drop native CUDA support in the first place. There is no internal build at Google that uses Windows + CUDA, and the number of developers who know the internals of MSVC + NVCC + CUDA and can also help fix issues in TF is very small. So development speed is severely impacted.

> ...nobody from the team can answer what actually tsl::int32 is for bug fixing....

https://github.com/google/tsl/blob/35680b213852096df805c16cfcc92946c2040f38/tsl/platform/default/integral_types.h#L28

From what I can see, the issue has only been moved from one TVC queue to another, but no one from the triage team has looked at it. Which is why you haven't seen even the default, keyword-based default answers.

mihaimaruseac avatar May 10 '23 13:05 mihaimaruseac

> It should still be possible.
>
> If you try to run bazel command again for several times (2 times in my case) without clean away the build progress, the errors related to XXX is not defined should be resolved.
>
> @mihaimaruseac My suggestion of prebuilt LLVM path isn't related to CUDA. Is this improvement out of the ability of tensorflow team?

I also tried building TF 2.12 and TF 2.13-rc0 on Windows with native CUDA. First, one has to patch the hardcoded is_windows() check in configure.py that prevents building with CUDA on Windows. But when the build finally runs, I also get errors (LLVM, Triton, and so on) and the build can't be completed. So it seems that even the ability to build a native Windows CUDA package yourself has been broken, which is a shame. Going back to the last native CUDA version (2.10) that compiled and worked correctly, and digging through the cpp files that broke during the development of 2.11/2.12/2.13, is a lot of effort.

I really can't understand why TF has taken this direction, since other equally large frameworks like JAX (https://jax.readthedocs.io/en/latest/developer.html#additional-notes-for-building-jaxlib-from-source-on-windows) and PyTorch (https://github.com/pytorch/pytorch#from-source) both support a native CUDA Windows build.
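
A simplified, hypothetical sketch of the kind of platform guard mentioned above, NOT the actual configure.py code: since 2.11 the configure script refuses CUDA on a Windows host, so a local source patch has to relax a check shaped roughly like this:

```python
# Hedged sketch only: mimics the behaviour described in the thread
# (CUDA requests rejected on Windows hosts). All names are hypothetical.
import platform

def is_windows():
    return platform.system() == "Windows"

def validate_cuda_request(environ):
    """Reject TF_NEED_CUDA=1 on Windows, as the real script is said to do."""
    if environ.get("TF_NEED_CUDA") == "1" and is_windows():
        raise SystemExit("CUDA on native Windows is no longer supported; "
                         "use WSL2 or patch this check to build anyway.")
    return True
```

The patch GatGit12 describes amounts to weakening or removing that conditional so the CUDA configuration path is reachable on Windows again.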

In the meantime, issues keep popping up asking exactly for windows native cuda support, which shows that the need for this version is still there:

  • https://github.com/tensorflow/tensorflow/issues/60172
  • https://github.com/tensorflow/tensorflow/issues/60156
  • https://github.com/tensorflow/tensorflow/issues/60241
  • https://github.com/tensorflow/tensorflow/issues/60237

GatGit12 avatar May 22 '23 12:05 GatGit12

Just adding another issue asking for native windows cuda support:

  • https://github.com/tensorflow/tensorflow/issues/60650

GatGit12 avatar Jun 02 '23 13:06 GatGit12

Please bring back native Windows CUDA support!!!!!!!!!

Sandy4321 avatar Jun 05 '23 18:06 Sandy4321

Another user that needs native cuda support on windows: https://github.com/tensorflow/tensorflow/issues/60830

GatGit12 avatar Jun 13 '23 07:06 GatGit12

I'm another user who would love to have this, though I haven't logged an issue for it. In my org we have already started on the workaround and are replacing TF with PyTorch. Most of our models were transformer models, so they are easier to migrate; it is still quite a lot of work.

I completely understand why it would not be possible to support this, though. On a personal front it does force me to move off TF as well: I use PCs for most of my dev work, and unfortunately WSL comes with its own issues at the moment.

linkwithkk avatar Jun 13 '23 16:06 linkwithkk

@mihaimaruseac While I understand that supporting and maintaining GPU via CUDA natively on Windows is extra work, I still can't understand why Google can't do it (if PyTorch can...).

Anyway, a concrete question: I recently set up a new desktop Windows machine with an NVIDIA RTX 4080 and hoped to use it to speed up work with TF2 / Keras (I did set it up to dual boot with Ubuntu, but I normally prefer working in Windows). What are my options:

  1. use TF2 in WSL2 (does this work nicely with PyCharm as IDE?)
  2. use TF2 in native Windows, and use a DirectML plugin that can fully utilize my NVIDIA card (is this possible in a stable way now? any more concrete info?)
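
Whichever option is chosen, a quick sanity check is to list the physical GPU devices the installation actually exposes. A hedged sketch (describe_gpus is a hypothetical helper; it falls back gracefully when TensorFlow is not installed):

```python
# Hedged sketch: report which GPUs TensorFlow can see in the current
# environment (native build, DirectML plugin, or WSL2).
def describe_gpus():
    try:
        import tensorflow as tf
    except ImportError:
        return "tensorflow is not installed in this environment"
    gpus = tf.config.list_physical_devices("GPU")
    if not gpus:
        return "no GPU visible; likely a CPU-only build"
    # Each entry is a PhysicalDevice with .name and .device_type fields.
    return ", ".join(f"{g.name} ({g.device_type})" for g in gpus)

print(describe_gpus())
```

On a native-CUDA or WSL2 setup with working drivers this should list at least one GPU device; an empty list on Windows with TF >= 2.11 is the expected symptom discussed in this thread.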

KoenTanghe avatar Jun 22 '23 23:06 KoenTanghe

Someone on Windows who is stuck at 2.10.1: https://github.com/tensorflow/tensorflow/issues/61178 Please also see: https://discuss.tensorflow.org/t/enable-windows-build-again/17806

GatGit12 avatar Jul 06 '23 08:07 GatGit12

90% of the reports on my project are about this https://github.com/melMass/comfy_mtb/issues?q=is%3Aissue+sort%3Aupdated-desc+label%3A%22install+issue%22+is%3Aclosed

For now I'm supporting only Python 3.10, so even if not ideal it kind of worked. I'm currently looking into supporting 3.11, but 2.10 doesn't have wheels for it:

> pip install tensorflow==2.10.1
ERROR: Could not find a version that satisfies the requirement tensorflow==2.10.1 (from versions: 2.12.0rc0, 2.12.0rc1, 2.12.0, 2.12.1, 2.13.0rc0, 2.13.0rc1, 2.13.0rc2, 2.13.0)
ERROR: No matching distribution found for tensorflow==2.10.1
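
This matches the wheel matrix: the 2.10.x series, the last with native Windows CUDA, only ships wheels up to Python 3.10, so on 3.11 pip can offer nothing older than 2.12. A small sketch of that constraint (last_native_cuda_tf is a hypothetical helper; the cutoff is stated to the best of my knowledge):

```python
# Hedged sketch of the version constraint hit above: on Python <= 3.10
# the last native-Windows-CUDA TensorFlow (2.10.1) is installable; on
# 3.11+ no such wheel exists and pip resolves to 2.12 or newer.
def last_native_cuda_tf(python_version):
    """python_version is a (major, minor) tuple, e.g. (3, 11).

    Returns the newest native-Windows-CUDA TF version string, or None
    if no wheel exists for that interpreter."""
    if python_version <= (3, 10):
        return "2.10.1"
    return None  # stuck with >= 2.12: WSL2 or the DirectML plugin
```

This is why the pip resolution above only lists 2.12.0rc0 and newer on Python 3.11.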

melMass avatar Aug 14 '23 12:08 melMass

Building the newest tensorflow is much harder than I thought. If someone knows how to fix the following error, please create a pull request directly:

lld-link: error: undefined symbol: _mlir_ciface_XXX_GPU_DT_XXX_DT_XXX referenced by gpu_XXX_op.lo.lib(gpu_op_XXX.obj):(public: virtual struct tensorflow::UnrankedMemRef __cdecl tensorflow::`anonymous namespace'::MlirXXXGPUDT_XXXDT_XXXOp::Invoke(class tensorflow::OpKernelContext *, class llvm::SmallVectorImpl<struct tensorflow::UnrankedMemRef> &))

lld-link: error: too many errors emitted, stopping now (use /errorlimit:0 to see all errors)

johnnkp avatar Aug 14 '23 12:08 johnnkp

@johnnkp I will need to look into it. Any pointers to be aware of, since I don't see any mention of build instructions for Windows?

melMass avatar Aug 14 '23 14:08 melMass

I found out that bazel-out\x64_windows-opt\bin\tensorflow\python\pybind_symbol_target_libs_file.txt is missing the line bazel-out/x64_windows-opt/bin/tensorflow/core/kernels/mlir_generated/base_op.lib. As a result, bazel-out\x64_windows-opt\bin\tensorflow\python\pywrap_tensorflow_filtered_def_file.def is missing the base_op symbols, which causes the error above.

I have moved to an ARM Mac and have no ML project at hand, so you will need to try it yourself.
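
The manual fix described above could be scripted as follows; this is a hedged sketch, and ensure_line is a hypothetical helper, not part of the TF build:

```python
# Hedged sketch: make sure the symbol-target list contains the
# base_op.lib entry before the .def file is generated from it.
from pathlib import Path

MISSING_ENTRY = ("bazel-out/x64_windows-opt/bin/tensorflow/core/kernels/"
                 "mlir_generated/base_op.lib")

def ensure_line(path, line):
    """Append line to the file at path if it is not already present.

    Returns True if the file was modified, False otherwise."""
    p = Path(path)
    lines = p.read_text().splitlines()
    if line in lines:
        return False
    p.write_text("\n".join(lines + [line]) + "\n")
    return True
```

Run against pybind_symbol_target_libs_file.txt with MISSING_ENTRY before the .def generation step, the idea is that the base_op symbols would then survive into pywrap_tensorflow_filtered_def_file.def.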

johnnkp avatar Aug 14 '23 14:08 johnnkp