Support optional dependencies for CUDA related packages (torch, etc.)

Open matanitah opened this issue 1 year ago • 8 comments

Problem Description

"pip install sdv" has a lot of dependencies related to CUDA and Torch that should be optional (like "pip install sdv[no-nvidia]" for example). This would make it a lot easier for organizations to get the Community version running in their own VPCs/images.

Expected behavior

"pip install sdv[no-nvidia]" should download a version of SDV which is CPU only and which does not use CUDA or require all of the NVIDIA libraries.

Additional context

This will make it easier for developers inside companies to demonstrate the value of SDV to their managers.

matanitah avatar Aug 06 '24 15:08 matanitah

Hi there @matanitah, unfortunately pip doesn't have a way to explicitly exclude specific dependencies (only a way to explicitly include them). This means we'd have to slim down the dependencies in base SDV (the most common starting point and install path) quite a bit and add CUDA-based packages as a set of optional dependencies. This approach has its own tradeoffs as well!

To help us better understand, do you mind sharing more context on the barriers you're encountering when trying to "get the Community version running in [your] own VPCs/images" ?

srinify avatar Aug 06 '24 19:08 srinify

Sure! The NVIDIA libraries increase the size of the image we have to run quite dramatically, and they're really only relevant if we decide to run our SDV code on GPUs; in our case we have opted to go with CPU anyway. I think having an option like "pip install sdv[gpu]" would be beneficial because it would keep the image needed to run SDV Community code light for those who only want to use CPU, which makes EC2 load times faster and makes it easier for us to keep infrastructure costs low.

matanitah avatar Aug 06 '24 20:08 matanitah

Hi @matanitah It’s great to see your interest in the SDV ecosystem. This comment is a reminder to consult your legal team before adopting the SDV into your project, as SDV has a source-available license.

For more information, you can read through our license FAQs (not legal advice). For any other questions, you can Contact Us. You can also inquire about a commercial license to allow additional use.

sdv-team avatar Aug 06 '24 21:08 sdv-team

That makes sense @matanitah thanks for sharing more context! I'll leave this issue open as a feature request for the team :)

This is similar to this other feature request as well: https://github.com/sdv-dev/SDV/issues/1621

srinify avatar Aug 06 '24 21:08 srinify

Seconding this issue - makes it very cumbersome to deploy SDV, even if only a reduced feature set is used.

spreeni avatar Feb 11 '25 11:02 spreeni

Thanks @spreeni. We are still evaluating within the team. Python unfortunately does not allow you to remove dependencies, or else we would love to do something like pip install sdv[no-cuda]. We are still evaluating the pros/cons of changing the default dependencies listed for pip install sdv, which have been established for many years now.

In the meantime, would you be able to describe more about how it's affecting your deployment -- e.g. is it increasing installation time, using up more memory, or is it something else? Any details you can share about where you're deploying, how often you are installing, etc. will be very helpful for us to make the case. Thank you.

npatki avatar Feb 26 '25 01:02 npatki

Hey @npatki, I have used SDV within a Docker container that I push to a GitLab container registry, from where it is then pulled by other deployment services. For me the issues are the following:

  • package size - it takes very long to push the image to the registry (although less so on subsequent pushes due to delta updates). I am not sure how lazily the imports are handled, but the extra packages could also increase memory demand at runtime in any deployment
  • installation time - it takes quite a while to install updates whenever the library changes. This also makes SDV less practical for quick demos and tests, where you may not need the accuracy of sophisticated deep learning models
  • dependency bloat - it is just not nice to carry a lot of dependencies with you that you don't use (e.g. GPU-enabled torch, plotly, boto3). Here, an opt-in process would be nice.

In the following minimal example, the library adds 6GB of dependencies to my Docker image.

FROM --platform=linux/x86_64 python:3.12-slim-bookworm

RUN python -m venv /venv
ENV PATH="/venv/bin:$PATH"

RUN pip install --no-cache-dir --default-timeout=300 sdv

spreeni avatar Mar 07 '25 10:03 spreeni

Hello @matanitah and @spreeni, are you still working with SDV?

The good news is that starting from today's release (SDV v1.23.0), you should be able to import SDV even if you don't have torch installed. You should also be able to use any SDV synthesizer that does not require torch (e.g. GaussianCopulaSynthesizer).

# GaussianCopulaSynthesizer works
>>> from sdv.single_table import GaussianCopulaSynthesizer
>>> synthesizer = GaussianCopulaSynthesizer(metadata)
>>> synthesizer.fit(data)
>>> synthetic_data = synthesizer.sample(num_rows=100)

# Other synthesizers that require torch will not work
>>> from sdv.single_table import CTGANSynthesizer
>>> synthesizer = CTGANSynthesizer(metadata)
ModuleNotFoundError: No module named 'torch'. Please install torch in order to use the 'CTGANSynthesizer'.

Note that SDV still lists torch as a dependency because we do believe that CTGAN, TVAE, etc. are important components of the SDV package. But when setting up your environment, you can bypass this if you need to by:

  1. Installing SDV, and then uninstalling torch, OR
  2. Installing packages only from a pre-defined requirements.txt. I tested this out on this requirements.txt.
pip install sdv --no-deps --requirement requirements.txt

In either option, you will end up with SDV installed without torch, which should hopefully unblock you from your project.
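
For code that has to run both with and without torch, one possible pattern is to check for torch before choosing a synthesizer. The following is only a sketch under assumptions: `is_installed`, `choose_synthesizer_name`, and `build_synthesizer` are hypothetical helpers, not SDV APIs, and the set of torch-based synthesizers is limited to the two named in this thread (CTGAN and TVAE).

```python
from importlib.util import find_spec


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located, without importing it."""
    return find_spec(module_name) is not None


def choose_synthesizer_name(prefer: str = "CTGANSynthesizer") -> str:
    """Hypothetical helper: fall back to a torch-free synthesizer when torch is absent."""
    # Assumption: only the synthesizers named in this thread are listed here.
    torch_based = {"CTGANSynthesizer", "TVAESynthesizer"}
    if prefer in torch_based and not is_installed("torch"):
        return "GaussianCopulaSynthesizer"
    return prefer


def build_synthesizer(metadata, prefer: str = "CTGANSynthesizer"):
    """Instantiate the chosen synthesizer; sdv is imported only when this is called."""
    import sdv.single_table as single_table

    synthesizer_class = getattr(single_table, choose_synthesizer_name(prefer))
    return synthesizer_class(metadata)
```

With this in place, `build_synthesizer(metadata)` would return a CTGANSynthesizer where torch is available and a GaussianCopulaSynthesizer otherwise, so the same script can be used in both the slim image and a full install.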

Let us know if this doesn't work or if you have any other feedback around this. Thanks.

npatki avatar Jun 18 '25 00:06 npatki

I'm closing off this issue since we are now supporting the ability to use SDV features without having CUDA or torch installed -- #2551.

We have made the decision to still list torch as a dependency of SDV, since CTGAN, TVAE, etc. are important, much-used features in the SDV package. However, you should still be able to set up your environment following the code above.

Please feel free to reply if there is more to discuss -- as I can always re-open the issue for further investigation. Or alternatively, file a new issue for new requests/questions. Thanks all!

npatki avatar Jul 23 '25 18:07 npatki