Blake comments

Results: 121 comments by Blake

How much target data?

I went ahead and used 3 minutes of training video. The results are decent, though I'm not sure whether more data would make a significant difference.

Solves #260 by changing the Dockerfile to use a specified version of jaxlib

My understanding is that you need JAX built with CUDA for GPU support. A plain pip install does not seem to support GPUs, so it takes MUCH longer to generate any image. Whatever...
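A quick way to see which backend JAX actually picked up is sketched below. This is a minimal check and not part of the repo: `describe_jax_backend` is a hypothetical helper name, and it relies only on the public `jax.default_backend()` and `jax.devices()` calls.

```python
def describe_jax_backend() -> str:
    """Report which backend JAX is using, or note that JAX is missing."""
    try:
        import jax  # imported lazily so the check also runs where JAX is absent
    except ImportError:
        return "jax is not installed"
    backend = jax.default_backend()  # e.g. "cpu" or "gpu"
    devices = ", ".join(str(d) for d in jax.devices())
    return f"backend={backend}; devices={devices}"


if __name__ == "__main__":
    print(describe_jax_backend())
```

If this reports a CPU backend on a machine with an NVIDIA GPU, the installed jaxlib is most likely a CPU-only build.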

Solves #260 by changing the Dockerfile to use a specified version of jaxlib

It's worth noting that some forms of virtualization will have issues with JAX because they do not support certain CPU instructions. I ran into that myself.

Incorrect block size?

The way the code currently works is that it creates blocks of text from the samples in the dataset that are N tokens long. It will create overflows in this...
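One reading of the block creation described above can be sketched as follows. This is an illustration and not the repo's actual code: `group_into_blocks` is a hypothetical helper, the token IDs are made up, and dropping the final partial block is an assumption.

```python
from typing import List


def group_into_blocks(samples: List[List[int]], block_size: int) -> List[List[int]]:
    """Concatenate tokenized samples and cut the stream into fixed-size blocks."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    # Flatten all samples into one token stream, so samples wrap across blocks.
    stream = [tok for sample in samples for tok in sample]
    # Keep only as many tokens as fill whole blocks (assumption: drop the tail).
    usable = (len(stream) // block_size) * block_size
    return [stream[i:i + block_size] for i in range(0, usable, block_size)]


samples = [[1, 2, 3, 4, 5], [6, 7], [8, 9, 10, 11]]
blocks = group_into_blocks(samples, block_size=4)
print(blocks)  # [[1, 2, 3, 4], [5, 6, 7, 8]]
```

Note that the second block mixes tokens from three different samples, which is the kind of overflow that wrapping produces.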

Incorrect block size?

I have experimented with padding rather than wrapping, and I may update the code in the near future to offer that as an option.
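A padding variant could look like the sketch below. It is only an illustration of the idea: `pad_to_blocks` and `PAD_ID` are hypothetical names, and splitting long samples into several padded blocks is an assumption rather than the planned implementation.

```python
from typing import List

PAD_ID = 0  # hypothetical padding token ID


def pad_to_blocks(samples: List[List[int]], block_size: int, pad_id: int = PAD_ID) -> List[List[int]]:
    """Cut each sample into blocks on its own and pad the last block of each sample."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    blocks = []
    for sample in samples:
        for i in range(0, len(sample), block_size):
            chunk = sample[i:i + block_size]
            # Pad short chunks so every block has exactly block_size tokens.
            blocks.append(chunk + [pad_id] * (block_size - len(chunk)))
    return blocks


samples = [[1, 2, 3, 4, 5], [6, 7], [8, 9, 10, 11]]
print(pad_to_blocks(samples, block_size=4))
# [[1, 2, 3, 4], [5, 0, 0, 0], [6, 7, 0, 0], [8, 9, 10, 11]]
```

With padding, no block contains tokens from more than one sample, at the cost of spending some positions on pad tokens.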

A question more than an issue...

Thanks for watching my videos! My PC has changed over time. My CPU in all my videos has either been a Ryzen 3600 or Ryzen 5950x. I have both a...

[BUG] High VRAM Usage For Inference, Torch Dtype Doesn't Matter

By changing the pipeline to the following I now get VRAM usage of roughly 12GB per GPU. However, shouldn't the model be split over both GPUs and thus roughly 6GB...

[BUG] High VRAM Usage For Inference, Torch Dtype Doesn't Matter

> Hi @mallorbc , > > The problem is that the model selected from HF is Fp32 and it will load the checkpoint before coming to the model-partitioning on the...
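As a rough back-of-the-envelope check on the FP32 point, weight memory is approximately the parameter count times the bytes per parameter. The sketch below assumes a hypothetical 3 billion parameter model (the model size is not stated here), counts weights only, and assumes an even split across GPUs; activations and framework overhead are ignored.

```python
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}


def weight_memory_gb(n_params: int, dtype: str, n_gpus: int = 1) -> float:
    """Weights-only memory per GPU in GB (1e9 bytes), assuming an even split."""
    return n_params * BYTES_PER_PARAM[dtype] / n_gpus / 1e9


N_PARAMS = 3_000_000_000  # hypothetical model size, for illustration only

for dtype in ("fp32", "fp16"):
    for n_gpus in (1, 2):
        gb = weight_memory_gb(N_PARAMS, dtype, n_gpus)
        print(f"{dtype} on {n_gpus} GPU(s): {gb:.1f} GB per GPU")
```

If a full FP32 checkpoint is loaded on each GPU before partitioning, peak usage per GPU follows the single-GPU FP32 figure rather than the split one.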

AMD GPU not supported?

Instead of installing the way the repo says, try installing the ROCm version from the PyTorch website using pip. It may still not work, but there is a chance it does. https://pytorch.org/get-started/locally/
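After installing, a small check like the one below can confirm whether the PyTorch build is a ROCm one and whether it sees a GPU. This is a sketch: `describe_torch_gpu` is a hypothetical helper name, and it relies on `torch.version.hip` being set only on ROCm builds.

```python
def describe_torch_gpu() -> str:
    """Report whether PyTorch is a ROCm build and whether a GPU is visible."""
    try:
        import torch  # imported lazily so the check also runs where torch is absent
    except ImportError:
        return "torch is not installed"
    hip = getattr(torch.version, "hip", None)  # version string on ROCm builds, else None
    build = f"ROCm/HIP {hip}" if hip else "non-ROCm build"
    # On ROCm builds, AMD GPUs are exposed through the torch.cuda API.
    return f"{build}; gpu_available={torch.cuda.is_available()}"


if __name__ == "__main__":
    print(describe_torch_gpu())
```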

[BUG][0.6.7] garbage output for multi-gpu with tutorial

I was also getting junk output following the tutorial. I can confirm that after building DeepSpeed from master, the issue seems resolved for GPT Neo 2.7B. I am however...