mylesgoose issues

Results: 7 issues of mylesgoose

train.py

I had to change this code in train.py to get it to work on my system: `# LoRA and DORA modules sys.path.append("./scripts") from scripts.lora import LORA from scripts.dora...

train time decrease from 13 hours to 9

Hello. I built a conda environment with these settings: `name: xtuner channels: - nvidia/label/cuda-12.4.0 - pytorch - conda-forge dependencies: - python=3.11 # Specify Python version here - pytorch - torchvision...

Checkpoint feature via steps instead of epoch

### 🚀 The feature, motivation and pitch At the moment the script only saves checkpoints once per epoch. For large datasets this is quite bad. ### Alternatives I created an alternative...
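
The request above can be illustrated with a minimal sketch of step-based checkpointing. This is not the alternative from the issue, which is truncated here; the names `should_save`, `train`, `save_every`, and `save_fn` are hypothetical, and the training work itself is left as a placeholder.

```python
from typing import Callable


def should_save(step: int, save_every: int) -> bool:
    """Return True when a checkpoint is due at this global step."""
    return save_every > 0 and step > 0 and step % save_every == 0


def train(num_epochs: int, steps_per_epoch: int, save_every: int,
          save_fn: Callable[[int], None]) -> int:
    """Run a dummy loop and call save_fn every `save_every` global steps.

    save_fn is a hypothetical callback; in a real script it might wrap
    something like torch.save on the model and optimizer state.
    """
    global_step = 0
    for _epoch in range(num_epochs):
        for _batch in range(steps_per_epoch):
            # Placeholder: forward pass, backward pass, optimizer step.
            global_step += 1
            # The step counter is not reset per epoch, so checkpoints
            # are spaced evenly regardless of epoch length.
            if should_save(global_step, save_every):
                save_fn(global_step)
    return global_step
```

Counting global steps rather than epochs means a long epoch over a large dataset still produces intermediate checkpoints, for example `train(2, 5, 4, print)` calls the callback at steps 4 and 8.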

Llama 3.2

# What does this PR do? Add new checkpoint converter for the vision models Fixes # (issue) ## Feature/Issue validation/testing Please describe the tests that you ran to verify your...

cla signed

3 pytorch allocator cache flushes since last step

consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time Cache cleared {'loss': 1.0039, 'grad_norm': 4.742300987243652, 'learning_rate': 2.7891156462585034e-06, 'epoch': 0.01}...

Increase in training speed, pip list

Here is my pip list of installed packages, which took the training time from 24 hours on 6 RTX 4090s to 13 hours. I used the newer versions of all...

Torch 2.6 AMD CPU NVIDIA GPU

### 🐛 Describe the bug When trying to compile the latest torch version on my AMD system, the compilation failed. In order for it to compile I had to modify...

triage review