Tian Lan
Hi, thank you for your interest and your question. WarpDrive provides a multi-agent RL development framework that allows both environment rollout and training to happen in place on the GPU. We provide...
The original motivation of WarpDrive is to fully utilize the GPU resources while reducing communication with the CPU host as much as possible. I think WarpDrive is doing a...
We have a CUDA reset function that pushes the data from the CPU host to the GPU at the very beginning and later uses this original host starting point to reset directly...
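For illustration, here is a minimal sketch of that reset pattern in plain PyTorch. This is not WarpDrive's actual API, and all names are hypothetical: the point is only that the initial state is pushed from the host to the GPU once, and later resets copy from that GPU-resident starting point instead of going back to the CPU.

```python
# Illustrative sketch (not WarpDrive's API): keep an initial copy of the
# environment state on the GPU so resets never go back through the CPU host.
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# One-time push from the CPU host to the device at startup.
initial_obs = torch.zeros(1024, 64)          # hypothetical (num_envs, obs_dim)
obs_gpu = initial_obs.to(device)             # working copy used by rollouts
obs_start_gpu = obs_gpu.clone()              # device-resident "host start point"

def rollout_step(obs):
    # Environment step and training both operate on device tensors in place.
    obs.add_(torch.randn_like(obs) * 0.01)   # stand-in for the env dynamics
    return obs

def reset_envs(done_mask):
    # Reset only the finished environments by copying from the device-side
    # initial state; no CPU<->GPU transfer is needed.
    obs_gpu[done_mask] = obs_start_gpu[done_mask]

for _ in range(100):
    rollout_step(obs_gpu)
    done = torch.rand(obs_gpu.shape[0], device=device) < 0.05
    reset_envs(done)
```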
@MBasalla thank you for the suggestion. We used deterministic tree models for the multivariate forecaster, and we are considering adding some probabilistic tree models and others for the multivariate...
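As a rough illustration of the deterministic-versus-probabilistic distinction (this is not the repo's forecaster; the features, targets, and model choices below are made up), a quantile-regression variant of a gradient-boosted tree yields prediction intervals instead of a single point forecast:

```python
# Illustrative only: a deterministic tree model gives one point forecast,
# while quantile regression with gradient-boosted trees approximates a
# predictive distribution via lower/median/upper quantiles.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))                    # hypothetical features
y = X[:, 0] + 0.5 * rng.normal(size=500)         # hypothetical target

# Deterministic: one model, one point prediction per input.
point_model = GradientBoostingRegressor().fit(X, y)
point_pred = point_model.predict(X[:5])

# Probabilistic (via quantiles): one model per quantile gives an interval.
quantile_models = {
    q: GradientBoostingRegressor(loss="quantile", alpha=q).fit(X, y)
    for q in (0.1, 0.5, 0.9)
}
interval = {q: m.predict(X[:5]) for q, m in quantile_models.items()}
print(point_pred, interval[0.1], interval[0.9])
```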
@UsaidPro Thank you for fixing this. I have approved your PR. Regarding this CLA, could you please refresh and check if it works? I am not sure why it is...
@anjanashankar9 Can you complete the Salesforce CLA so we can merge your PR?
Cool, I am glad that we have similar concerns :) BTW, I tried Cython; it has a way to release the GIL, so it can provide a real multi-threading environment,...
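To make the GIL point concrete, here is a small self-contained Python sketch (the NumPy matmul stands in for a Cython `nogil` section; the numbers are illustrative only): threads running pure-Python bytecode are serialized by the GIL, while threads whose heavy work happens in native code that releases the GIL can actually run in parallel.

```python
# Threads doing pure-Python work are serialized by the GIL; threads whose work
# happens in GIL-releasing native code (BLAS here) can use multiple cores.
import threading
import time
import numpy as np

def python_work():
    s = 0
    for i in range(5_000_000):       # holds the GIL the whole time
        s += i

def native_work():
    a = np.random.rand(2000, 2000)
    a @ a                            # BLAS releases the GIL while it runs

def timed(fn, n_threads=4):
    threads = [threading.Thread(target=fn) for _ in range(n_threads)]
    t0 = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - t0

print("GIL-bound threads:   ", timed(python_work))
print("GIL-released threads:", timed(native_work))
```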
I saw the same error even when running the demo script for the pythia28 model, using 8 A100 40GB GPUs on Google Cloud: `python -u train.py model=pythia28 datasets=[hh] loss=sft exp_name=anthropic_dpo_pythia28 gradient_accumulation_steps=2 batch_size=64 eval_batch_size=32`...
@abaheti95 thank you for your work. I think Hugging Face has published a DPO trainer with QLoRA: https://github.com/huggingface/trl/blob/main/trl/trainer/dpo_trainer.py. By any chance, could you please comment on any obvious differences? If you...
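For reference, a minimal sketch of what "DPO trainer with QLoRA" looks like with TRL + PEFT. Argument names follow the TRL docs from around the 0.7 release and may differ in newer versions; the base model is a placeholder and the dataset name is hypothetical.

```python
# Sketch only: 4-bit (QLoRA) base model wrapped with LoRA adapters and trained
# with trl's DPOTrainer. Exact argument names vary across trl versions.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, TrainingArguments)
from trl import DPOTrainer

model_name = "EleutherAI/pythia-2.8b"          # placeholder base model
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(model_name, quantization_config=bnb_config)
tokenizer = AutoTokenizer.from_pretrained(model_name)

peft_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")

# Hypothetical dataset; DPOTrainer expects "prompt", "chosen", "rejected" columns.
train_dataset = load_dataset("my_preference_dataset", split="train")

trainer = DPOTrainer(
    model,
    ref_model=None,                  # with PEFT, the frozen base acts as the reference
    beta=0.1,
    args=TrainingArguments(
        output_dir="dpo_qlora_out",
        per_device_train_batch_size=1,
        remove_unused_columns=False,
    ),
    train_dataset=train_dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```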
Awesome work, I will check this out to see how it works soon! BTW, besides the obvious speed issues, I also noticed there are discussions on how well DPO converges,...