Using lance with PyTorch dataloaders
Hello,
I am looking at Lance for a PyTorch dataloader. I am having issues with a Lance-based loader (like this one: https://lancedb.github.io/lance/examples/llm_training.html) when using it in a distributed setting. Two questions:
1 - Was the provided example ever tested in a distributed (multi-GPU) setting?
2 - Has anyone gotten it to work in a distributed (multi-GPU) setting?
I am using torchrun to launch the training job. An almost identical loader works with an in-memory CSV. It seems to hang at the point where the Lance dataset is instantiated:

```python
ds = lance.dataset(input_filename)
```
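For context, the loader is essentially the map-style Dataset from that example; a simplified sketch (the column name here is a placeholder):

```python
import lance
import torch
from torch.utils.data import Dataset

class LanceTextDataset(Dataset):
    """Map-style dataset backed by a Lance dataset on disk."""

    def __init__(self, uri):
        # This is the call that hangs when launched via torchrun
        self.ds = lance.dataset(uri)
        self.num_rows = self.ds.count_rows()

    def __len__(self):
        return self.num_rows

    def __getitem__(self, idx):
        # take() returns a pyarrow Table with the requested rows;
        # "input_ids" is a placeholder column name
        batch = self.ds.take([idx], columns=["input_ids"])
        return torch.tensor(batch["input_ids"][0].as_py())
```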
Finally, is there some other way I should be using lance for data loading?
Thanks
@jn2clark This example is only for single-GPU training; however, we are working on multi-GPU dataloader support.
We will investigate your issue of the dataloader hanging at dataset instantiation and get back to you. Thanks a lot for reporting it!
Also, we now have a dedicated repository for Deep learning recipes using Lance: https://github.com/lancedb/lance-deeplearning-recipes
Thanks @tanaymeh, that would be great.
@jn2clark Could you set the spawn start method

```python
from multiprocessing import set_start_method

set_start_method("spawn")
```

before running the PyTorch loader?
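A minimal placement sketch (the `TensorDataset` is just a stand-in for the Lance-backed dataset): the call has to run once in the main module, under the `if __name__ == "__main__":` guard, before the DataLoader creates its workers:

```python
from multiprocessing import set_start_method

import torch
from torch.utils.data import DataLoader, TensorDataset

if __name__ == "__main__":
    # Must be called once, before any worker processes are created
    set_start_method("spawn")

    dataset = TensorDataset(torch.arange(100))  # stand-in dataset
    loader = DataLoader(dataset, batch_size=8, num_workers=2)
    for (batch,) in loader:
        pass  # training step goes here
```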
To add some color here: the PyTorch DataLoader uses multiprocessing, and Python forks processes by default. The CUDA context in PyTorch is not compatible with fork, so we need to use spawn in a multi-GPU DDP (distributed data parallel) environment; see the PyTorch multiprocessing docs. Additionally, Lance doesn't fork well either.
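If setting the start method globally is awkward (for example, it has already been set elsewhere), PyTorch also lets you scope spawn to a single loader via the standard `multiprocessing_context` argument of `DataLoader`; a sketch, again with a stand-in dataset:

```python
import torch
import torch.multiprocessing as mp
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.arange(100))  # stand-in for a Lance-backed dataset

# Only this loader's workers are spawned; the global start method is untouched.
loader = DataLoader(
    dataset,
    batch_size=8,
    num_workers=2,
    multiprocessing_context=mp.get_context("spawn"),
)
```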
Thanks for the suggestions. I found an alternative, but I would like to try to get this working, at least as a benchmark to compare against. I can try next week.
@jn2clark Any updates? I'm also looking for ways to use multi-GPU.
@jn2clark @baorepo I added an example to the Lance deep learning recipes repo about training GPT-2 using the FSDP strategy. It might be useful: https://github.com/lancedb/lance-deeplearning-recipes/tree/main/examples/fsdp-llm-pretraining