Woosuk Kwon
`download_remote_dir` doesn't work. Printed error:

```python
Traceback (most recent call last):
  File "test.py", line 3, in <module>
    list(storage.stores.values())[0].download_remote_dir('')
  File "/Users/woosuk/workspace/sky-proj/sky/sky/data/storage.py", line 178, in download_remote_dir
    iterator = self._remote_filepath_iterator()
AttributeError: 'S3Store' object has...
```
We've discussed using our local Docker backend to resolve setup difficulties prior to provisioning (see #670 and this [gist](https://gist.github.com/concretevitamin/51e82f8b210ed7b905de20eb95bc8d6d)). As a preliminary investigation, I tried to use...
BLOOM is an open-source LLM developed by BigScience. The BLOOM models rank highly in Hugging Face download counts, so it'd be great to have them in our catalog.
Currently, pip installing our package takes 5-10 minutes because our CUDA kernels are compiled on the user's machine. For better UX, we should include pre-built CUDA binaries in our PyPI...
As mentioned in https://github.com/WoosukKwon/cacheflow/pull/81#issuecomment-1546980281, the current PyTorch-based top-k and top-p implementation is memory-inefficient. This can be improved by introducing custom kernels.
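To make the issue concrete, here is a minimal pure-Python sketch of the top-k/top-p filtering logic that a custom kernel would replace. The function name and structure are hypothetical, for illustration only; the actual PyTorch implementation operates on GPU tensors, where the intermediate sorted/softmaxed copies are what cause the memory inefficiency.

```python
import math

def top_k_top_p_filter(logits, top_k=0, top_p=1.0):
    """Return the token ids that survive top-k, then top-p (nucleus) truncation.

    Hypothetical reference implementation over a plain list of logits;
    a real kernel would do this in place on device memory.
    """
    # Sort token ids by logit, descending, and apply the top-k cutoff.
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    if top_k > 0:
        order = order[:top_k]
    # Softmax over the surviving logits (subtract max for stability).
    m = max(logits[i] for i in order)
    exps = [math.exp(logits[i] - m) for i in order]
    total = sum(exps)
    # Keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cum = [], 0.0
    for tok, e in zip(order, exps):
        kept.append(tok)
        cum += e / total
        if cum >= top_p:
            break
    return kept
```

A fused kernel could compute the same cutoff without materializing the full sorted probability tensor, which is where the memory savings would come from.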
Currently we call `torch.distributed.init_process_group` even for a single GPU. This is redundant and causes errors when the `LLM` object is created multiple times.
I failed to build the system with the latest NVIDIA PyTorch Docker image. The reason is that the PyTorch installed by `pip` is built with CUDA 11.7, while the container uses CUDA...
We need tests for the models we support. The tests should ensure that the outputs of our models under greedy sampling match those of the corresponding HF models.
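The comparison logic can be sketched model-agnostically: greedy decoding is deterministic, so two implementations of the same model must produce identical token sequences. Below, `step_logits` is a hypothetical stand-in for a model forward pass (the real test would wrap our engine and the HF model); the harness shape is an assumption, not the actual test code.

```python
def greedy_decode(step_logits, prompt, max_new_tokens):
    """Greedily append the argmax token at each step.

    `step_logits(tokens)` returns a list of logits over the vocabulary;
    it stands in for a model forward pass in this sketch.
    """
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        logits = step_logits(tokens)
        tokens.append(max(range(len(logits)), key=logits.__getitem__))
    return tokens

def assert_greedy_equivalent(model_a, model_b, prompt, max_new_tokens=8):
    """Fail loudly if the two implementations diverge under greedy sampling."""
    out_a = greedy_decode(model_a, prompt, max_new_tokens)
    out_b = greedy_decode(model_b, prompt, max_new_tokens)
    assert out_a == out_b, f"greedy outputs diverge: {out_a} vs {out_b}"
```

In the real suite, `model_a` would be our engine and `model_b` the HF `transformers` model, run over a small set of fixed prompts per supported architecture.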