Wilson Yan
If I remove the model parallelism component in the `pjit` portion of the code, i.e. only `['dp']`:

```python
# Build a 1-D mesh with a single data-parallel axis over all local devices
mesh = Mesh(np.asarray(jax.devices(), dtype=object).reshape(jax.local_device_count(),), ['dp'])
# Override the thread-local resource env so pjit picks up this mesh
jax.experimental.maps.thread_resources.env = (
    jax.experimental.maps.ResourceEnv(physical_mesh=mesh, loops=())
)
p_step...
```
I tried running similar code on some larger models and got similar results. This is also done **with only data parallelism (no model axis in the mesh)**. Code is run on...
Thanks for looking into it! Is there a good way to prevent this issue from happening code-wise? e.g., structuring the architecture differently, or enforcing certain constraints to help the partitioner...
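For context, one way to enforce such constraints is `jax.lax.with_sharding_constraint`, which pins an intermediate to a given layout so the partitioner can't choose a different one. A minimal sketch, assuming a `'dp'`-only mesh (the names and shapes here are illustrative, not from this repo):

```python
import jax
import numpy as np
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

mesh = Mesh(np.asarray(jax.devices()), ('dp',))

def loss_fn(params, batch):
    acts = batch @ params  # some intermediate activation
    # Pin the intermediate to a batch-sharded layout; the partitioner must
    # respect this instead of picking its own (possibly replicated) layout.
    acts = jax.lax.with_sharding_constraint(
        acts, NamedSharding(mesh, P('dp', None)))
    return acts.sum()

step = jax.jit(jax.grad(loss_fn))
```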
Sorry about that, I'll spend some time this coming weekend writing up more detailed documentation. I can also include the dataset generation script. In general, it's just downloading [pg19](https://huggingface.co/datasets/pg19) and...
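For reference, grabbing pg19 with the HuggingFace `datasets` library looks roughly like this (a sketch, not the actual generation script; the `"text"` field name comes from the dataset card):

```python
from datasets import load_dataset

# Stream the corpus rather than downloading all of it up front
pg19 = load_dataset("pg19", split="train", streaming=True)
for example in pg19:
    text = example["text"]  # raw book text; metadata fields also exist
    break
```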
Hi, thanks for your interest! A PyTorch version is on the roadmap, but it may take some time since both of us are rather occupied with other things at the...
If using vLLM for inference (PyTorch model, FP16), I believe we used:

- 1 80GB A100 for 32K
- 2 80GB A100s for 128K
- 4 80GB A100s for 256K...
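The scaling with context length is mostly down to the KV cache, which grows linearly in sequence length on top of the ~14GB of fp16 weights for a 7B model. A back-of-envelope sketch, assuming LLaMA-7B-like dimensions (the layer count and hidden size are my assumptions, not numbers from this repo):

```python
# Rough fp16 KV-cache sizing; 32 layers and hidden size 4096 are assumed
# LLaMA-7B-like values.
layers, hidden, bytes_per_val = 32, 4096, 2
kv_per_token = 2 * layers * hidden * bytes_per_val   # K and V: ~0.5 MB/token
for ctx in (32_000, 128_000, 256_000):
    print(f"{ctx} tokens -> ~{kv_per_token * ctx / 1e9:.0f} GB of KV cache")
# ~17 GB at 32K, ~67 GB at 128K, ~134 GB at 256K
```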
The `mesh_dim` argument depends on the number of devices you're using for inference. If you want to do tensor parallelism over 8 GPUs, then `mesh_dim` should be `1,1,8,1`. The default...
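Concretely, the four comma-separated entries describe a 4-axis device mesh. A sketch of the correspondence (the axis names are my guesses at what the four entries mean, not necessarily the repo's):

```python
import numpy as np
import jax
from jax.sharding import Mesh

# mesh_dim='1,1,8,1' -> all 8 devices land on the tensor-parallel axis.
# Requires exactly 8 visible devices; axis names are assumptions.
mesh_dims = (1, 1, 8, 1)
devices = np.asarray(jax.devices()).reshape(mesh_dims)
mesh = Mesh(devices, ('dp', 'fsdp', 'tp', 'sp'))
print(mesh.shape)  # {'dp': 1, 'fsdp': 1, 'tp': 8, 'sp': 1}
```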
Thanks for your interest. We don't have plans to train a smaller model at the moment.
In general, we didn't run into too many memory bottlenecks for our needs, so we primarily just stuck with `fp32` to be safe (proper mixed precision training with `bf16` requires...
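For what it's worth, the generic bf16 mixed-precision recipe is: keep fp32 master weights, run the forward/backward math in bf16, and apply updates in fp32. A minimal sketch (illustrative, not this repo's training code):

```python
import jax
import jax.numpy as jnp

def train_step(params, batch, lr=1e-3):
    def loss_fn(p):
        w = p["w"].astype(jnp.bfloat16)        # bf16 copy for compute
        x = batch["x"].astype(jnp.bfloat16)
        pred = x @ w                            # bf16 matmul
        err = pred.astype(jnp.float32) - batch["y"]
        return jnp.mean(err ** 2)               # loss accumulated in fp32

    grads = jax.grad(loss_fn)(params)           # cotangents cast back to fp32
    return {"w": params["w"] - lr * grads["w"]} # update fp32 master weights
```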
I don't think your GPU has enough memory, as a 7B model in `fp32` (4 bytes per parameter) takes 7B × 4 ≈ 28GB for the weights alone.