Daya Khudia

38 comments by Daya Khudia

> Is self-attention parallelizable with some code modification?

It is, but it requires code modifications. See https://pytorch.org/docs/stable/_modules/torch/distributed/tensor/parallel/multihead_attention_tp.html#TensorParallelMultiheadAttention
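For a rough idea of what those modifications look like, here is a minimal Megatron-style sketch that shards attention heads across ranks (class and variable names are mine, not the PyTorch API; it assumes `torch.distributed` is already initialized and the world size divides the head count):

```python
import torch
import torch.nn as nn
import torch.distributed as dist

class HeadParallelSelfAttention(nn.Module):
    """Each rank owns num_heads // world_size heads; outputs are all-reduced."""

    def __init__(self, d_model: int, num_heads: int):
        super().__init__()
        world_size = dist.get_world_size()  # assumes init_process_group() was called
        assert num_heads % world_size == 0, "world size must divide num_heads"
        self.local_heads = num_heads // world_size
        self.head_dim = d_model // num_heads
        local_dim = self.local_heads * self.head_dim
        # Column-parallel QKV projection: each rank holds a slice of the weights.
        self.qkv = nn.Linear(d_model, 3 * local_dim, bias=False)
        # Row-parallel output projection: each rank produces a partial sum.
        self.out = nn.Linear(local_dim, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: [batch, seq, d_model]
        b, s, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        shape = (b, s, self.local_heads, self.head_dim)
        q, k, v = (t.view(shape).transpose(1, 2) for t in (q, k, v))
        o = torch.nn.functional.scaled_dot_product_attention(q, k, v)
        o = self.out(o.transpose(1, 2).reshape(b, s, -1))
        dist.all_reduce(o)  # sum the partial outputs from all ranks
        return o
```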

As you pointed out, StoryWriter has qkv_clip and currently doesn't work with FT. We have two options: 1) adding clipping support in FT, or 2) creating a fine-tuned version of StoryWriter... A rough sketch of the clipping is below.
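For reference, a minimal sketch of what qkv_clip does in MPT-style attention (the clip value and helper name are illustrative; MPT exposes the real knob through its attention config):

```python
import torch

def clipped_qkv(x: torch.Tensor, wqkv: torch.Tensor, clip_qkv: float = 8.0):
    qkv = x @ wqkv                                # fused QKV projection
    qkv = qkv.clamp(-clip_qkv, clip_qkv)          # elementwise clip -- the op FT lacks
    return qkv.chunk(3, dim=-1)                   # split into Q, K, V
```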

top_k is 1, and that does greedy search. Maybe use top_k = 30 and play around with temperature.
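Something like this with Hugging Face `generate()`, assuming an HF checkpoint (the model name and prompt are placeholders):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mosaicml/mpt-7b")
model = AutoModelForCausalLM.from_pretrained("mosaicml/mpt-7b", trust_remote_code=True)

inputs = tokenizer("Once upon a time", return_tensors="pt")
output = model.generate(
    **inputs,
    do_sample=True,     # required; with greedy decoding, top_k is ignored
    top_k=30,           # sample among the 30 most likely tokens (top_k=1 == greedy)
    temperature=0.8,    # lower -> more deterministic, higher -> more diverse
    max_new_tokens=256,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```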

@savemuri: Any reason why you have use_gpt_decoder_ops set to True? We haven't tried running through this path. Also, could you try output_len=256 with the converted model?

@mantrakp2004: Inference doesn't have public endpoints. The only public way to interact with these models is through the HF interface. For example, https://huggingface.co/spaces/mosaicml/mpt-30b-chat. For private production-scale usage, please get...

@nik-mosaic Could you take a look at this please?