
What is the `max_seq_len` in Mistral?

Open · ParadoxZW opened this issue 1 year ago · 1 comment

What is the `max_seq_len` (or `max_position_embeddings`) of Mistral-7B-v0.1 during training?

The official code says it is `128_000` (https://github.com/mistralai/mistral-src/blob/147c4e68279b90eb61b19bdea44e16f5539d5a5d/mistral/model.py#L201C69-L201C69).

The config file on Hugging Face says it is `32768` (https://huggingface.co/mistralai/Mistral-7B-v0.1/blob/main/config.json).

And the official blog mentions 16k.
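
For reference, a minimal sketch (assuming the `transformers` library is installed; not part of the original sources) that prints what the Hugging Face config actually reports:

```python
# Minimal check of the values in the Hugging Face config.json
# (assumes `transformers` is installed and the Hub is reachable).
from transformers import AutoConfig

config = AutoConfig.from_pretrained("mistralai/Mistral-7B-v0.1")
print(config.max_position_embeddings)  # 32768 per the config file
print(config.sliding_window)           # 4096, the sliding window size
```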

— ParadoxZW, Oct 23 '23 12:10

> What is the `max_seq_len` (or `max_position_embeddings`) of Mistral-7B-v0.1 during training?
>
> The official code says it is `128_000` (https://github.com/mistralai/mistral-src/blob/147c4e68279b90eb61b19bdea44e16f5539d5a5d/mistral/model.py#L201C69-L201C69).
>
> The config file on Hugging Face says it is `32768` (https://huggingface.co/mistralai/Mistral-7B-v0.1/blob/main/config.json).
>
> And the official blog mentions 16k.

And the paper claims an attention span of 131K tokens (Section 2 on "Architectural details" → "Sliding Window Attention").
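
The 131K figure is consistent with sliding-window arithmetic: each layer lets a token attend to the previous `window_size` positions, so after `n_layers` layers information can propagate roughly `window_size * n_layers` tokens back. A minimal sketch of that back-of-the-envelope check (values taken from the Hugging Face config; the layer-stacking interpretation follows Section 2 of the paper):

```python
# Theoretical attention span under sliding window attention:
# information moves at most one window per layer, so the reachable
# context is roughly window_size * n_layers.
window_size = 4096  # "sliding_window" in the HF config.json
n_layers = 32       # "num_hidden_layers" in the HF config.json

print(window_size * n_layers)  # 131072 -> the ~131K tokens the paper cites
```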

— keyboardAnt, Oct 27 '23 21:10