Tests failing locally
Describe the issue as clearly as possible:
I created a new environment as per this page and ran pytest to make sure everything was set up correctly. Two tests fail:
FAILED tests/generate/test_integration_transfomers.py::test_transformers_integration_text - RuntimeError: index 512 is out of bounds for dimension 1 with size 512
FAILED tests/generate/test_samplers.py::test_multinomial - assert False
Steps/code to reproduce the bug:
Set up the environment as per this page.
Then run:
pytest
Expected result:
No tests fail
Error message:
Test multinomial
    def test_multinomial():
        rng = torch.Generator()
        rng.manual_seed(239)
        logits = torch.tensor([[1.0, 4.0, 5.0]])
        next_token_ids = multinomial(logits, 1, rng)
        assert next_token_ids.equal(torch.tensor([[2]]))
        next_token_ids = multinomial(logits, 2, rng)
        assert next_token_ids.equal(torch.tensor([[2, 1]]))
        logits = torch.tensor([[10.0, 0.0, 9.0], [-math.inf, 4.0, 5.0]])
        next_token_ids = multinomial(logits, 1, rng)
>       assert next_token_ids.equal(torch.tensor([[0], [1]]))
E       assert False
E        +  where False = <built-in method equal of Tensor object at 0x28066b330>(tensor([[0],\n [1]]))
E        +    where <built-in method equal of Tensor object at 0x28066b330> = tensor([[0],\n [2]]).equal
E        +    and   tensor([[0],\n [1]]) = <built-in method tensor of type object at 0x14cf71780>([[0], [1]])
E        +      where <built-in method tensor of type object at 0x14cf71780> = torch.tensor
tests/generate/test_samplers.py:37: AssertionError
Test Integration Transformers
    def test_transformers_integration_text():
        rng = torch.Generator()
        rng.manual_seed(10000) # Choosen so <EOS> is generated
        model_name = "hf-internal-testing/tiny-random-GPTJForCausalLM"
        model = models.transformers(model_name, device="cpu")
>       sequence = generate.text(model)("Write a short sentence ", rng=rng)
tests/generate/test_integration_transfomers.py:72:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
outlines/generate/api.py:214: in __call__
last_state = next(states)
outlines/generate/generator.py:83: in sequence_generator
next_token_ids, kv_cache, logits, _ = token_generator(
../../../miniconda3/envs/outlines-dev/lib/python3.10/site-packages/torch/utils/_contextlib.py:115: in decorate_context
return func(*args, **kwargs)
outlines/generate/generator.py:137: in generate
logits, new_kv_cache = model(token_ids, attention_masks, kv_cache)
outlines/models/transformers.py:116: in __call__
logits, kv_cache = self.forward(input_ids, attention_mask, past_key_values)
../../../miniconda3/envs/outlines-dev/lib/python3.10/site-packages/torch/utils/_contextlib.py:115: in decorate_context
return func(*args, **kwargs)
outlines/models/transformers.py:99: in forward
output = self.model(
../../../miniconda3/envs/outlines-dev/lib/python3.10/site-packages/torch/nn/modules/module.py:1518: in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
../../../miniconda3/envs/outlines-dev/lib/python3.10/site-packages/torch/nn/modules/module.py:1527: in _call_impl
return forward_call(*args, **kwargs)
../../../miniconda3/envs/outlines-dev/lib/python3.10/site-packages/transformers/models/gptj/modeling_gptj.py:853: in forward
transformer_outputs = self.transformer(
../../../miniconda3/envs/outlines-dev/lib/python3.10/site-packages/torch/nn/modules/module.py:1518: in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
../../../miniconda3/envs/outlines-dev/lib/python3.10/site-packages/torch/nn/modules/module.py:1527: in _call_impl
return forward_call(*args, **kwargs)
../../../miniconda3/envs/outlines-dev/lib/python3.10/site-packages/transformers/models/gptj/modeling_gptj.py:679: in forward
outputs = block(
../../../miniconda3/envs/outlines-dev/lib/python3.10/site-packages/torch/nn/modules/module.py:1518: in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
../../../miniconda3/envs/outlines-dev/lib/python3.10/site-packages/torch/nn/modules/module.py:1527: in _call_impl
return forward_call(*args, **kwargs)
../../../miniconda3/envs/outlines-dev/lib/python3.10/site-packages/transformers/models/gptj/modeling_gptj.py:311: in forward
attn_outputs = self.attn(
../../../miniconda3/envs/outlines-dev/lib/python3.10/site-packages/torch/nn/modules/module.py:1518: in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
../../../miniconda3/envs/outlines-dev/lib/python3.10/site-packages/torch/nn/modules/module.py:1527: in _call_impl
return forward_call(*args, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = GPTJAttention(
(attn_dropout): Dropout(p=0.0, inplace=False)
(resid_dropout): Dropout(p=0.0, inplace=False)
(k_p...Linear(in_features=32, out_features=32, bias=False)
(out_proj): Linear(in_features=32, out_features=32, bias=False)
)
hidden_states = tensor([[[ 0.6664, -0.2256, 2.1714, -1.4327, -0.7008, 0.6363, -0.1212,
0.5521, 2.3388, 0.6451, 1.2679,... 1.5293, -0.2082, 0.0675, -0.0417, -0.3105, -0.1146, -2.0392,
-1.3698, 1.0400, -0.8760, 0.2437]]])
layer_past = (tensor([[[[ 6.7909e-02, 4.5009e-03, 3.3863e-02, ..., 2.8171e-02,
1.2616e-01, 5.0846e-02],
...31e-01],
[-2.0773e-01, -7.4851e-02, 3.4538e-01, ..., -7.9283e-02,
-6.4052e-02, -2.8594e-01]]]]))
attention_mask = tensor([[[[-0., -0., -0., -0., -0., -0., -0., -0., -0., -0., -0., -0., -0., -0., -0., -0., -0., -0., -0., -0., -0., -0...-0., -0., -0., -0., -0., -0., -0., -0., -0., -0., -0., -0., -0., -0.,
-0., -0., -0., -0., -0., -0., -0.]]]])
position_ids = tensor([[512]]), head_mask = None, use_cache = True, output_attentions = False
def forward(
self,
hidden_states: torch.FloatTensor,
layer_past: Optional[Tuple[torch.Tensor]] = None,
attention_mask: Optional[torch.FloatTensor] = None,
position_ids: Optional[torch.LongTensor] = None,
head_mask: Optional[torch.FloatTensor] = None,
use_cache: Optional[bool] = False,
output_attentions: Optional[bool] = False,
) -> Union[
Tuple[torch.Tensor, Tuple[torch.Tensor]],
Optional[Tuple[torch.Tensor, Tuple[torch.Tensor], Tuple[torch.Tensor, ...]]],
]:
query = self.q_proj(hidden_states)
key = self.k_proj(hidden_states)
value = self.v_proj(hidden_states)
query = self._split_heads(query, self.num_attention_heads, self.head_dim, True)
key = self._split_heads(key, self.num_attention_heads, self.head_dim, True)
value = self._split_heads(value, self.num_attention_heads, self.head_dim, False)
if is_torch_fx_proxy(position_ids) or torch.jit.is_tracing():
# The logic to conditionally copy to GPU could not be traced, so we do this
# every time in the torch.fx case
embed_positions = get_embed_positions(self.embed_positions, position_ids)
else:
embed_positions = self._get_embed_positions(position_ids)
repeated_position_ids = position_ids.unsqueeze(-1).repeat(1, 1, embed_positions.shape[-1])
> sincos = torch.gather(embed_positions, 1, repeated_position_ids)
E RuntimeError: index 512 is out of bounds for dimension 1 with size 512
../../../miniconda3/envs/outlines-dev/lib/python3.10/site-packages/transformers/models/gptj/modeling_gptj.py:223: RuntimeError
Outlines/Python version information:
❯ conda list outlin
# packages in environment at /Users/sid.ravinutala/miniconda3/envs/outlines-dev:
#
# Name Version Build Channel
outlines 0.1.dev413+g298a080 pypi_0 pypi
>>> import sys; print("Python", sys.version)
Python 3.10.0 | packaged by conda-forge | (default, Nov 20 2021, 02:27:15) [Clang 11.1.0 ]
Context for the issue:
Hoping to set up my local environment so I can contribute to the project :)
Can you provide the full output of conda list or pip list?
Sure!
Package Version Editable project location
------------------------- ------------------- -------------------------------------------------
accelerate 0.25.0
aiohttp 3.9.1
aiosignal 1.3.1
annotated-types 0.6.0
anyio 4.2.0
asttokens 2.4.1
async-timeout 4.0.3
attrs 23.1.0
beartype 0.15.0
Brotli 1.1.0
certifi 2023.11.17
cffi 1.16.0
cfgv 3.3.1
chardet 5.2.0
charset-normalizer 3.3.2
cloudpickle 3.0.0
colorama 0.4.6
coverage 7.4.0
datasets 2.16.0
diff_cover 8.0.2
dill 0.3.7
distlib 0.3.8
distro 1.9.0
exceptiongroup 1.2.0
filelock 3.13.1
frozenlist 1.4.1
fsspec 2023.10.0
h11 0.14.0
httpcore 1.0.2
httpx 0.26.0
huggingface-hub 0.20.0
icontract 2.6.6
identify 2.5.33
idna 3.6
importlib-metadata 7.0.1
importlib-resources 6.1.1
iniconfig 2.0.0
interegular 0.3.2
Jinja2 3.1.2
joblib 1.3.2
jsonschema 4.20.0
jsonschema-specifications 2023.11.2
lark 1.1.8
llvmlite 0.41.1
MarkupSafe 2.1.3
mpmath 1.3.0
msgpack 1.0.4
multidict 6.0.4
multiprocess 0.70.15
nest-asyncio 1.5.8
networkx 3.2.1
nodeenv 1.8.0
numba 0.58.1
numpy 1.26.2
openai 1.6.1
outlines 0.1.dev413+g298a080 /Users/sid.ravinutala/Documents/Projects/outlines
packaging 23.2
pandas 2.1.4
perscache 0.6.1
pip 23.3.2
pkgutil_resolve_name 1.3.10
platformdirs 4.1.0
pluggy 1.3.0
pre-commit 3.6.0
psutil 5.9.7
pyarrow 14.0.2
pyarrow-hotfix 0.6
pycparser 2.21
pydantic 2.5.3
pydantic_core 2.14.6
Pygments 2.17.2
PySocks 1.7.1
pytest 7.4.3
pytest-cov 4.1.0
python-dateutil 2.8.2
pytz 2023.3.post1
PyYAML 6.0.1
referencing 0.32.0
regex 2023.12.25
requests 2.31.0
responses 0.24.1
rpds-py 0.15.2
safetensors 0.3.3
SciPy 1.11.4
setuptools 68.2.2
six 1.16.0
sniffio 1.3.0
sympy 1.12
tiktoken 0.5.2
tokenizers 0.15.0
tomli 2.0.1
torch 2.1.2
tqdm 4.66.1
transformers 4.36.2
typing_extensions 4.9.0
tzdata 2023.3
ukkonen 1.0.1
urllib3 2.1.0
virtualenv 20.25.0
wheel 0.42.0
xxhash 3.4.1
yarl 1.9.3
zipp 3.17.0
We managed to reproduce the error internally and will hopefully soon come up with a fix.
Immediately before the test fails, in the call
output = self.model(
input_ids,
attention_mask=attention_mask,
return_dict=True,
output_attentions=False,
output_hidden_states=False,
past_key_values=past_key_values,
)
past_key_values is a tuple of 5 elements, each a tensor of shape torch.Size([1, 4, 512, 8]).
The third dimension (512 here) grows by 1 on each call.
The model is limited to 512 tokens: https://huggingface.co/hf-internal-testing/tiny-random-GPTJForCausalLM/blob/main/config.json#L23
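To make the off-by-one concrete, here is a minimal, library-free sketch. It assumes that the position id of the next token equals the number of tokens already held in the cache, which is consistent with the traceback above (position_ids = tensor([[512]]) with a cache of length 512). The names N_POSITIONS, position_table, and lookup_position are illustrative only and are not part of transformers or outlines.

```python
# Illustrative stand-in for a fixed-size positional embedding table.
N_POSITIONS = 512  # same value as n_positions in the tiny GPT-J config

position_table = [f"pos-{i}" for i in range(N_POSITIONS)]


def lookup_position(cache_length: int) -> str:
    """Return the table entry for the next token.

    Assumption: the next token's position id equals the number of
    tokens already stored in the KV cache.
    """
    position_id = cache_length
    if position_id >= len(position_table):
        raise IndexError(
            f"index {position_id} is out of bounds for dimension 1 "
            f"with size {len(position_table)}"
        )
    return position_table[position_id]


# Last valid step: 511 cached tokens, so the new token sits at index 511.
last_valid = lookup_position(511)

# One step later the cache holds 512 tokens and the lookup fails.
try:
    lookup_position(512)
    overflow_message = None
except IndexError as exc:
    overflow_message = str(exc)
```

Valid indices run from 0 to 511, so a cache that has already grown to 512 entries leaves no position for the next token.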
It's probably failing locally but passing in CI because PyTorch does not guarantee reproducible results across platforms: https://pytorch.org/docs/stable/notes/randomness.html#reproducibility
Separately, the token count exceeding n_positions fails ungracefully with a torch RuntimeError, which is also a problem that should be addressed.
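One possible shape for a more graceful failure is an explicit length check before the forward pass. The sketch below is only an illustration under assumptions: ContextLengthError and check_context_length are hypothetical names, not existing outlines or transformers APIs, and where such a check would live in the codebase is left open.

```python
class ContextLengthError(ValueError):
    """Hypothetical error type; not part of outlines or transformers."""


def check_context_length(num_cached: int, num_new: int, n_positions: int) -> None:
    """Raise a descriptive error before the model is asked to go past its limit."""
    total = num_cached + num_new
    if total > n_positions:
        raise ContextLengthError(
            f"sequence would reach {total} tokens, but the model supports "
            f"at most {n_positions} (n_positions)"
        )


# 511 cached tokens plus 1 new token exactly fills the context: no error.
check_context_length(num_cached=511, num_new=1, n_positions=512)

# 512 cached tokens plus 1 new token does not fit.
try:
    check_context_length(num_cached=512, num_new=1, n_positions=512)
    guard_message = None
except ContextLengthError as exc:
    guard_message = str(exc)
```

Failing here names the limit that was hit, whereas the index error raised deep inside torch.gather does not.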
This is caused by RNGs being inconsistent across machines combined with the use of a randomly initialized model (https://huggingface.co/hf-internal-testing/tiny-random-GPTJForCausalLM). The EOS token is never generated, so generation continues until it exceeds the model's 512-token limit.
I ran into this issue again when working on https://github.com/outlines-dev/outlines/pull/966 and I'll include a fix in that PR. In tests, we must either use the greedy sampler or specify max_tokens, because we cannot rely on RNGs being consistent across machines.
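The two safeguards can be illustrated with a small, library-free sketch. It is not the outlines implementation: greedy, generate, never_eos, and EOS_TOKEN_ID are hypothetical stand-ins, and the toy "model" simply returns fixed logits. The point is that greedy selection involves no RNG, and a max_tokens bound ends the loop even if EOS never appears.

```python
import math
from typing import Callable, List, Sequence

EOS_TOKEN_ID = 0  # hypothetical end-of-sequence id for this toy example


def greedy(logits: Sequence[float]) -> int:
    """Pick the highest-scoring token; no random number generator involved."""
    return max(range(len(logits)), key=lambda i: logits[i])


def generate(
    next_logits: Callable[[List[int]], Sequence[float]],
    prompt: List[int],
    max_tokens: int,
) -> List[int]:
    """Generate at most max_tokens new tokens, stopping early on EOS."""
    tokens = list(prompt)
    for _ in range(max_tokens):
        token = greedy(next_logits(tokens))
        tokens.append(token)
        if token == EOS_TOKEN_ID:
            break
    return tokens


def never_eos(tokens: List[int]) -> Sequence[float]:
    """Toy model that always prefers token 2, so EOS is never produced."""
    return [-math.inf, 1.0, 5.0]


# Even though EOS never appears, generation stops after 5 new tokens.
output = generate(never_eos, prompt=[1, 1], max_tokens=5)
```

With a bound like this, a test on a randomly initialized model cannot run past the context limit regardless of which tokens are chosen on a given machine.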