Universal-Transformer-Pytorch

Implementation of Universal Transformer in Pytorch

10 Universal-Transformer-Pytorch issues

`state` passed to `fn` does not seem to be updated by ACT's masks, only `previous_state`? https://github.com/andreamad8/Universal-Transformer-Pytorch/blob/master/models/UTransformer.py#L280 As such, the dynamic halting seems to kick in only once all halting_probabilities...
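
For context, below is a minimal sketch of an ACT step of the kind this issue describes; the names (`act_step`, `p_layer`, `fn`, `threshold`) are illustrative stand-ins, not the repository's exact code. It shows why feeding the masked blend back as the next `state` matters: only then do halted positions stop changing.

```python
import torch

def act_step(state, previous_state, halting_probability, remainders,
             n_updates, fn, p_layer, threshold=0.99):
    """One Adaptive Computation Time step (illustrative sketch)."""
    # Per-position halting probability proposed at this step.
    p = torch.sigmoid(p_layer(state)).squeeze(-1)

    # Positions that were still running before this step.
    still_running = (halting_probability < 1.0).float()
    # Positions that cross the threshold at this step.
    new_halted = ((halting_probability + p * still_running) > threshold).float() * still_running
    # Positions that remain below the threshold after this step.
    still_running = ((halting_probability + p * still_running) <= threshold).float() * still_running

    halting_probability = halting_probability + p * still_running
    remainders = remainders + new_halted * (1.0 - halting_probability)
    halting_probability = halting_probability + new_halted * remainders
    n_updates = n_updates + still_running + new_halted

    # Weight with which the freshly transformed state is mixed in.
    update_weights = (p * still_running + new_halted * remainders).unsqueeze(-1)

    transformed = fn(state)  # one recurrent transformer step
    previous_state = transformed * update_weights + previous_state * (1.0 - update_weights)

    # Returning the blended tensor as the next `state` is what freezes halted
    # positions; passing the raw transformed `state` back into `fn` instead
    # would sidestep the halting mask, which is the behaviour the issue flags.
    next_state = previous_state
    return next_state, previous_state, halting_probability, remainders, n_updates
```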

[Here](https://github.com/andreamad8/Universal-Transformer-Pytorch/blob/e6b06375269e805a23acbb07ef1aa4d6402bce52/models/common_layer.py#L320) `i` is the index into `self.layers`, so it is always less than the length of `self.layers`. You probably meant `if i < len(self.layers) - 1`. Then no...
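
A small sketch of the off-by-one this issue points out follows; the class, the dropout applied between layers, and the names are assumptions for illustration rather than the repository's actual code.

```python
import torch.nn as nn

class LayerStack(nn.Module):
    """Illustrative stack of layers, not the repo's class."""
    def __init__(self, layers, dropout=0.1):
        super().__init__()
        self.layers = nn.ModuleList(layers)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        for i, layer in enumerate(self.layers):
            x = layer(x)
            # `i` is at most len(self.layers) - 1, so `i < len(self.layers)`
            # is always True; the intended "not the last layer" guard is:
            if i < len(self.layers) - 1:
                x = self.dropout(x)
        return x
```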

ImportError: cannot import name 'babi' from 'torchtext.data.metrics' (torchtext version 0.10.0)

Hi, I noticed you implemented a function to calculate the position embedding. However, I could not find anywhere it is used. Can you please help me understand how you incorporate the...
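
For reference, here is a minimal sketch of how a sinusoidal position signal is typically applied in a Universal Transformer, added at every recurrent step together with a per-step timing signal; the shapes and function names are assumed, not taken from the repository.

```python
import math
import torch

def sinusoid_signal(length, channels):
    """Standard sinusoidal signal; assumes an even number of channels."""
    position = torch.arange(length, dtype=torch.float).unsqueeze(1)
    div_term = torch.exp(torch.arange(0, channels, 2, dtype=torch.float)
                         * (-math.log(10000.0) / channels))
    signal = torch.zeros(length, channels)
    signal[:, 0::2] = torch.sin(position * div_term)
    signal[:, 1::2] = torch.cos(position * div_term)
    return signal  # (length, channels)

def add_position_and_step(x, step, max_steps):
    # x: (batch, seq_len, channels); applied before every recurrent step.
    _, seq_len, channels = x.shape
    pos = sinusoid_signal(seq_len, channels).to(x.device)            # position signal
    time = sinusoid_signal(max_steps, channels).to(x.device)[step]   # per-step signal
    return x + pos.unsqueeze(0) + time.view(1, 1, channels)
```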

Hi, I ran the experiments on the 10K setting, but my results are much worse than the reported ones. I didn't change any of the default parameters except for setting...

Hi, when I run the model, I notice that in the first epoch it can reach the max step of 24, but starting from the second or third epoch, the probability computed by `p = self.sigma(self.p(state)).squeeze(-1)`...

I found that in models/UTransformer.py:110 and 194, you have the following code:

```python
self.proj_flag = False
if(embedding_size == hidden_size):
    self.embedding_proj = nn.Linear(embedding_size, hidden_size, bias=False)
    self.proj_flag = True
```

I'm confused that you...
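
If the intent is to project only when the two sizes differ, the guard presumably should be negated, along the lines of the sketch below (an assumption, not a confirmed fix from the author):

```python
import torch.nn as nn

embedding_size, hidden_size = 300, 512  # example values

proj_flag = False
if embedding_size != hidden_size:  # `!=` rather than `==`
    embedding_proj = nn.Linear(embedding_size, hidden_size, bias=False)
    proj_flag = True
```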

Hi, when running the script on a machine without CUDA support, I'm getting the following error:

> File ".../Universal-Transformer-Pytorch/models/UTransformer.py", line 236, in forward
>     halting_probability = torch.zeros(inputs.shape[0],inputs.shape[1]).cuda()
> RuntimeError: torch.cuda.FloatTensor is not...
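
A device-agnostic way to allocate that tensor is sketched below; `inputs` is assumed to be the same tensor used in the failing line, and this is a suggestion rather than the repository's current code.

```python
# Allocate on whatever device `inputs` lives on, instead of calling .cuda()
# unconditionally; this works on both CPU-only and GPU machines.
halting_probability = torch.zeros(inputs.shape[0], inputs.shape[1],
                                  device=inputs.device)
```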

Hi, currently the `--task` argument is being ignored due to line 153ff in `main.py`, so the script always runs all bAbI tasks in a row.
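
A minimal sketch of honoring the flag might look like the following; `run_babi_task` and the range of task ids are hypothetical stand-ins, since the relevant part of `main.py` is not shown here.

```python
# Run only the requested bAbI task when --task is given; otherwise fall back
# to looping over all 20 tasks (hypothetical helper and defaults).
tasks = [args.task] if args.task is not None else list(range(1, 21))
for task_id in tasks:
    run_babi_task(task_id)
```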

Hi, I found this implementation very interesting. I would like to understand more about the Universal Transformer, since I think it could allow much smaller LLMs with higher performance. P.S. I...