Kyle Gorman
Thanks for looking into this @Adamits. (Sorry @Othergreengrasses, I wasn't in front of a computer with a GPU, so I was just looking at the CLI arguments.) Thanks for finding...
> Hey y'all, asked for this error to be put up. I think the main issue is this line:
>
> `p_gen += self.W_emb(target_embeddings) + self.bias.expand(...`
>
> PyTorch...
My $.02 is that when they work, in-place operations are honkin' great, but sometimes they have a weird effect on the computation graph.
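For concreteness, here's a minimal sketch of the failure mode (my own toy repro, not the project's code): an in-place `+=` that overwrites a tensor autograd saved for backward raises a `RuntimeError`, while the out-of-place version is fine.

```python
import torch

# Sigmoid's backward needs its own output, so overwriting that output
# in place invalidates the computation graph.
x = torch.randn(3, requires_grad=True)
y = x.sigmoid()
y += 1.0  # in-place update bumps y's version counter
try:
    y.sum().backward()
except RuntimeError as err:
    print(err)  # "... has been modified by an inplace operation"

# The out-of-place version allocates a new tensor and backprops fine.
x = torch.randn(3, requires_grad=True)
y = x.sigmoid() + 1.0
y.sum().backward()
```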
I am weakly opposed. It is a big source of complexity in FairSeq and we don't have any reason to suppose it improves things on this task. (That said, fork...
`examples` is the wild west; do what you will there, within reason ;)
Dumb question, but how is this different from the type of decoder-only LM we were talking about?
I think Adam has an implementation in his fork, but hasn’t PRed it yet.
+1. Makes sense.
So this is an approximation/hack, right? I'm fine with it, and maybe we could treat it as a separate architecture to keep things simple.
> So this is an approximation/hack, right?

I don't think so. I think e.g. the attention of t w.r.t. t-1 will always be the same. So caching...
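To see why caching is exact rather than approximate, here's a toy illustration (my own sketch with made-up weights `W_q`, `W_k`, `W_v`; not the project's code): under causal attention, the outputs for a prefix are identical whether you recompute the whole sequence or stop at the prefix, so cached states never go stale.

```python
import torch

torch.manual_seed(0)
d = 8
W_q, W_k, W_v = (torch.randn(d, d) for _ in range(3))

def attend(h):
    """Single-head causal self-attention over a (time, d) tensor."""
    q, k, v = h @ W_q, h @ W_k, h @ W_v
    mask = torch.tril(torch.ones(len(h), len(h), dtype=torch.bool))
    scores = (q @ k.T / d**0.5).masked_fill(~mask, float("-inf"))
    return scores.softmax(dim=-1) @ v

x = torch.randn(5, d)       # a 5-token sequence
full = attend(x)            # recompute attention for all 5 steps
prefix = attend(x[:4])      # what a cache would have stored at step 4
print(torch.allclose(full[:4], prefix))  # True: earlier steps never change
```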