Matt Watson

339 comments by Matt Watson

Just because they are exposing these parameters does not mean we need to as well. We do need compatibility between our forward pass and theirs. But we don't...

Agreed, we need to handle the differences between XLM-R base and XL (it's annoying that their architecture changes between what are supposed to be different sizes). But maybe let's start...

Triage notes: we took a look and think that we should actually have concatenate throw an error with only a single input. We will try out a change and see...
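
A minimal sketch of the kind of check being considered; the wrapper name and error message here are hypothetical, not the actual Keras change.

```python
# Hypothetical sketch of the validation being discussed; the real Keras
# implementation and error message may differ.
from keras import layers


def checked_concatenate(inputs, axis=-1):
    # Raise rather than silently passing a single tensor through.
    if not isinstance(inputs, (list, tuple)) or len(inputs) < 2:
        raise ValueError(
            "`concatenate` expects a list of at least two input tensors, "
            f"received: {inputs}"
        )
    return layers.Concatenate(axis=axis)(inputs)
```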

Looking at the code here, it seems like you have a `mask_dict` which is static in the context of an individual `model.fit()` call. Is that right? If that is the...

Took a look at the benchmark. It looks like `timeseries_dataset_from_array2`, which I'm guessing is the new version, is actually performing noticeably worse. Is that correct? We will probably not be...
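
For reference, a rough timing harness along these lines; the data and sizes are placeholders, and only the existing `keras.utils.timeseries_dataset_from_array` is timed here.

```python
# Rough benchmark sketch; swap in the PR's `timeseries_dataset_from_array2`
# (not available here) to compare it against the existing implementation.
import time

import numpy as np
from keras.utils import timeseries_dataset_from_array

data = np.random.rand(200_000, 8).astype("float32")
targets = data[:, 0]

start = time.perf_counter()
ds = timeseries_dataset_from_array(
    data, targets, sequence_length=64, batch_size=256
)
for _ in ds:  # iterate once to force the whole pipeline to run
    pass
print("existing implementation:", time.perf_counter() - start, "seconds")
```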

@AakashKumarNain sorry about the breakage here! For the original question above... would it work to replace the `token_embedding` with a [keras_nlp.layers.ReversibleEmbedding](https://keras.io/api/keras_nlp/modeling_layers/reversible_embedding/), and pass that token embedding to the masked...
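
Roughly what that replacement might look like; the vocabulary and embedding sizes below are illustrative, not taken from the original model.

```python
# Sketch of swapping in a ReversibleEmbedding and sharing it with the
# MaskedLMHead; sizes are illustrative.
import keras_nlp

VOCAB_SIZE = 30522
HIDDEN_DIM = 256

# Embeds token ids on the way in; the same weights can project hidden states
# back to vocabulary logits on the way out.
token_embedding = keras_nlp.layers.ReversibleEmbedding(VOCAB_SIZE, HIDDEN_DIM)

masked_lm_head = keras_nlp.layers.MaskedLMHead(
    vocabulary_size=VOCAB_SIZE,
    token_embedding=token_embedding,
    activation="softmax",
)
```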

For some general context, we did decide to move away from the old way of passing embedding weights to the `MaskedLMHead` because it would not save correctly with the upcoming...

Sounds good, keep us posted! And yeah, the `TokenAndPositionEmbedding` initialization is an interesting question. I don't think there is any way to guarantee stable training performance with arbitrary architectures and...

@innat can you say more about what you had in mind? Something that goes directly from `keras.ops` to the ONNX format? And how would that compare to going to ONNX through...
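
For comparison, a minimal sketch of one existing indirect route (via tf2onnx with a tf.keras model); this is an assumption about the path being compared, since the comment above is truncated, and the toy model is purely illustrative.

```python
# Sketch of an indirect route to ONNX via tf2onnx; assumes TensorFlow and
# tf2onnx are installed, and uses a toy model for illustration.
import tensorflow as tf
import tf2onnx

inputs = tf.keras.Input(shape=(8,), name="inputs")
outputs = tf.keras.layers.Dense(4)(inputs)
model = tf.keras.Model(inputs, outputs)

spec = (tf.TensorSpec((None, 8), tf.float32, name="inputs"),)
onnx_model, _ = tf2onnx.convert.from_keras(model, input_signature=spec, opset=17)
with open("model.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())
```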

What are some example inputs and outputs with your change? I would think if we wanted targets to be a sequence too, we would want to add a parameter along...
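
To make the question concrete, here is roughly what the current behavior looks like with toy data; any new parameter for sequence targets would be on top of this.

```python
# Illustrates current behavior: each window of `data` gets a single scalar
# target rather than a target sequence. Toy data for illustration only.
import numpy as np
from keras.utils import timeseries_dataset_from_array

data = np.arange(20)
targets = data + 100  # targets[i] pairs with the window starting at index i
ds = timeseries_dataset_from_array(data, targets, sequence_length=5, batch_size=4)
for x, y in ds:
    print(x.shape, y.shape)  # e.g. (4, 5) and (4,)
    break
```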