Adrien B
I have roughly the same issue. On the line `flat_params = self.f * flat_params - self.i * Variable(flat_grads)`, my computer takes a very long time (making the...
Never mind, that was not the problem. The problem was most likely a version change in PyTorch, so the operation `flat_params = self.f * flat_params - self.i * Variable(flat_grads)` produces a 25450*25450...
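A minimal sketch of how an update like this can silently blow up to an N×N tensor through broadcasting, assuming the gate tensors (standing in for `self.f` and `self.i`) have shape (N, 1) while the flattened parameters and gradients have shape (N,). The sizes and tensors below are hypothetical stand-ins, not the original code. Since PyTorch 0.4, `Variable` and `Tensor` are merged, so plain tensors are used here.

```python
import torch

torch.manual_seed(0)
N = 5  # small stand-in for the 25450 parameters

flat_params = torch.randn(N)  # flattened parameters, shape (N,)
flat_grads = torch.randn(N)   # flattened gradients, shape (N,)
f = torch.rand(N, 1)          # hypothetical forget-gate output, shape (N, 1)
i = torch.rand(N, 1)          # hypothetical input-gate output, shape (N, 1)

# (N, 1) * (N,) broadcasts to (N, N): every gate value multiplies every parameter.
bad = f * flat_params - i * flat_grads
print(bad.shape)  # torch.Size([5, 5])

# Fix (one option): flatten the gates so the update is element-wise, shape (N,).
good = f.view(-1) * flat_params - i.view(-1) * flat_grads
print(good.shape)  # torch.Size([5])
```

With N = 25450, the broadcast version allocates roughly 650 million elements per intermediate, which would explain the slowdown; checking `.shape` on each operand before the update is a quick way to catch this.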
Perfect! I will try to be a worthy maintainer ^^
You can try `model = Perceiver(input_channels = C, input_axis = 1, etc)`, where `input_channels = C` is the number of channels for each token of the input. It assumes you have only...
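A small sketch of the input layout that `input_axis = 1` implies, assuming the Perceiver implementation expects channels-last input of shape (batch, *axes, channels), as in the lucidrains perceiver-pytorch README example (`torch.randn(1, 224, 224, 3)` with `input_axis = 2`). The sizes are hypothetical, and the model call is left commented out so the snippet runs without the library.

```python
import torch

# Hypothetical sizes for illustration.
batch_size, seq_len, C = 2, 16, 8

# Data stored channels-first, e.g. (batch, C, seq_len) as nn.Conv1d would use.
x_channels_first = torch.randn(batch_size, C, seq_len)

# Assumed Perceiver layout is channels-last: (batch, *axes, channels).
x = x_channels_first.permute(0, 2, 1)  # -> (batch, seq_len, C)

input_axis = 1
# Exactly input_axis dimensions sit between the batch and channel dims.
assert len(x.shape[1:-1]) == input_axis
assert x.shape[-1] == C

# logits = model(x)  # `model` built with input_channels = C, input_axis = 1
```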
Is this really up to OA to do this? It looks like OA depends heavily on https://github.com/huggingface/text-generation-inference for inference. If we want to have a proper...
@carmocca Yeah, this is exactly what I did! Simple workaround.
Do you think it would be useful to gather data on OA failures too (e.g. by scraping the Discord "bad-message-ids")?
I am running some experiments on my own graph dataset. Your implementation seems to perform better than the standard graph transformer (at least the one I tried from DGL...
@Reichenbachian I think there is currently a version of DPO under review in the TRL library, if you want to check: https://github.com/lvwerra/trl/pull/416/files#diff-5bbdb5d54108f2162b47bc54dc23c7b8e7744d2941118e60a44c161a4acc0ee8