ankit201
> Could you please share the steps before this line?
>
> It seems like you are trying a different model in some manner; this is indicated by the difference...
newsela
@ojotoxy What are `test.en` and `test.sen` used for? Did you use the pre-trained model? If so, what torch version did you use?
@roeeaharoni The error message is `PID Killed`, where PID is the process id.
@roeeaharoni Please solve the issue mentioned by @008karan.
> It's supported on a "best effort" basis.
>
> I started some work to actually support it, but it means rewriting flash attention (the CUDA version) with added bias,...
> > on implementing dynamic batching for this as it only supports 1 concurrent request for now on AutoModel.
>
> This won't require work once we have flash attention....
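Until dynamic batching lands, a minimal sketch of what "1 concurrent request" means in practice for an AutoModel-backed server is to serialize all generation calls behind a single lock. The model id, function name, and asyncio setup below are illustrative assumptions, not the project's actual serving code:

```python
import asyncio

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical model id, chosen only for illustration.
MODEL_ID = "gpt2"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID).eval()

# One lock serializes every generation call: effectively 1 concurrent request,
# which is the limitation dynamic batching would remove.
_model_lock = asyncio.Lock()

async def generate(prompt: str, max_new_tokens: int = 32) -> str:
    async with _model_lock:
        inputs = tokenizer(prompt, return_tensors="pt")
        with torch.no_grad():
            output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
        return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

The blocking `model.generate` call inside the lock keeps the sketch short; a real server would offload it to a worker thread and batch queued prompts together.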
> Here is the non-flash version (as a temporary measure, since modifying the kernel is taking more time than I anticipated): #514
>
> This should enable sharding at...
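For context, a non-flash attention with an additive bias (the kind of fallback the quoted comment refers to, pending a flash kernel that accepts a bias term) can be written directly in PyTorch. The function name and tensor shapes below are assumptions for illustration, not the code from #514:

```python
import math

import torch

def attention_with_bias(q, k, v, bias):
    """Plain (non-flash) scaled dot-product attention with an additive bias.

    Shapes (assumed for illustration):
      q, k, v: [batch, heads, seq_len, head_dim]
      bias:    broadcastable to [batch, heads, seq_len, seq_len]
    """
    scores = torch.matmul(q, k.transpose(-1, -2)) / math.sqrt(q.size(-1))
    scores = scores + bias  # e.g. an ALiBi-style bias or an attention mask
    probs = torch.softmax(scores, dim=-1)
    return torch.matmul(probs, v)
```

This materializes the full `seq_len x seq_len` score matrix, which is exactly the memory cost a flash-attention kernel with built-in bias support would avoid.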