MagicSource
Thanks for the detailed analysis. Do you think combining C-Abstractor with Resampler, instead of using DeformableDecoderLayer, would let the Resampler alone produce more informative results? For instance, from ViT output 576...
@NormXU Hi, I have managed to get a promising result with Resampler, comparable to MLP while using an extremely limited number of tokens. So far so good. But the real problem in...
Is C-Abstractor able to scale? Resampler has a nice property: if you make it deeper, its performance keeps growing as long as you feed it more data.
> Resampler needs to learn locality from data (due to its less inductive bias), thus the dataset should be high-quality including spatial relationships and detailed descriptions. Do you have any...
@DachengLi1 I just exposed max_position_embeddings as a parameter in the monkey patch. Not sure what happened, but if it was overwritten, then was my actual training length 4096? (not 2048...
I am also wondering about this. For instance, on a V100 it might not be possible to feed 2048 at all. If using 1024 and applying condensed rotary embeddings with a ratio of 16,...
@DachengLi1 What I mean is that, in most cases, a V100 cannot fit a minimum length as large as 2048.
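To make the ratio question above concrete, here is a minimal sketch of how condensing rotary positions by a ratio works arithmetically. It assumes the common formulation where each position index is divided by the ratio before the rotary angles are computed. The function names, `head_dim`, and `base` are hypothetical and not taken from the patch discussed here, and whether training at 1024 is actually sufficient is not something this sketch answers.

```python
def condensed_position(position: int, ratio: float) -> float:
    """Map an original token position to its condensed (scaled) position."""
    return position / ratio


def rotary_angles(position: int, head_dim: int, ratio: float = 1.0,
                  base: float = 10000.0) -> list:
    """Rotary angles for one position; ratio > 1 condenses the positions.

    Hypothetical helper for illustration only, not the monkey patch's code.
    """
    scaled = condensed_position(position, ratio)
    # One angle per pair of channels, with geometrically decaying frequencies.
    return [scaled * base ** (-2 * i / head_dim) for i in range(head_dim // 2)]


position_range = 1024  # assumed range of positions before condensing
ratio = 16             # condensing ratio from the question above

# Number of original positions that fit into the condensed range.
covered_len = position_range * ratio

# The last covered position still lands inside the assumed range.
last_scaled = condensed_position(covered_len - 1, ratio)

print(covered_len, last_scaled)
```

With a ratio of 16, original position 16 receives the same angles as position 1 without condensing, so positions 0 to 16383 are squeezed into the range [0, 1024).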
@DachengLi1 Hi, I would like to discuss this a bit more: have you tried comparing your method with ALiBi on extrapolation ability?
@SuperCrystal Hi, thanks for your investigation and the workaround! I think you are right, it actually got a weird loss this way. Would you like to send me a PR to fix this...
I am on 4.41 and still get: ImportError: This modeling file requires the following packages that were not found in your environment: flash_attn. Run `pip install flash_attn` I cannot use...