
2d filters

Open alexkreimer opened this issue 2 years ago • 11 comments

Hey, nice work!

I was wondering: it seems that for image tasks you flatten the features to 1d and then apply the filter. Would it be possible to build 2d filters using the same idea? Did you try that?

alexkreimer avatar Oct 31 '22 19:10 alexkreimer

Hi Alex,

Thanks for your interest! Extending SGConv to 2d filters is a good idea, but we didn't try it because the paper focuses on long-sequence modeling, and using 2d filters would defeat the purpose of the long-range benchmarks. You could definitely apply the idea to standard 2d filters to build better vision models.
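To make the 2d-filter idea concrete, here is a rough numpy sketch (my own illustration, not code from the SGConv repo; `global_conv2d` is a hypothetical name) of a "global" 2d convolution computed in the frequency domain, where the kernel can be as large as the feature map itself:

```python
import numpy as np

def global_conv2d(x, kernel):
    """Circular 2d convolution via FFT: a hypothetical 2d analogue of
    the 1d global convolution. x and kernel are (H, W) arrays; the
    kernel may span the whole feature map at O(HW log HW) cost."""
    H, W = x.shape
    # Pointwise product in the frequency domain == circular convolution
    return np.real(np.fft.ifft2(np.fft.fft2(x) * np.fft.fft2(kernel, s=(H, W))))

# Sanity check: a delta kernel at the origin acts as the identity
x = np.random.rand(8, 8)
k = np.zeros((8, 8))
k[0, 0] = 1.0
y = global_conv2d(x, k)
assert np.allclose(y, x)
```

A real vision model would presumably also need the multi-scale kernel parameterization from the paper; this only shows the FFT mechanics.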

ctlllll avatar Oct 31 '22 19:10 ctlllll

@ctlllll What is the longest sequence you are able to model using your method? Great work btw!

Tylersuard avatar Nov 03 '22 02:11 Tylersuard

@Tylersuard Hi Tyler, in our experiments we test on Long Range Arena, which includes Pathfinder-X with flattened 128×128 images (sequence length 16,384), and on the Speech Commands dataset with a sequence length of 16,000.
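For reference, the flattening step is just a raster scan over the image; a minimal numpy illustration (not the repo's code):

```python
import numpy as np

# Pathfinder-X images are 128x128; flattening them row by row yields
# a 1d sequence of length 16384 that the global convolution runs over.
image = np.zeros((128, 128))
sequence = image.reshape(-1)           # raster-scan flattening
assert sequence.shape == (128 * 128,)  # 16384 tokens
```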

leeyeehoo avatar Nov 03 '22 02:11 leeyeehoo

Great! I think the global conv filter is a brilliant idea. Hypothetically, what is the maximum sequence length you could handle? Would 200k (Enformer) or even 500M be possible?

Tylersuard avatar Nov 03 '22 02:11 Tylersuard

@Tylersuard Hi Tyler, we can't give a specific "maximum" length that SGConv can process. Usually, if the task fits in your server's GPU memory, it's worth a try. Also, thank you for pointing out the Enformer paper, which I hadn't seen before :)

leeyeehoo avatar Nov 03 '22 02:11 leeyeehoo

You are welcome! To me, the most exciting part is the ability to take really long input sequences at little additional compute cost. I tried to run your repo last night on a premium GPU with my custom dataset of many 200k-character sequences, but it looks like you haven't released the code yet. If you give me access, I will run some experiments to find the maximum input length and report my results back to you.
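A rough back-of-envelope (my own arithmetic, not from the thread) for why very long inputs are cheap here: self-attention costs O(L²) per layer, while an FFT-based global convolution costs O(L log L), which diverges dramatically at L = 200,000:

```python
import math

# Operation counts per layer, up to constant factors
L = 200_000
attention_ops = L ** 2            # quadratic in sequence length
fft_conv_ops = L * math.log2(L)   # near-linear in sequence length

# At this length, attention needs on the order of 10,000x more work
ratio = attention_ops / fft_conv_ops
assert ratio > 1e4
```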

Tylersuard avatar Nov 03 '22 18:11 Tylersuard

@Tylersuard Hi Tyler, we just pushed standalone SGConv code, so you can try it now! We ran a sequence of 1M tokens with model dimension 256, and it cost ~20 GB of GPU memory per layer; I think it will work well in your case :) Thanks for pushing us to make things more approachable; we had gotten too lazy to clean up the code before...
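For anyone curious what such a long convolution looks like mechanically, here is a minimal numpy sketch of an FFT-based O(L log L) causal convolution in the style SGConv uses (`fft_conv1d` is my own illustration, not the released code, which is in PyTorch and includes the multi-scale kernel construction):

```python
import numpy as np

def fft_conv1d(u, k):
    """Causal convolution of a length-L signal u with a length-L kernel k
    in O(L log L) via FFT. Zero-padding to 2L avoids circular wrap-around,
    so the result matches direct linear convolution truncated to L."""
    L = len(u)
    U = np.fft.rfft(u, n=2 * L)
    K = np.fft.rfft(k, n=2 * L)
    return np.fft.irfft(U * K, n=2 * L)[:L]

# Check against direct convolution at a small size; the same code scales
# to sequences of millions of tokens, unlike the O(L^2) direct method.
u = np.random.rand(1024)
k = np.random.rand(1024)
y = fft_conv1d(u, k)
assert np.allclose(y, np.convolve(u, k)[:1024])
```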

ctlllll avatar Nov 03 '22 19:11 ctlllll

@ctlllll Huzzah! Thank you very much.

Tylersuard avatar Nov 05 '22 01:11 Tylersuard

This might be a noob question, but how would I go about using this to generate text, or to solve one of the long-document problems?

Tylersuard avatar Nov 05 '22 04:11 Tylersuard

Hi Tyler, our language modeling experiments are based on this repository: https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/LanguageModeling/Transformer-XL You can refer to it for more information; it will give you the general idea of how to use transformer-style models for language modeling.

leeyeehoo avatar Nov 06 '22 01:11 leeyeehoo

@ctlllll I was able to run 2 million tokens with model dimension 256 :) I am trying to get hold of an 80 GB A100 to push it even further. (screenshot attached)

Tylersuard avatar Nov 09 '22 04:11 Tylersuard