NEKO issues

Better concatenation and individual metrics when using multiple text datasets

5

For text tasks with multiple datasets, the concatenation strategy could move to more sophisticated logic based on Hugging Face dataset concatenation. Further, we may wish to change the...

bhavul

enhancement

good first issue
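A minimal plain-Python sketch of the idea, assuming each text dataset is a list of strings keyed by a name. Every example is tagged with its source while concatenating, so metrics can still be reported per dataset after the datasets are merged. The helper names `concat_with_source` and `per_dataset_mean`, and the word-count stand-in for a real metric such as per-example loss, are hypothetical. With the Hugging Face `datasets` library, the concatenation step itself would be `datasets.concatenate_datasets`, and a source column could be added to each dataset beforehand with `Dataset.add_column`.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple


def concat_with_source(datasets: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Concatenate named text datasets, tagging every example with its source name."""
    combined: List[Tuple[str, str]] = []
    for name, examples in datasets.items():
        # Keep (source, text) pairs so the origin survives concatenation.
        combined.extend((name, text) for text in examples)
    return combined


def per_dataset_mean(
    combined: Iterable[Tuple[str, str]],
    metric: Callable[[str], float],
) -> Dict[str, float]:
    """Average a per-example metric separately for each source dataset."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for name, text in combined:
        totals[name] += metric(text)
        counts[name] += 1
    return {name: totals[name] / counts[name] for name in totals}
```

For example, `per_dataset_mean(concat_with_source({"wiki": ["a b c", "d e"], "code": ["x = 1"]}), lambda t: len(t.split()))` returns `{"wiki": 2.5, "code": 3.0}`, one value per source dataset instead of a single blended number.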

Updated Resource Analysis

Building on #40, capture:
- the original analysis document
- an updated analysis of storage and compute requirements
- a process across Manifold for managing resources (can be project-driven or org-driven)

harshsikka

v0 NEKO Model & System Designs

1

harshsikka

documentation

Context: We already know which datasets we are using, but we are trying to centralize this knowledge in the survey [spreadsheet](https://docs.google.com/spreadsheets/d/1bvoS75q101-uUYBiOWZRZvPDlYM8EegaItjkFkZrUwQ/edit?usp=sharing). Output: Write up and know which datasets we are currently...

snat-s

Profiling existing models

Context: We want to profile existing multimodal models we picked in #62 and evaluate them on the benchmark we are proposing. Output: Reported model performance on benchmark

harshsikka

Pick models to profile for the datasets

harshsikka

Outline Control Modality Next Steps across Data, Training

3

@jsjung00 is catching up on context from the control modality work with help from @daniellawson9999. We want to quickly understand which datasets are needed next, as we are finalizing the v0...

harshsikka

Outline Language & Vision Modality next steps

2

Similar to #59, we want to understand next steps: both brief context on small-level issues (e.g., bugs) and the next major steps.

harshsikka

Port BabyAI GoToLocal Expert Trajectories To Minari

4

As mentioned in [Source MiniGrid/BabyAI Dataset#14](https://github.com/ManifoldRG/NEKO/issues/14), once a dataset is sourced, it needs to be converted to Minari. GoToLocal expert trajectories have already been uploaded to [Google Drive](https://drive.google.com/drive/u/0/folders/1630PKrrNtVKAzd5rF_mabsDasVe2hrRc). Now it...

helenlu66

Investigate Audio Modality Datasets

5

Audio may be a more generally useful modality to train a large multimodal model on. We want to understand what datasets are available for this modality. Outcome: Dataset Investigation Analysis,...

harshsikka


Metadata

Better concatenation and individual metrics when using multiple text datasets

Updated Resource Analysis

v0 NEKO Model & System Designs

Investigate control data

Profiling existing models

Pick models to profile for the datasets

Outline Control Modality Next Steps across Data, Training

Outline Language & Vision Modality next steps

Port BabyAI GoToLocal Expert Trajectories To Minari

Investigate Audio Modality Datasets