Sidd Karamcheti

Results 23 issues of Sidd Karamcheti

This is an awesome project! One small enhancement - it would be really nice to support "configurable" population orders for help text beyond the default (look for docstring above argument,...

Updated venues for SIGGRAPH and added new citations. Leaving as a draft PR until #34 is merged.

Additionally fixes TMLR venue formatting (typo and lack of a volume number, as it's a rolling publication cycle).

Thanks for open-sourcing the SigLIP models! Clarification question: in the demo IPython notebook, the image transform function has the form `pp_img = pp_builder.get_preprocess_fn(f'resize({RES})|value_range(-1, 1)')`. Looking at the code [here](https://github.com/google-research/big_vision/blob/main/big_vision/pp/ops_image.py#L64), this...

It's pretty inconvenient to have users manually download the various datasets (especially the bigger ones such as OCID-Ref and the Franka/Adroit Demonstrations); a proper interface/hosting solution would be nice....

enhancement

Once finalized, it would be nice to treat this as a modular, general library akin to the main [Voltron Robotics](https://github.com/siddk/voltron-robotics) package. Ideally, `pip` would handle everything (all...

enhancement
question

Once the Visuomotor Control & Intent Scoring tasks have been integrated properly, it would be nice to consolidate the V-Evaluation Harness into a more general API that can be used for...

documentation
refactor
roadmap

Check with Shikhar + co. around sharing the video examples from the [WHiRL website](https://human2robot.github.io/) - possibly get more examples. Then:
- [ ] Upload videos to GDrive add simple `gdown`...

roadmap

Currently, both the Franka Kitchen & Adroit Visuomotor Control Tasks are implemented on top of the [R3M evaluation code](https://github.com/facebookresearch/r3m/tree/eval/evaluation). While a fairly straightforward addition to that codebase, it would be...

enhancement
roadmap

### Model description

Hi! I'm the author of ["Prismatic VLMs"](https://github.com/TRI-ML/prismatic-vlms), our upcoming ICML paper that introduces and ablates design choices of visually-conditioned language models that are similar to LLaVa or...

New model