
Learning to find similar semantics from embeddings

brunosan opened this issue 11 months ago · 9 comments

The main practical use case of Clay as of now, and the centerpiece of the upcoming app, is the ability to find similar features. Think: 1) click on a pool, 2) find more potential examples, 3) confirm/reject candidates, 4) iterate until you are happy.

The current chip size of 512 pixels (~5120 m in Sentinel) is much larger than most semantic features, as is even the patch size of 32 pixels (~320 m), so the corresponding embeddings will mix the many semantics present on the chip/patch. This mixing makes similarity search (e.g. cosine) and other such tools of limited use, since they compare all dimensions at once.
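For concreteness, the baseline we want to improve is roughly this (a minimal sketch; `query` and `embeddings` are hypothetical arrays, not an actual Clay API):

```python
# Baseline similarity search over ALL embedding dimensions.
# Assumes `embeddings` is an (n_chips, dim) float array and `query`
# is a (dim,) embedding; both names are hypothetical.
import numpy as np

def cosine_search(query: np.ndarray, embeddings: np.ndarray, top_k: int = 10):
    """Rank chips by cosine similarity to the query embedding."""
    q = query / np.linalg.norm(query)
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    scores = e @ q                      # compares every dimension at once
    return np.argsort(-scores)[:top_k]  # indices of the most similar chips
```

Because every dimension contributes equally, a chip that matches the query on irrelevant semantics can outrank one that matches only on the semantic we actually care about.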

I believe we need a way to both:

  1. pick the subset, or function, of dimensions that best represents the requested sample of examples (one possible heuristic is sketched after this list).
  2. locate them in the image.
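One simple, label-free heuristic for point 1, sketched under the assumption that we have a handful of user-confirmed positives plus a random background sample (this is an illustration, not a settled design):

```python
# Keep the dimensions where the positives agree most tightly relative
# to the background spread. `positives` is (n_pos, dim) and
# `background` is (n_bg, dim); both names are hypothetical.
import numpy as np

def select_dimensions(positives: np.ndarray, background: np.ndarray, top_k: int = 64):
    """Score each dimension by background spread / positive spread."""
    score = background.std(axis=0) / (positives.std(axis=0) + 1e-8)
    return np.argsort(-score)[:top_k]  # indices of the most selective dimensions
```

Similarity search would then run on `embeddings[:, dims]` only, instead of the full vectors.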

This might take the shape of a "decoder" that is either plugged into the encoder or, better, takes embeddings as input. Ideally, this decoder is agnostic of the label or location, and needs no training at inference time (so that the app can use it easily).
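As a hedged sketch of the "no training at inference time" part: if we keep the per-patch embeddings (a 512-pixel chip with 32-pixel patches gives a 16×16 grid), localization could be as simple as scoring each patch against the query. The shapes below are assumptions:

```python
# Locate a semantic inside a chip by scoring each patch embedding
# against the query, giving a coarse 16x16 heat map. Assumes
# `patch_embeddings` has shape (16, 16, dim); names are hypothetical.
import numpy as np

def locate(query: np.ndarray, patch_embeddings: np.ndarray) -> np.ndarray:
    q = query / np.linalg.norm(query)
    p = patch_embeddings / np.linalg.norm(patch_embeddings, axis=-1, keepdims=True)
    return p @ q  # high values mark the patches containing the semantic
```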

  • @MaceGrim has been working on this, so please update here with your takeaway.
  • I'll attempt to use a random forest (RF) to calculate "feature importance" and identify the subset of dimensions that improves the similarity search (see the sketch below).
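A sketch of what that RF experiment could look like with scikit-learn, assuming `X` holds embeddings of confirmed/rejected candidates and `y` their 1/0 labels (variable names are mine, not an existing implementation):

```python
# Fit a random forest on user feedback, read off feature importances,
# and rerun the similarity search on the top dimensions only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def important_dimensions(X: np.ndarray, y: np.ndarray, top_k: int = 64):
    """X: (n_samples, dim) embeddings; y: 1 = confirmed, 0 = rejected."""
    rf = RandomForestClassifier(n_estimators=200, random_state=0)
    rf.fit(X, y)
    return np.argsort(-rf.feature_importances_)[:top_k]

# dims = important_dimensions(X, y)
# cosine_search(query[dims], embeddings[:, dims])  # search the subset only
```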

cc @yellowcap, @geohacker and @srmsoumya for ideas.

brunosan · Mar 19 '24 18:03