Learning to find similar semantics from embeddings
The main practical use case of Clay right now, and the center of the upcoming app, is the ability to find similar features. Think: 1) click on a pool, 2) find more potential examples, 3) confirm/reject candidates, 4) iterate until you are happy.
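A minimal sketch of what that loop could look like on top of precomputed chip embeddings (all names here are illustrative assumptions, not the app's actual API):

```python
import numpy as np

def cosine_scores(query, embeddings):
    """Cosine similarity between one query vector and all chip embeddings."""
    q = query / np.linalg.norm(query)
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    return e @ q

def refine_query(confirmed):
    """Average the user-confirmed examples into a new query vector."""
    return np.mean(confirmed, axis=0)

# 1) start from the embedding of the clicked pool
# 2) rank all chips by cosine_scores and show the top candidates
# 3) after confirm/reject, refine_query on the confirmed ones and repeat
```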
The current chip size of 512 pixels (~5120 m for Sentinel) is much larger than most semantics, and so is even the patch size of 32 pixels (~320 m). The corresponding embeddings will therefore mix the many semantics present on the chip/patch. This mixing makes similarity search (e.g. cosine) and other tools of limited use, since they look at all dimensions at once.
I believe we need a way to both:
- pick the subset, or function, of dimensions that best represents the requested sample of examples.
- locate them in the image.
This might take the shape of a "decoder" that either plugs into the encoder or, better, takes embeddings as input. Ideally, this decoder is agnostic of the label or location, and needs no training at inference time (so that the app can use it easily).
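For the localization part, one option (a sketch only; it assumes per-patch embeddings can be exported alongside the pooled chip embedding, which may not match Clay's current outputs) is to score each patch token against the query and reshape the scores onto the chip's patch grid:

```python
import numpy as np

def locate_in_chip(query, patch_embeddings, grid=(16, 16)):
    """Score each patch embedding against the query and reshape to the
    chip's patch grid, giving a coarse heat map of where the match is.

    patch_embeddings: (n_patches, dim); a 512-pixel chip with 32-pixel
    patches yields a 16x16 = 256 patch grid.
    """
    q = query / np.linalg.norm(query)
    p = patch_embeddings / np.linalg.norm(patch_embeddings, axis=1, keepdims=True)
    return (p @ q).reshape(grid)
```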
- @MaceGrim has been working on this, so please update here with your takeaway.
- I'll attempt to use a random forest (RF) to calculate "feature importance" and identify the subset of dimensions that improves the similarity search; a rough sketch of that idea is below.
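A rough sketch of the RF idea, assuming a handful of confirmed positives and rejected negatives from the confirm/reject step (function and parameter names are illustrative):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def important_dimensions(positives, negatives, top_k=32):
    """Fit a small random forest on confirmed/rejected examples and return
    the indices of the embedding dimensions it finds most discriminative."""
    X = np.vstack([positives, negatives])
    y = np.concatenate([np.ones(len(positives)), np.zeros(len(negatives))])
    rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
    return np.argsort(rf.feature_importances_)[::-1][:top_k]

# The similarity search can then be restricted to these dimensions,
# e.g. embeddings[:, dims] instead of the full vector.
```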
cc @yellowcap, @geohacker and @srmsoumya for ideas.