Embedding factory script looping through each MGRS tile

Open weiji14 opened this issue 1 year ago • 0 comments

Jupyter notebook script to generate GeoParquet embedding files on a per MGRS tile basis.

Steps:

  1. The script first generates an mgrs_world.txt file listing MGRS tile codes such as 12ABC, one per line. Run this command first:
    aws s3 ls s3://clay-tiles-02/02/ | tr -s ' ' | cut -d ' ' -f 3 | cut -d '/' -f 1 > mgrs_world.txt
    
  2. A for-loop then iterates over each MGRS tile, running the model's prediction step to generate GeoParquet files that are uploaded to S3.
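
For illustration, the two steps can be sketched in Python. This is a minimal sketch and not the notebook's actual code: parse_mgrs_codes and run_per_tile are hypothetical helper names, the listing format is assumed to be the usual `aws s3 ls` prefix lines ("PRE 12ABC/"), and process_tile stands in for whatever runs the model prediction and upload for one tile.

```python
from __future__ import annotations

from typing import Callable, Iterable


def parse_mgrs_codes(listing: str) -> list[str]:
    """Extract prefix names from `aws s3 ls` output, e.g. 'PRE 12ABC/' -> '12ABC'."""
    codes = []
    for line in listing.splitlines():
        fields = line.split()
        # Assumption: prefix ("directory") entries look like `PRE 12ABC/`.
        # Object entries (date, time, size, name) are skipped.
        if len(fields) == 2 and fields[0] == "PRE":
            codes.append(fields[1].rstrip("/"))
    return codes


def run_per_tile(
    codes: Iterable[str], process_tile: Callable[[str], None]
) -> list[str]:
    """Call process_tile on each MGRS code and return the codes that raised."""
    failed = []
    for code in codes:
        try:
            # Hypothetical callback: predict embeddings and upload GeoParquet.
            process_tile(code)
        except Exception as err:
            # Record the failure and continue with the remaining tiles.
            print(f"{code}: failed with {err!r}")
            failed.append(code)
    return failed
```

Collecting failed codes instead of aborting means a single bad tile does not stop a long run, and the returned list can be fed back into run_per_tile for a retry.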

Notes:

  • About 947019 rows of embeddings were generated from the clay-small-70MT-1100T-10E.ckpt model checkpoint in Dec 2023.
  • Embeddings were generated on a g5.4xlarge EC2 instance with 1 NVIDIA A10G GPU, which supports bfloat16 dtype calculations.

Closes #120

weiji14 • Jan 17 '24 03:01