How to run my own dataset within Earthformer

Open fizzking opened this issue 1 year ago • 11 comments

I want to use Earthformer to train on my own dataset and test it. What format should I process the data into, and which .py files do I need to prepare?

fizzking avatar Dec 08 '23 02:12 fizzking

Thanks for your question. You may want to refer to the simplest test case to verify if the shapes are aligned correctly. Please note that this test script is from my fork, which has not been merged into this repo.

gaozhihan avatar Dec 10 '23 00:12 gaozhihan

I ran the command from the README: python3 -m pytest. It produced the following error. What does it mean? [screenshot of the error]

fizzking avatar Dec 10 '23 13:12 fizzking

I'm not sure if you are using the correct script in my fork, but you don't need to run pytest. Please simply try python ROOT_DIR/tests/test_cuboid.py.

gaozhihan avatar Dec 10 '23 16:12 gaozhihan

I ran the test code as you suggested, and the result shows that the model is missing arguments. How do I specify these two model parameters? [screenshot of the error]

fizzking avatar Dec 14 '23 09:12 fizzking

You should pass the args to CuboidTransformerModel, as in:

https://github.com/gaozhihan/earth-forecasting-transformer/blob/a5c07f22ec53ba577d679e0a3be8eb7e77d3e82c/tests/test_cuboid.py#L24-L29

gaozhihan avatar Dec 14 '23 16:12 gaozhihan

Thank you very much for your patient reply! I successfully ran this code.

fizzking avatar Dec 15 '23 04:12 fizzking

The test_cuboid.py you provided only tests the data shapes. Do I need to write my own training code based on your train_cuboid_nbody?

fizzking avatar Dec 17 '23 13:12 fizzking

Yes, please feel free to refer to [train_cuboid_nbody.py](https://github.com/amazon-science/earth-forecasting-transformer/blob/7732b03bdb366110563516c3502315deab4c2026/scripts/cuboid_transformer/nbody/train_cuboid_nbody.py) and train_cuboid_sevir.py when implementing your own training script. The main task is to implement your own LightningDataModule to replace the original one:

https://github.com/amazon-science/earth-forecasting-transformer/blob/7732b03bdb366110563516c3502315deab4c2026/scripts/cuboid_transformer/sevir/train_cuboid_sevir.py#L485-L506

gaozhihan avatar Dec 18 '23 16:12 gaozhihan
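
As a rough illustration of what a custom data module has to supply, the sketch below shows the sample-slicing step in plain Python: a time-ordered sequence of gridded frames is cut into (input, target) pairs. It assumes the data has already been arranged as one frame per time step. The names `make_windows`, `in_len`, `out_len` and `stride` are hypothetical and are not taken from Earthformer or PyTorch Lightning; wrapping the result in a Dataset and a LightningDataModule is not shown.

```python
from typing import List, Sequence, Tuple


def make_windows(
    frames: Sequence,
    in_len: int,
    out_len: int,
    stride: int = 1,
) -> List[Tuple[list, list]]:
    """Split a time-ordered sequence of frames into (input, target) pairs.

    Hypothetical helper: each sample uses `in_len` consecutive frames as
    input and the following `out_len` frames as the forecasting target.
    """
    if in_len <= 0 or out_len <= 0 or stride <= 0:
        raise ValueError("in_len, out_len and stride must be positive")
    total = in_len + out_len
    samples = []
    # A window covers `total` consecutive frames, so the last valid start
    # index is len(frames) - total.
    for start in range(0, len(frames) - total + 1, stride):
        x = list(frames[start:start + in_len])
        y = list(frames[start + in_len:start + total])
        samples.append((x, y))
    return samples


# Toy example: six "frames", each represented only by its time index.
pairs = make_windows(list(range(6)), in_len=3, out_len=2)
print(pairs)
```

In a real script each frame would be an array of shape (height, width, channels) rather than an integer, and the pairs would be stacked into tensors before being handed to the model.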

My data is a CSV file with M rows and N columns. The columns are: time, latitude, longitude, several predictive factors, and the target outputs affected by those factors. Each row therefore holds the predictive factors and targets for one time and one location. However, my latitudes and longitudes do not lie on a regular grid as in the ENSO example you provided, so I cannot arrange the data into an array shaped like ENSO's (time, lat, lon, number of predictive factors). Does the data have to be on a regular lat x lon grid before it can be fed into Earthformer?

fizzking avatar Dec 24 '23 12:12 fizzking

Earthformer is designed to handle regularly gridded data. For your case, you may want to use masks to indicate missing values, if the data is not too sparse.

gaozhihan avatar Dec 24 '23 17:12 gaozhihan

Are there any examples for reference?

fizzking avatar Dec 25 '23 13:12 fizzking
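
One possible way to act on the masking suggestion above is to bin the scattered CSV rows onto a regular lat x lon grid and keep a companion mask that marks which cells actually received an observation. The following is a minimal sketch in plain Python, not code from the Earthformer repository. The column names `time`, `latitude` and `longitude` follow the description in the thread; `factor_a`, `target`, the grid origin, the cell size and the function name `grid_with_mask` are all assumptions made for illustration. How the mask is then consumed (for example, to exclude empty cells from the loss) is left to the training script.

```python
import csv
import io
from typing import Dict, List


def grid_with_mask(
    rows: List[Dict[str, str]],
    feature_cols: List[str],
    lat_min: float,
    lon_min: float,
    cell_deg: float,
    height: int,
    width: int,
    fill: float = 0.0,
):
    """Bin scattered observations onto a (time, lat, lon, channel) grid.

    Returns the sorted time stamps, the gridded data as nested lists, and
    a mask with 1 where a cell holds an observation and 0 where it was
    filled with `fill`. Hypothetical helper for illustration only.
    """
    # Assumes time stamps sort correctly as strings (e.g. ISO dates).
    times = sorted({r["time"] for r in rows})
    t_index = {t: i for i, t in enumerate(times)}
    n_ch = len(feature_cols)
    data = [[[[fill] * n_ch for _ in range(width)]
             for _ in range(height)] for _ in times]
    mask = [[[0] * width for _ in range(height)] for _ in times]
    for r in rows:
        i = int((float(r["latitude"]) - lat_min) // cell_deg)
        j = int((float(r["longitude"]) - lon_min) // cell_deg)
        if not (0 <= i < height and 0 <= j < width):
            continue  # observation lies outside the chosen grid
        t = t_index[r["time"]]
        # If several rows fall into one cell, the last one wins here.
        data[t][i][j] = [float(r[c]) for c in feature_cols]
        mask[t][i][j] = 1
    return times, data, mask


# Toy CSV with made-up values, standing in for the real file.
toy_csv = (
    "time,latitude,longitude,factor_a,target\n"
    "2023-01-01,10.2,20.7,1.0,5.0\n"
    "2023-01-01,11.6,21.1,2.0,6.0\n"
    "2023-01-02,10.9,20.3,3.0,7.0\n"
)
rows = list(csv.DictReader(io.StringIO(toy_csv)))
times, data, mask = grid_with_mask(
    rows, ["factor_a", "target"],
    lat_min=10.0, lon_min=20.0, cell_deg=1.0, height=2, width=2,
)
print(times)
print(mask)
```

The resulting `data` has the layout (time, lat, lon, channel), matching the ENSO-style shape discussed above, and `mask` has the layout (time, lat, lon). Whether this is workable depends on how sparse the observations are relative to the chosen cell size.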