spender
problems faced while re-implementing the model training process
How should I organize the SDSS data and directory structure to successfully reproduce the training in train_sdss.py?
@pmelchior
The file organization is part of our SDSS data model. You can use the function SDSS.save_in_batches to create the necessary file structures. In short, we bundle N=1024 spectra into one file and pickle it.
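For illustration, here is a minimal sketch of inspecting one of those pickled batch files. The file name is hypothetical, and the exact structure of a batch depends on the spender data model, so treat this only as a way to look at what was written to disk:

```python
import pickle

# Hypothetical batch file name; each batch is an ordinary pickle file
# bundling N=1024 spectra.
with open("your_storage_directory/SDSS_0.pkl", "rb") as f:
    batch = pickle.load(f)

# Inspect the batch object to see how the spectra are stored.
print(type(batch))
```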
Thank you, but I still don't quite understand how to retrain the spender model with my own data. If I directly run 'python -m train.train_sdss /path/to/data/ /path/to/output/', I get an error: AssertionError: File list cannot be empty.
I would like to know how to convert the downloaded SDSS DR16 spectral FITS files into files named in the format '{classname}{tag}_*.pkl'. Is it as you mentioned, by directly running SDSS.save_in_batches? How should I call this function? Please guide me, thank you!
We're working on a simpler data loading approach, but in the meantime, this should work:
```python
from spender.data.sdss import SDSS

dir = "your_storage_directory"
# get all IDs from the master catalog; that table needs to be downloaded into dir first
results = SDSS.query(dir)
# download all spectra files, process them, and save them in batches
SDSS.save_in_batches(dir, results, batch_size=1024)
```
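Once that has run, the storage directory should contain the pickled batch files that the training script builds its file list from. A quick sanity check, as a sketch: the glob pattern below is an assumption based on the '{classname}{tag}_*.pkl' naming mentioned above, so adjust it to whatever your files are actually called.

```python
import glob

# If this prints 0, train_sdss.py will still fail with
# "AssertionError: File list cannot be empty".
files = glob.glob("your_storage_directory/SDSS*_*.pkl")
print(f"{len(files)} batch files found")
```

With a non-empty file list, running `python -m train.train_sdss /path/to/data/ /path/to/output/` should get past that assertion.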