spender
problems faced while re-implementing the model training process
How should I organize the SDSS data and directory structure to successfully reproduce the training in train_sdss.py?
@pmelchior
The file organization is part of our SDSS data model. You can use the function SDSS.save_in_batches to create the necessary file structures. In short, we bundle N=1024 spectra into one file and pickle it.
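For illustration, here is a minimal sketch of inspecting one of those pickled batch files. The file name is hypothetical, and the exact structure of a batch depends on the spender data model, so treat this only as a way to look at what was written to disk:

```python
import pickle

# Hypothetical batch file name; each batch is an ordinary pickle file
# bundling N=1024 spectra.
with open("your_storage_directory/SDSS_0.pkl", "rb") as f:
    batch = pickle.load(f)

# Inspect the batch object to see how the spectra are stored.
print(type(batch))
```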
Thank you, but I still don't quite understand how to retrain the spender model with my own data. If I directly run 'python -m train.train_sdss /path/to/data/ /path/to/output/', I get an error: AssertionError: File list cannot be empty.
I would like to know how to convert the downloaded SDSS DR16 spectral FITS files into files named in the format '{classname}{tag}_*.pkl'. Is it as you mentioned, by directly running SDSS.save_in_batches? How should I call this function? Please guide me, thank you!
We're working on a simpler data loading approach, but in the meantime, this should work:
```python
from spender.data.sdss import SDSS

dir = "your_storage_directory"
# get all IDs from the master catalog; that table needs to be downloaded into dir first
results = SDSS.query(dir)
# download all spectra files, process them, and save them in batches
SDSS.save_in_batches(dir, results, batch_size=1024)
```
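Once that has run, the storage directory should contain the pickled batch files that the training script builds its file list from. A quick sanity check, as a sketch: the glob pattern below is an assumption based on the '{classname}{tag}_*.pkl' naming mentioned above, so adjust it to whatever your files are actually called.

```python
import glob

# If this prints 0, train_sdss.py will still fail with
# "AssertionError: File list cannot be empty".
files = glob.glob("your_storage_directory/SDSS*_*.pkl")
print(f"{len(files)} batch files found")
```

With a non-empty file list, running `python -m train.train_sdss /path/to/data/ /path/to/output/` should get past that assertion.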