Epoch error

Open theheking opened this issue 1 year ago • 2 comments

I am currently fitting the Spectra model with use_cell_types=True. It runs successfully with 2 epochs, but all outputs are NA once the number of epochs exceeds 20. I'm not sure what is driving the error here.

# fit the model (We will run this with only 2 epochs to decrease runtime in this tutorial)
model = spc.est_spectra(adata=adata,
    gene_set_dictionary=annotations,
    use_highly_variable=True,
    cell_type_key="blueprint_labels_mapped",
    use_weights=True,
    lam=0.1, # varies depending on data and gene sets, try between 0.5 and 0.001
    delta=0.001,
    kappa=None,
    rho=0.001,
    use_cell_types=True,
    n_top_vals=50,
    label_factors=True,
    overlap_threshold=0.2,
    clean_gs = True,
    min_gs_num = 3,
    num_epochs=100 #here running only 2 epochs for time reasons, we recommend 10,000 epochs for most datasets
)

Cell type labels in gene set annotation dictionary and AnnData object are identical
Your gene set annotation dictionary is now correctly formatted.

theheking • Nov 28 '24 03:11

Hi, thank you for developing this tool.

I encountered the same problem: the model works for 2 epochs, but returns only NAs for 10000 epochs.

ChaDuss • Mar 20 '25 16:03

Hello, I spent a really long time trying to figure this out. I believe this issue stems from improper preprocessing of the counts, and/or from something later in your workflow that affects the preprocessed counts. Either way, the issue is the counts (i.e., adata.X). One way to fix or test it:

  1. Load up your original file (start from scratch)
  2. Follow the steps from Scanpy or your own workflow. Just do your basic filtering, and don’t do anything that affects the raw counts.
  3. When you get to the ‘Normalization’ step, run:

     # Saving count data
     adata.layers["counts"] = adata.X.copy()
     # Normalizing to median total counts
     sc.pp.normalize_total(adata)
     # Logarithmize the data
     sc.pp.log1p(adata)

This will save your original adata.X counts, and normalize and log-transform adata.X (which is what you want if you are not using Scran).

  4. Determine the highly variable genes.
  5. Run Spectra (assuming you have everything else). You may want to transfer over your cell types.
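To make the normalization step a bit more concrete, here is a rough NumPy sketch of what "normalize to median total counts, then log1p" computes on a toy matrix. This is not Scanpy's implementation; the function name `shifted_log`, the toy data, and the handling of empty cells are my own assumptions for illustration.

```python
import numpy as np

def shifted_log(counts):
    """Scale each cell (row) to the median total count, then apply log(1 + x).

    Hypothetical helper for illustration only; it mimics the idea behind
    sc.pp.normalize_total(adata) followed by sc.pp.log1p(adata).
    """
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1)      # total counts per cell
    target = np.median(totals)       # median library size across cells
    # Assumption: cells with zero total counts are left as all zeros.
    scale = np.divide(target, totals, out=np.zeros_like(totals), where=totals > 0)
    return np.log1p(counts * scale[:, None])

# Toy data: cells 0 and 1 have the same composition but different depth.
toy_counts = np.array([[1, 3, 0],
                       [10, 30, 0],
                       [0, 2, 2]])
normalized = shifted_log(toy_counts)
print(normalized)
```

After this transform, two cells with the same composition but different sequencing depth end up with the same values, and everything stays finite and non-negative. If your adata.X does not look like that before you call Spectra, that is worth checking first.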

Overall, the counts (adata.X) are likely messed up and should be properly normalized and log-transformed. You may need to figure out how you want to deal with that, but that seems to be what is causing this issue.
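A quick way to see whether a matrix looks sane is to summarize it before fitting. The sketch below is only a heuristic and not part of Spectra or Scanpy; the helper name `summarize_expression` and the toy arrays are made up for illustration, and it assumes a dense array.

```python
import numpy as np

def summarize_expression(x):
    """Report basic red flags in an expression matrix (hypothetical helper).

    Assumes a dense array-like input; convert a sparse matrix first.
    """
    values = np.asarray(x, dtype=float)
    finite = values[np.isfinite(values)]
    return {
        "n_nan": int(np.isnan(values).sum()),
        "n_inf": int(np.isinf(values).sum()),
        "n_negative": int((finite < 0).sum()),
        # All-integer values suggest raw counts that were never normalized.
        "all_integers": bool(np.all(finite == np.round(finite))),
        "max": float(finite.max()) if finite.size else float("nan"),
    }

raw_like = np.array([[0, 5, 12], [3, 0, 800]])
log_like = np.log1p(raw_like / raw_like.sum(axis=1, keepdims=True) * 10)
broken = np.array([[0.0, np.nan], [np.inf, -1.0]])

for name, matrix in [("raw", raw_like), ("log", log_like), ("broken", broken)]:
    print(name, summarize_expression(matrix))
```

With a real object you would pass adata.X, or adata.X.toarray() if it is stored as a SciPy sparse matrix. Any NaN, inf, or negative values, or an all-integer matrix where you expected log-normalized data, would point back to the preprocessing.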

I would suggest borrowing and modifying the code from scBestPractices, where they plot the original counts and the ‘Shifted logarithm’, just to check that everything is as it should be.

JosephKnopp • Jun 18 '25 23:06