dynamo-tutorials icon indicating copy to clipboard operation
dynamo-tutorials copied to clipboard

Error in scNT_seq: least squares fails

Open Baschdl opened this issue 2 years ago • 3 comments

When changing neuron_labeling.obs['time'] = neuron_labeling.obs.time.astype("categorical") (cell 8) to neuron_labeling.obs['time'] = neuron_labeling.obs.time.astype(float) to circumvent the problem in #4, I get a ValueError: Residuals are not finite in the initial point. while dynamo/estimation/tsc/estimation_kinetic.py:auto_fit runs least squares. Is this just an irrelevant error because one should fix #4 differently?

Full trace:

[<ipython-input-9-5d53bc49f5de>](https://ndagie4afs-496ff2e9c6d22116-0-colab.googleusercontent.com/outputframe.html?vrz=colab-20230321-060141-RC01_518395136#) in dynamo_workflow(adata)
      2     dyn.pp.recipe_monocle(adata)
      3 
----> 4     dyn.tl.dynamics(adata)
      5 
      6     dyn.tl.reduceDimension(adata)

[~/.local/lib/python3.9/site-packages/dynamo/tools/dynamics.py](https://ndagie4afs-496ff2e9c6d22116-0-colab.googleusercontent.com/outputframe.html?vrz=colab-20230321-060141-RC01_518395136#) in dynamics(adata, filter_gene_mode, use_smoothed, assumption_mRNA, assumption_protein, model, est_method, NTR_vel, group, protein_names, concat_data, log_unnormalized, one_shot_method, fraction_for_deg, re_smooth, sanity_check, del_2nd_moments, cores, tkey, **est_kwargs)
    731             data_type = "smoothed" if use_smoothed else "sfs"
    732 
--> 733             (params, half_life, cost, logLL, param_ranges, cur_X_data, cur_X_fit_data,) = kinetic_model(
    734                 subset_adata,
    735                 tkey,

[~/.local/lib/python3.9/site-packages/dynamo/tools/dynamics.py](https://ndagie4afs-496ff2e9c6d22116-0-colab.googleusercontent.com/outputframe.html?vrz=colab-20230321-060141-RC01_518395136#) in kinetic_model(subset_adata, tkey, model, est_method, experiment_type, has_splicing, splicing_labeling, has_switch, param_rngs, data_type, return_ntr, **est_kwargs)
   1540                     cur_X_raw = np.hstack((cur_X_raw[0, 0].A, cur_X_raw[1, 0].A))
   1541 
-> 1542             _, cost[i_gene] = estm.auto_fit(np.unique(time), cur_X_data)
   1543             (
   1544                 model_1,

[~/.local/lib/python3.9/site-packages/dynamo/estimation/tsc/estimation_kinetic.py](https://ndagie4afs-496ff2e9c6d22116-0-colab.googleusercontent.com/outputframe.html?vrz=colab-20230321-060141-RC01_518395136#) in auto_fit(self, time, x_data, alpha_min, beta_min, gamma_min, kin_weight, use_p0, **kwargs)
    709 
    710         if use_p0:
--> 711             popt, cost = self.fit_lsq(time, x_data_norm, p0=p0, **kwargs)
    712         else:
    713             popt, cost = self.fit_lsq(time, x_data_norm, **kwargs)

[~/.local/lib/python3.9/site-packages/dynamo/estimation/tsc/estimation_kinetic.py](https://ndagie4afs-496ff2e9c6d22116-0-colab.googleusercontent.com/outputframe.html?vrz=colab-20230321-060141-RC01_518395136#) in fit_lsq(self, t, x_data, p0, n_p0, bounds, sample_method, method, normalize)
    209         X = []
    210         for i in range(n_p0):
--> 211             ret = least_squares(
    212                 lambda p: self.f_lsq(p, t, x_data_norm, method, normalize),
    213                 p0[i],

[/usr/local/lib/python3.9/dist-packages/scipy/optimize/_lsq/least_squares.py](https://ndagie4afs-496ff2e9c6d22116-0-colab.googleusercontent.com/outputframe.html?vrz=colab-20230321-060141-RC01_518395136#) in least_squares(fun, x0, jac, bounds, method, ftol, xtol, gtol, x_scale, loss, f_scale, diff_step, tr_solver, tr_options, jac_sparsity, max_nfev, verbose, args, kwargs)
    835 
    836     if not np.all(np.isfinite(f0)):
--> 837         raise ValueError("Residuals are not finite in the initial point.")
    838 
    839     n = x0.size

ValueError: Residuals are not finite in the initial point.

Baschdl avatar Mar 22 '23 17:03 Baschdl

Hi @Baschdl thanks for raising up the questions again, please note that the labeling time is always 2 hours and the time is when did we treat cells with KCL. the following code should work:

neuron_labeling.obs['label_time'] = 2 # this is the labeling time
tkey = 'label_time'

# peng_gene_list = pd.read_csv('/Users/xqiu/Dropbox (Personal)/dynamo/dont_remove/0408_grp_info.txt', sep='\t')

neuron_labeling = neuron_labeling[:, neuron_labeling.var.activity_genes]

def dynamo_workflow(adata):
    dyn.pp.recipe_monocle(adata, tkey='label_time', experiment_type='one-shot')

    dyn.tl.dynamics(adata)

    dyn.tl.reduceDimension(adata)

    dyn.tl.cell_velocities(adata, calc_rnd_vel=True, transition_genes=adata.var_names)

    dyn.vf.VectorField(adata, basis='umap')

neuron_labeling.obs['time'] = neuron_labeling.obs['time']/60
dynamo_workflow(neuron_labeling)
dyn.pl.streamline_plot(neuron_labeling, color='time', color_key_cmap='viridis', basis='umap', show_legend='right')

Would you mind to add more pull requests to the previous one that I closed? Thanks again and please let me know if anything else we can help

Thanks, Xiaojie

Xiaojieqiu avatar Mar 22 '23 21:03 Xiaojieqiu

Thanks a lot @Xiaojieqiu. I was able to get it to run for labeling data with

def dynamo_workflow_labeling(adata):
    dyn.pp.recipe_monocle(adata, tkey='label_time', experiment_type='one-shot')

    dyn.tl.dynamics(adata)

    dyn.tl.reduceDimension(adata)

    dyn.tl.cell_velocities(adata, calc_rnd_vel=True, transition_genes=adata.var_names)
    
    dyn.vf.VectorField(adata, basis='umap')
    

and for splicing data with the original version:

def dynamo_workflow_splicing(adata):
    dyn.pp.recipe_monocle(adata)

    dyn.tl.dynamics(adata)

    dyn.tl.reduceDimension(adata)

    dyn.tl.cell_velocities(adata, calc_rnd_vel=True)
    
    dyn.vf.VectorField(adata, basis='umap')

This also solves #4. I can open a pull request for those changes. Is there a way to only have one dynamo_workflow() for both cases? Otherwise I would open a PR with both variants of the method.

Baschdl avatar Mar 23 '23 17:03 Baschdl

def dynamo_workflow_labeling(adata, **kwargs ):
    dyn.pp.recipe_monocle(adata, **kwargs)

    dyn.tl.dynamics(adata)

    dyn.tl.reduceDimension(adata)

    dyn.tl.cell_velocities(adata, calc_rnd_vel=True, transition_genes=adata.var_names)
    
    dyn.vf.VectorField(adata, basis='umap')

I think you can just add kwargs to the dynamo_workflow function to fix the issue.

Please feel free to make the pull request and close the issue #4.

Xiaojieqiu avatar Mar 23 '23 18:03 Xiaojieqiu