Brian Hie
@cai-lw here's a STOC paper describing random projection trees: http://cseweb.ucsd.edu/~dasgupta/papers/rptree-stoc.pdf Could this be what you're looking for?
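For intuition, here's a minimal sketch of the core idea from that paper: recursively split the data by projecting onto a random unit direction and cutting near the median. This is an illustrative simplification (the paper also perturbs the split point; the function name and dict-based tree layout here are my own, not from any library):

```python
import numpy as np

def rp_tree_split(X, max_leaf_size=10, rng=None):
    """Build a random projection tree over the rows of X.

    Returns a nested dict; leaves hold arrays of row indices.
    """
    rng = np.random.default_rng() if rng is None else rng

    def build(indices):
        if len(indices) <= max_leaf_size:
            return {'leaf': indices}
        # Draw a random unit direction and project the points onto it.
        d = rng.normal(size=X.shape[1])
        d /= np.linalg.norm(d)
        proj = X[indices] @ d
        # Split at the median projection (Dasgupta & Freund additionally
        # perturb the split point; a plain median keeps this sketch short).
        thresh = np.median(proj)
        left = indices[proj <= thresh]
        right = indices[proj > thresh]
        if len(left) == 0 or len(right) == 0:  # degenerate split
            return {'leaf': indices}
        return {'dir': d, 'thresh': thresh,
                'left': build(left), 'right': build(right)}

    return build(np.arange(len(X)))
```

Querying then just means walking the tree with the same projections, which gives approximate nearest-neighbor candidates in the reached leaf.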
It is possible to modify the package to use other language models, but this is not currently supported and would require some implementation effort.
The notebook uses an old version of the esm package (`fair-esm==0.4.0`) that supports parsing non-amino acid characters. If you want to fix the error (and reproduce the paper results), you...
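If you want to match the pinned version mentioned above (assuming a standard pip-based environment):

```shell
pip install fair-esm==0.4.0
```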
Can you try using the `sketch` parameter to reduce memory consumption? Maybe set `sketch` to a value low enough to fit in memory, if you are indeed getting...
Perhaps it's killed due to memory usage? Can you try using sparse matrices? E.g.,
```python
import scipy.sparse

X_coexpr = scipy.sparse.csr_matrix(X_coexpr)
```
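To illustrate why this helps, here's a self-contained demonstration with a mostly-zero stand-in matrix (the variable name `X_coexpr` is reused from above; the data here is synthetic): CSR format stores only the nonzero entries plus index arrays, so memory drops roughly in proportion to the fill fraction.

```python
import numpy as np
import scipy.sparse

# Synthetic stand-in for a co-expression matrix: ~99% zeros.
rng = np.random.default_rng(0)
X_coexpr = rng.normal(size=(1000, 1000))
X_coexpr[rng.random((1000, 1000)) < 0.99] = 0.0

X_sparse = scipy.sparse.csr_matrix(X_coexpr)

dense_bytes = X_coexpr.nbytes
sparse_bytes = (X_sparse.data.nbytes
                + X_sparse.indices.nbytes
                + X_sparse.indptr.nbytes)
print(dense_bytes, sparse_bytes)  # CSR stores only the nonzeros
```

Most downstream operations (matrix products, slicing) work on the CSR matrix directly, so conversion is usually the only change needed.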
For now there's no anndata writer for the language model itself. You can try doing `adata.uns['model'] = None` before saving (i.e., don't save the model) and that should probably work.
The model is only needed for velocity score computation, not any of the downstream analysis. I would compute the velocities on the HPC, save it without the model, then just...
@pan-genome we were able to just use the standard HuggingFace sampling API (e.g., loading with `AutoModelForCausalLM.from_pretrained()`, sampling with `model.generate()`) to generate 500k+ tokens on an 80 GB GPU.
Something like
```python
model_config = AutoConfig.from_pretrained(
    'togethercomputer/evo-1-131k-base',
    trust_remote_code=True,
    revision="1.1_fix",
)
model_config.max_seqlen = 500_000

model = AutoModelForCausalLM.from_pretrained(
    'togethercomputer/evo-1-131k-base',
    config=model_config,
    trust_remote_code=True,
    revision="1.1_fix",
)

outputs = model.generate(
    input_ids,
    max_new_tokens=500_000,
    temperature=1.,
    top_k=4,
)
```
This may be a bug in the way we handle sequences longer than the ESM context length; we'll investigate.