scGPT
Help with the installation on Colab
Hello, thanks for this great tool! I followed the installation tutorial and got some errors:
import os
import sys

if "google.colab" in sys.modules:
    print("Running on Google Colab")
    print("Installing dependencies...")
    !pip install -U scgpt
    # the optional dependency of flash-attn is skipped on colab
    !pip install wandb louvain
    # NOTE: may need to restart the runtime after the installation
    print("Downloading data and model ckpt...")
    !pip install -q -U gdown
    import gdown

import scvi

adata = scvi.data.pbmc_dataset()
Errors:
INFO File data/gene_info_pbmc.csv already downloaded
INFO File data/pbmc_metadata.pickle already downloaded
INFO File data/pbmc8k/filtered_gene_bc_matrices.tar.gz already downloaded
INFO Extracting tar file
INFO Removing extracted data at data/pbmc8k/filtered_gene_bc_matrices
INFO File data/pbmc4k/filtered_gene_bc_matrices.tar.gz already downloaded
INFO Extracting tar file
INFO Removing extracted data at data/pbmc4k/filtered_gene_bc_matrices
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-16-2b688d4bab92> in <cell line: 1>()
1 if dataset_name == "PBMC_10K":
----> 2 adata = scvi.data.pbmc_dataset() # 11990 × 3346
3 ori_batch_col = "batch"
4 adata.obs["celltype"] = adata.obs["str_labels"].astype("category")
5 adata.var = adata.var.set_index("gene_symbols")
2 frames
/usr/local/lib/python3.10/dist-packages/numpy/__init__.py in __getattr__(attr)
322
323 if attr in __former_attrs__:
--> 324 raise AttributeError(__former_attrs__[attr])
325
326 if attr == 'testing':
AttributeError: module 'numpy' has no attribute 'str'.
`np.str` was a deprecated alias for the builtin `str`. To avoid this error in existing code, use `str` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.str_` here.
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
Could you help with this problem? Thanks!!!
Best
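For reference, a minimal workaround sketch, assuming the stale np.str reference comes from the installed scvi release rather than from this notebook: either pin NumPy below 1.24, where the deprecated alias still existed, or restore the alias at runtime before importing scvi.

# Option 1: keep a NumPy version that still ships the deprecated alias.
# !pip install "numpy<1.24"

# Option 2: restore the removed alias before the offending import.
import numpy as np
if not hasattr(np, "str"):
    np.str = str  # np.str was a deprecated alias for str, removed in NumPy 1.24

import scvi
adata = scvi.data.pbmc_dataset()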
Which task is this for? I may be able to help you with the annotation task.
First, upload the datasets and pretrained models to a folder in your Drive. Then mount the drive in Colab so the paths resolve, and follow the steps below:
# !pip install scgpt  # if you don't want to use flash-attn
!pip install scgpt "flash-attn<1.0.5"  # takes time
!pip install wandb

import copy
import gc
import json
import os
from pathlib import Path
import shutil
import sys
import time
import traceback
from typing import List, Tuple, Dict, Union, Optional
import warnings

import pandas as pd
import pickle
import torch
from anndata import AnnData
import scanpy as sc
import scvi
import seaborn as sns
import numpy as np
import wandb
from scipy.sparse import issparse
import matplotlib.pyplot as plt
from torch import nn
from torch.nn import functional as F
from torch.utils.data import Dataset, DataLoader
from sklearn.model_selection import train_test_split
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score
from torchtext.vocab import Vocab
from torchtext._torchtext import Vocab as VocabPybind
from sklearn.metrics import confusion_matrix

sys.path.insert(0, "../")
import scgpt as scg
from scgpt.model import TransformerModel, AdversarialDiscriminator
from scgpt.tokenizer import tokenize_and_pad_batch, random_mask_value
from scgpt.loss import (
    masked_mse_loss,
    masked_relative_error,
    criterion_neg_log_bernoulli,
)
from scgpt.tokenizer.gene_tokenizer import GeneVocab
from scgpt.preprocess import Preprocessor
from scgpt import SubsetsBatchSampler
from scgpt.utils import set_seed, category_str2int, eval_scib_metrics

sc.set_figure_params(figsize=(6, 6))
os.environ["KMP_WARNINGS"] = "off"
warnings.filterwarnings("ignore")
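For the Drive step above, here is a minimal sketch of the mount and path setup; the folder names under MyDrive are placeholders for wherever you uploaded the checkpoint and datasets:

# Mount Google Drive so the uploaded files are visible to the notebook.
from google.colab import drive
drive.mount("/content/drive")

# Placeholder paths; adjust to your own folder layout in Drive.
model_dir = Path("/content/drive/MyDrive/scGPT/scGPT_human")  # pretrained checkpoint
data_dir = Path("/content/drive/MyDrive/scGPT/data")          # uploaded datasets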
This will probably work.
@kocemir, thanks for your help!
Yes, I want to do the annotation task!
You are right that adding "flash-attn<1.0.5" takes a really long time! I am using the free Colab GPU tier, and I cannot finish the installation when flash-attn is included.
Best
@kocemir
I'm still running into package version issues after running your suggestion. Did you have any issues like this?
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
<ipython-input-1-1a6db007bc8e> in <cell line: 0>()
20 from anndata import AnnData
21 import scanpy as sc
---> 22 import scvi
23 import seaborn as sns
24 import numpy as np
5 frames
/usr/local/lib/python3.11/dist-packages/scvi/data/_utils.py in <module>
11 import scipy.sparse as sp_sparse
12 from anndata import AnnData
---> 13 from anndata._core.sparse_dataset import SparseDataset
14
15 # TODO use the experimental api once we lower bound to anndata 0.8
ImportError: cannot import name 'SparseDataset' from 'anndata._core.sparse_dataset' (/usr/local/lib/python3.11/dist-packages/anndata/_core/sparse_dataset.py)
Hi,
I was able to set up the environment once, and after that I always reused the same setup on the same server machine. The main problem is that when anndata changes something and the scGPT repo is not updated, it becomes really hard to run the code.
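If it helps, the standard way to keep such a setup reproducible is to freeze it once and restore it on a fresh runtime later (a generic sketch, not my exact environment):

# On the working runtime, record the exact package versions...
!pip freeze > scgpt_requirements.txt
# ...and on a fresh runtime, reinstall from that file before running anything.
!pip install -r scgpt_requirements.txt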
@kocemir Thanks for the quick response. Yeah, that's what I've come to notice. Any chance you still have your environment saved? May I see your pip freeze, if possible?
@kocemir I have the exact same issue as @mrburke00: cannot import name 'SparseDataset' from 'anndata._core.sparse_dataset'.
For me, installing anndata==0.10.8 worked.
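Concretely (the anndata pin is from the comment above; restarting the runtime afterwards is standard Colab practice so the new version is actually imported):

!pip install anndata==0.10.8
# Restart the Colab runtime, then verify the pinned version is active.
import anndata
print(anndata.__version__)  # expect 0.10.8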