scGPT Help with the installation on colab

Hello, thanks for this great tool!

I followed the installation tutorial and got some errors:

import os
import sys

if "google.colab" in sys.modules:
    print("Running on Google Colab")
    print("Installing dependencies...")
    !pip install -U scgpt
    # the optional dependency of flash-attn is skipped on colab
    !pip install wandb louvain

    # NOTE: May need to restart runtime after the installation

    print("Downloading data and model ckpt...")
    !pip install -q -U gdown
    import gdown

import scvi
adata = scvi.data.pbmc_dataset()

Errors:

INFO     File data/gene_info_pbmc.csv already downloaded                                                           
INFO     File data/pbmc_metadata.pickle already downloaded                                                         
INFO     File data/pbmc8k/filtered_gene_bc_matrices.tar.gz already downloaded                                      
INFO     Extracting tar file                                                                                       
INFO     Removing extracted data at data/pbmc8k/filtered_gene_bc_matrices                                          
INFO     File data/pbmc4k/filtered_gene_bc_matrices.tar.gz already downloaded                                      
INFO     Extracting tar file                                                                                       
INFO     Removing extracted data at data/pbmc4k/filtered_gene_bc_matrices                                          
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
[<ipython-input-16-2b688d4bab92>](https://localhost:8080/#) in <cell line: 1>()
      1 if dataset_name == "PBMC_10K":
----> 2     adata = scvi.data.pbmc_dataset()  # 11990 × 3346
      3     ori_batch_col = "batch"
      4     adata.obs["celltype"] = adata.obs["str_labels"].astype("category")
      5     adata.var = adata.var.set_index("gene_symbols")

2 frames
[/usr/local/lib/python3.10/dist-packages/numpy/__init__.py](https://localhost:8080/#) in __getattr__(attr)
    322 
    323         if attr in __former_attrs__:
--> 324             raise AttributeError(__former_attrs__[attr])
    325 
    326         if attr == 'testing':

AttributeError: module 'numpy' has no attribute 'str'.
`np.str` was a deprecated alias for the builtin `str`. To avoid this error in existing code, use `str` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.str_` here.
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
    https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations

Could you help with this problem? Thanks!

Best

Aug 16 '24 06:08 bitcometz
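
For context on the error above: the message says `np.str` was deprecated in NumPy 1.20, and these aliases were removed entirely in NumPy 1.24, so any library that still calls `np.str` fails on a newer NumPy. Below is a minimal sketch for checking a version string against that cutoff. The helper names `parse_major_minor` and `aliases_removed` are made up for illustration and are not part of NumPy or scGPT.

```python
import re
from typing import Tuple

# Deprecated aliases such as np.str were removed starting with NumPy 1.24.
ALIASES_REMOVED_IN = (1, 24)


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string such as '1.26.4'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def aliases_removed(numpy_version: str) -> bool:
    """Return True if this NumPy version no longer provides np.str."""
    return parse_major_minor(numpy_version) >= ALIASES_REMOVED_IN
```

In a notebook, one could pass `numpy.__version__` to `aliases_removed`. If it returns `True`, pinning with `pip install "numpy<1.24"` is one possible workaround, though whether that pin is compatible with the rest of the Colab stack is not verified here.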

Which task is this for? I may be able to help you with the annotation task.

First, upload the datasets and pretrained models to a folder in your Google Drive. Then mount the drive in Colab so you can reference the paths. Then run the following:

# !pip install scgpt  (if you don't want to use flash-attn)
!pip install scgpt "flash-attn<1.0.5"  # takes time

!pip install wandb

import copy
import gc
import json
import os
from pathlib import Path
import shutil
import sys
import time
import traceback
from typing import List, Tuple, Dict, Union, Optional
import warnings
import pandas as pd

import pickle
import torch
from anndata import AnnData
import scanpy as sc
import scvi
import seaborn as sns
import numpy as np
import wandb
from scipy.sparse import issparse
import matplotlib.pyplot as plt
from torch import nn
from torch.nn import functional as F
from torch.utils.data import Dataset, DataLoader
from sklearn.model_selection import train_test_split
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score
from torchtext.vocab import Vocab
from torchtext._torchtext import (
    Vocab as VocabPybind,
)
from sklearn.metrics import confusion_matrix

sys.path.insert(0, "../")
import scgpt as scg
from scgpt.model import TransformerModel, AdversarialDiscriminator
from scgpt.tokenizer import tokenize_and_pad_batch, random_mask_value
from scgpt.loss import (
    masked_mse_loss,
    masked_relative_error,
    criterion_neg_log_bernoulli,
)
from scgpt.tokenizer.gene_tokenizer import GeneVocab
from scgpt.preprocess import Preprocessor
from scgpt import SubsetsBatchSampler
from scgpt.utils import set_seed, category_str2int, eval_scib_metrics

sc.set_figure_params(figsize=(6, 6))
os.environ["KMP_WARNINGS"] = "off"
warnings.filterwarnings('ignore')

This will probably work.

Aug 17 '24 09:08 kocemir
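
The Drive step described above (upload the files, mount the drive, then point the code at the paths) could be sketched as follows. `drive.mount` is the standard Colab call. The folder path and the list of checkpoint file names are assumptions for illustration, so adjust them to whatever you actually uploaded. The helper `missing_files` is hypothetical and not part of scGPT.

```python
import sys
from pathlib import Path
from typing import Iterable, List

# Assumption: file names a scGPT checkpoint folder is expected to contain.
# Adjust this tuple to match the files you actually downloaded.
ASSUMED_CHECKPOINT_FILES = ("args.json", "best_model.pt", "vocab.json")


def missing_files(
    folder: Path, required: Iterable[str] = ASSUMED_CHECKPOINT_FILES
) -> List[str]:
    """Return the required file names that are not present in `folder`."""
    return [name for name in required if not (folder / name).is_file()]


if "google.colab" in sys.modules:
    # Only available inside Colab; mounts Google Drive at /content/drive.
    from google.colab import drive

    drive.mount("/content/drive")
    # Hypothetical location; replace with the folder you uploaded to.
    model_dir = Path("/content/drive/MyDrive/scGPT/model")
    print("Missing files:", missing_files(model_dir))
```

Checking for missing files before loading the model tends to give a clearer message than a failure deep inside the loading code.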

@kocemir, thanks for your help!

Yes, I want to do the annotation task!

You are right that adding "flash-attn<1.0.5" takes a really long time! I am using the free Colab GPU resources, so I cannot finish the installation when flash-attn is included.

Best

Aug 19 '24 02:08 bitcometz

@kocemir

I'm still running into package version issues after running your suggestion. Did you have any issues like this?

---------------------------------------------------------------------------

ImportError                               Traceback (most recent call last)

[<ipython-input-1-1a6db007bc8e>](https://localhost:8080/#) in <cell line: 0>()
     20 from anndata import AnnData
     21 import scanpy as sc
---> 22 import scvi
     23 import seaborn as sns
     24 import numpy as np

5 frames

[/usr/local/lib/python3.11/dist-packages/scvi/data/_utils.py](https://localhost:8080/#) in <module>
     11 import scipy.sparse as sp_sparse
     12 from anndata import AnnData
---> 13 from anndata._core.sparse_dataset import SparseDataset
     14 
     15 # TODO use the experimental api once we lower bound to anndata 0.8

ImportError: cannot import name 'SparseDataset' from 'anndata._core.sparse_dataset' (/usr/local/lib/python3.11/dist-packages/anndata/_core/sparse_dataset.py)

Mar 31 '25 22:03 mrburke00

Hi,

I was able to set up the environment once, and since then I have always used the same setup on the same server. The main problem is that when anndata changes something and the scGPT repo is not updated, it becomes really hard to run the code.


Apr 01 '25 03:04 kocemir

@kocemir Thanks for the quick response. Yeah, that's what I have come to notice. Any chance you still have your environment saved? Could I see your pip freeze output, if possible?

Apr 01 '25 03:04 mrburke00
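
As a lighter alternative to a full `pip freeze`, the versions of just the packages involved in this thread can be collected with the standard library. This is only a sketch: the helper name `installed_versions` is made up, and the package list is a guess at what matters here, not an official scGPT requirement set.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Assumed list of relevant distributions (scvi is published as scvi-tools).
    names = ["scgpt", "scvi-tools", "anndata", "scanpy", "numpy", "torch", "torchtext"]
    for name, version in installed_versions(names).items():
        print(f"{name}=={version}" if version else f"{name}: not installed")
```

Posting this short list from a working environment and from a failing one makes it easier to spot which package differs.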

@kocemir I have the exact same issue as @mrburke00: cannot import name 'SparseDataset' from 'anndata._core.sparse_dataset'. For me, installing anndata==0.10.8 worked.

Apr 10 '25 16:04 bioshot-dotcom
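
To see whether an environment is affected before importing scvi, one could probe for the name that the traceback says is missing. The sketch below uses only the standard library. `can_import_name` is a hypothetical helper, and it only approximates `from module import name` by checking for an attribute on the imported module.

```python
import importlib


def can_import_name(module_name: str, attr: str) -> bool:
    """Return True if `module_name` imports and exposes an attribute `attr`."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The module itself is missing or fails to import.
        return False
    return hasattr(module, attr)
```

For example, `can_import_name("anndata._core.sparse_dataset", "SparseDataset")` returning `False` would match the ImportError reported in this thread. In that case, pinning with `pip install anndata==0.10.8` is the fix reported above, though it has only been confirmed by one user here.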
