
FileNotFoundError: [Errno 2] No such file or directory

Open Ryosuke-254 opened this issue 1 year ago • 2 comments

I wrote the following code in Google Colab to select sequences with sequence identity below 0.8, but it fails with the error "FileNotFoundError: [Errno 2] No such file or directory: './mmseqs_work/search_result.m8'". I am also unable to use the GPU in Colab. Ideally, I would like to run this selection on the Colab GPU. I would appreciate any advice you could provide.

# Install the required libraries and tools
!apt-get update
!apt-get install -y mmseqs2 wget
!pip install biopython
!apt-get install -y nvidia-cuda-toolkit

# Check GPU availability with PyTorch
import torch
print(torch.cuda.is_available())  # should return True
print(torch.cuda.get_device_name(0))  # show the name of the available GPU

# Install CUDA-related tools (optional, if needed)
!apt-get install -y cuda-toolkit-12-2  # change to match the CUDA version you want to use
!apt-get install -y cmake
!mkdir build && cd build
!cmake -DCMAKE_BUILD_TYPE=RELEASE -DCMAKE_INSTALL_PREFIX=. -DENABLE_CUDA=1 -DCMAKE_CUDA_ARCHITECTURES="75;80;86;89;90" ..
!make -j8
!make install
!pip install -q condacolab
import condacolab
condacolab.install()
!pip install pycuda
import pycuda.driver as cuda
cuda.init()
print(f"CUDA device count: {cuda.Device.count()}")  # show the number of CUDA devices

# Create the MMseqs2 working directory
import os

work_dir = "./mmseqs_work"
os.makedirs(work_dir, exist_ok=True)

# Specify the input FASTA file
input_fasta = "/content/38181dh_c.fasta"  # path to an existing FASTA file

# Create the MMseqs2 database (only once)
!mmseqs createdb {input_fasta} {work_dir}/db

# Pairwise search of the database against itself (using the GPU)
search_result_path = os.path.join(work_dir, "search_result")
tmp_dir = os.path.join(work_dir, "tmp")
!mmseqs search {work_dir}/db {work_dir}/db {search_result_path} {tmp_dir} --min-seq-id 0.8 --threads 2 --gpu-only --search-type 3 --gpu only

# Parse the output
import pandas as pd
from Bio import SeqIO

search_result_m8 = f"{search_result_path}.m8"  # MMseqs2 output file path

# Read the MMseqs2 output format
columns = ["query", "target", "pident", "alnlen", "mismatch", "gapopen", "qstart", "qend", "tstart", "tend", "evalue", "bits"]
results = pd.read_csv(search_result_m8, sep="\t", names=columns)

# Extract query sequences with sequence identity below 80%
filtered_results = results[results["pident"] < 80]
unique_query_ids = set(filtered_results["query"])

# Extract the matching sequences from the original FASTA
filtered_sequences = {rec.id: rec for rec in SeqIO.parse(input_fasta, "fasta") if rec.id in unique_query_ids}
output_fasta = "/content/filtered_sequences.fasta"

with open(output_fasta, "w") as f:
    SeqIO.write(filtered_sequences.values(), f, "fasta")

print(f"Saved the filtered sequences to: {output_fasta}")
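
A note on the filtering step: because the search is run with --min-seq-id 0.8, the result should only contain hits at or above that identity (including each sequence's hit to itself), so keeping rows with pident < 80 would likely select little or nothing. One possible reading of "sequences with identity below 0.8" is "sequences that have no hit to a different sequence at or above 0.8". A minimal sketch of that reading in plain Python follows; the helper name `ids_without_close_hits` is hypothetical, and treating identity as a fraction between 0 and 1 is an assumption that should be checked against the actual values in the third column of the output.

```python
def ids_without_close_hits(hits, all_ids, threshold=0.8):
    """Return ids with no hit to a *different* sequence at or above threshold.

    hits: iterable of (query, target, identity) tuples, where identity is
          assumed to be a fraction in [0, 1] (rescale first if it is a percent).
    all_ids: every sequence id in the input FASTA, in the desired output order.
    """
    redundant = set()
    for query, target, identity in hits:
        # Self-hits always have identity 1.0, so they must be ignored.
        if query != target and identity >= threshold:
            redundant.add(query)
            redundant.add(target)
    return [seq_id for seq_id in all_ids if seq_id not in redundant]


# Toy data: "a" and "b" are 93% identical, "c" only hits itself.
hits = [("a", "a", 1.0), ("a", "b", 0.93), ("b", "a", 0.93), ("c", "c", 1.0)]
print(ids_without_close_hits(hits, ["a", "b", "c"]))  # ['c']
```

This drops both members of every similar pair. If the goal is instead to keep one representative per group of similar sequences, a different selection rule would be needed.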

Ryosuke-254, Dec 03 '24 09:12

You can download the GPU-enabled binary from: https://github.com/soedinglab/MMseqs2/releases/download/16-747c6/mmseqs-linux-gpu.tar.gz

You don't need to compile it yourself.

The relevant parameter to enable the GPU search mode is --gpu 1:

mmseqs search {work_dir}/db {work_dir}/db {search_result_path} {tmp_dir} --min-seq-id 0.8 --threads 2 --search-type 3 --gpu 1 

Please refer to the wiki for additional details: https://github.com/soedinglab/MMseqs2/wiki#gpu-accelerated-search
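
For context on the missing .m8 file: mmseqs search writes a result database, not a tab-separated file, and the convertalis module is the step that turns that database into BLAST-tab (.m8) output. A minimal sketch of the command sequence, built as argument lists in Python, might look like the following. The helper name `build_mmseqs_commands` is hypothetical, and the default `mmseqs_bin` path assumes the archive linked above unpacks to mmseqs/bin/mmseqs. The wiki page also describes extra database preparation for GPU search, which is not shown here.

```python
import os


def build_mmseqs_commands(input_fasta, work_dir, mmseqs_bin="mmseqs/bin/mmseqs"):
    """Return argv lists for createdb, search, and convertalis (nothing is executed).

    mmseqs_bin is an assumed location for the unpacked GPU-enabled binary.
    """
    db = os.path.join(work_dir, "db")
    result = os.path.join(work_dir, "search_result")
    tmp_dir = os.path.join(work_dir, "tmp")
    m8 = result + ".m8"
    return [
        # 1. Build the sequence database from the FASTA file.
        [mmseqs_bin, "createdb", input_fasta, db],
        # 2. Search the database against itself, with the flags from the reply above.
        [mmseqs_bin, "search", db, db, result, tmp_dir,
         "--min-seq-id", "0.8", "--threads", "2",
         "--search-type", "3", "--gpu", "1"],
        # 3. Convert the result database into a tab-separated .m8 file.
        [mmseqs_bin, "convertalis", db, db, result, m8],
    ]


for cmd in build_mmseqs_commands("/content/38181dh_c.fasta", "./mmseqs_work"):
    print(" ".join(cmd))
```

In a notebook, each printed line could be run with a leading !, or each list could be passed to subprocess.run.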

milot-mirdita, Dec 03 '24 09:12

Thank you for your response. Even after replacing '--gpu-only' with '--gpu 1' as per your advice, I am encountering a similar error. The error is:

'Unrecognized parameter "--gpu-only". Did you mean "--gap-open" (Gap open cost)?'
FileNotFoundError Traceback (most recent call last)
in <cell line: 52>()
50 # Read the MMseqs2 output format
51 columns = ["query", "target", "pident", "alnlen", "mismatch", "gapopen", "qstart", "qend", "tstart", "tend", "evalue", "bits"]
---> 52 results = pd.read_csv(search_result_m8, sep="\t", names=columns)
53
54 # Extract query sequences with sequence identity below 80%

4 frames
/usr/local/lib/python3.10/dist-packages/pandas/io/common.py in get_handle(path_or_buf, mode, encoding, compression, memory_map, is_text, errors, storage_options)
871 if ioargs.encoding and "b" not in ioargs.mode:
872 # Encoding
--> 873 handle = open(
874 handle,
875 ioargs.mode,
FileNotFoundError: [Errno 2] No such file or directory: './mmseqs_work/search_result.m8'

How should I improve this? Also, is it possible to run MMseqs2 on Google Colab? I would appreciate any advice you can provide.
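
One thing the traceback suggests: the "Unrecognized parameter" message means mmseqs search stopped before writing any results, and the message still names --gpu-only, so the old flag appears to be present in the command that actually ran. A shell line starting with ! does not stop the notebook cell when the command fails, so the later pd.read_csv call runs anyway and fails on the missing file. A minimal sketch of running each step through subprocess, so that a failed command stops the pipeline with its own error message, is shown below. `run_step` is a hypothetical helper, not part of MMseqs2, and the demonstration uses the Python interpreter as a stand-in for mmseqs.

```python
import os
import subprocess
import sys


def run_step(argv, expected_output=None):
    """Run one command; raise if it fails or its expected output file is missing."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    if proc.returncode != 0:
        # Surface the tool's own message instead of a later FileNotFoundError.
        raise RuntimeError(
            f"{argv[0]} exited with code {proc.returncode}:\n{proc.stdout}{proc.stderr}"
        )
    if expected_output is not None and not os.path.exists(expected_output):
        raise FileNotFoundError(
            f"{expected_output} was not created by: {' '.join(argv)}"
        )
    return proc.stdout


# Demonstration: a command that succeeds, then one that exits with an error.
print(run_step([sys.executable, "-c", "print('ok')"]))
try:
    run_step([sys.executable, "-c", "import sys; sys.exit(1)"])
except RuntimeError as err:
    print("stopped:", err)
```

With a helper like this, the step that produces the .m8 file could be called with expected_output set to that path, so the script stops there if the file was never written.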

Ryosuke-254, Dec 03 '24 10:12