
dataprep result is empty

Open Seongmin-Jang-1165 opened this issue 1 year ago • 4 comments

Hello developer!

I ran xpore dataprep with my direct RNA-seq data generated with the SQK-RNA004 kit,

but the output file is empty and I cannot identify what the problem is.

Can you advise me on this problem?

run log.out.txt

Seongmin-Jang-1165 avatar Nov 20 '24 10:11 Seongmin-Jang-1165

Hi @Seongmin-Jang-1165,

Can you provide the following, please?

first 10 lines from xpore dataprep eventalign_index

head eventalign.index

first 10 lines from nanopolish eventalign.txt

head eventalign.txt

first 10 lines from the gtf file

head [annotation.gtf]
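
The three requests above can be gathered in one pass. The following is a minimal sketch, not part of xpore: `collect_heads` is a hypothetical helper name, and the file names in the usage line are simply the ones mentioned above.

```shell
# collect_heads FILE...
# Print the first 10 lines of each file under a labelled header, so the
# result can be pasted into the issue as one report.
# Hypothetical helper; not part of xpore or nanopolish.
collect_heads() {
    for f in "$@"; do
        printf '== %s ==\n' "$f"
        if [ -r "$f" ]; then
            head -n 10 "$f"
        else
            # Flag files that are absent instead of failing silently.
            printf '(missing or unreadable)\n'
        fi
    done
}
```

Usage would look like `collect_heads eventalign.index eventalign.txt annotation.gtf > heads.txt`, with the paths adjusted to wherever those files live.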

Thanks!

Best wishes, Yuk Kei

yuukiiwa avatar Nov 25 '24 20:11 yuukiiwa

Updated on 23rd Dec.

Hi, I have solved the error described below by running the command: python -m xxx.xxx.xxx(path of xpore code).xpore dataprep --eventalign eventalign_file.txt --out_dir output_path

============================================================================

Hi, I have come across the same problem.

I ran the following steps before running the xpore-dataprep command:

dataset: mm39, WT and KO; KO is used here as an example

pre-processing:

1. multi-fast5 to single-fast5:

multi_to_single_fast5 -i demo/guppy -s demo/guppy_single -t 40 --recursive

2. basecalling:

guppy_basecaller -i /data/fast5_data/mm_WT/single_fast5/ -s ko.guppy/ --config ~/nanopore_methods/ont-guppy/data/rna_r9.4.1_70bps_hac.cfg -r --num_callers 4 --cpu_threads_per_caller 2 --device auto

cat ko.guppy/pass/*.fastq > ko.fastq

3. minimap2 generates .sam file:

minimap2 -ax map-ont -k 14 GRCm38.transcripts.fa -t 25 --secondary=no /data/fast5_data/mm_KO/ko.fastq -o /data/fast5_data/mm_KO/ko.sam

4. samtools generates the sorted .bam file:

samtools view -@ 30 -F 2048 -F 4 -b ko.sam | samtools sort -O BAM -@ 20  -o ko.bam

samtools index -@ 16 ko.bam  # generate index

5. nanopolish

first generate index: nanopolish index -d <PATH/TO/FAST5_DIR> <PATH/TO/FASTQ_FILE>

nanopolish index -d single_fast5/ ko.fastq > index.log 2>&1

then eventalign:

nanopolish eventalign --read ko.fastq \
--bam ko.bam \
--genome ~/reference_fa/mm10/GRCm38.transcripts.fa \
--scale-events \
--signal-index \
--summary ko_summary.txt \
--threads 50 \
> ~/nanopore_methods/xpore/nanopolish_files/ko_eventalign.log \
2>&1
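
One detail of the eventalign command above: `> file 2>&1` sends both standard output and standard error to the same file, so anything nanopolish prints to stderr lands in the same file as the event table it writes to stdout. Whether that matters for the problem in this thread is not established. A minimal sketch of keeping the two streams apart, using a hypothetical helper `run_split` (not part of nanopolish or xpore):

```shell
# run_split OUT LOG CMD [ARGS...]
# Run CMD with its stdout written to OUT and its stderr written to LOG,
# so tabular output and log messages never share a file.
# Hypothetical helper for illustration only.
run_split() {
    out=$1
    log=$2
    shift 2
    "$@" > "$out" 2> "$log"
}
```

For example, `run_split ko_eventalign.txt ko_eventalign.log nanopolish eventalign ...` with the same options as above would leave the event table in the `.txt` file and any messages in the `.log` file.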

xpore processing:

# For mm_WT, it follows the same previous steps,
# here I just run one set for checking whether it could work.

xpore-dataprep --eventalign /data/fast5_data/mm_WT/wt_eventalign.txt \
    --summary /data/fast5_data/mm_WT/wt_summary.txt \
    --out_dir ~/nanopore_methods/xpore/wt/ \
    --n_processes 4 --readcount_max 20000 > ~/nanopore_methods/xpore/wt/xpore_dataprep.log 2>&1

The output of the nanopolish eventalign step looks like this:

(/home/rlwang/m6a) rlwang@dell-tower-server:/data/fast5_data/mm_WT$ head wt_eventalign.txt -n 3
contig  position        reference_kmer  read_index      strand  event_index     event_level_mean    event_stdv       event_length    model_kmer      model_mean      model_stdv      standardized_level  start_idx        end_idx
ENSMUST00000130201      548     TGTTA   20      t       5       104.38  2.869   0.00697 TGTTA   106.43       7.49    -0.23   16780   16801
ENSMUST00000130201      548     TGTTA   20      t       6       110.41  6.096   0.00963 TGTTA   106.43       7.49    0.44    16751   16780
(/home/rlwang/m6a) rlwang@dell-tower-server:/data/fast5_data/mm_WT$ head wt_summary.txt -n 3
read_index      read_name       fast5_path      model_name      strand  num_events      num_steps   num_skips        num_stays       total_duration  shift   scale   drift   var
20      571a5dee-2649-41de-8bb3-c65aae7359f6    /data/fast5_data/mm_WT/single_fast5/all_fast5/571a5dee-2649-41de-8bb3-c65aae7359f6.fast5             template        453     226     4       222     2.61    -2.967   0.903   0.000   1.423
37      0ce4f4e3-19aa-4aed-a8bf-86f3ec729fce    /data/fast5_data/mm_WT/single_fast5/all_fast5/0ce4f4e3-19aa-4aed-a8bf-86f3ec729fce.fast5             template        1929    951     20      957     11.67   4.012   0.955   0.000   1.266

Then, when I run the xpore dataprep command, I get the following outputs:

(/home/rlwang/m6a) rlwang@dell-tower-server:~/nanopore_methods/xpore/wt$ ls
eventalign.hdf5  eventalign.log  xpore_dataprep.log
(/home/rlwang/m6a) rlwang@dell-tower-server:~/nanopore_methods/xpore/wt$ du -sh *
4.0K    eventalign.hdf5
0       eventalign.log
36K     xpore_dataprep.log

The end of xpore_dataprep.log looks like this:

(base) rlwang@dell-tower-server:~/nanopore_methods/xpore/wt$ ls
eventalign.hdf5  eventalign.log  xpore_dataprep.log
(base) rlwang@dell-tower-server:~/nanopore_methods/xpore/wt$ tail -f xpore_dataprep.log 
    obj = getattr(obj, self.name)._getitem_axis(key, axis=axis)
  File "/home/rlwang/m6a/lib/python3.6/site-packages/pandas/core/indexing.py", line 1099, in _getitem_axis
    return self._getitem_iterable(key, axis=axis)
  File "/home/rlwang/m6a/lib/python3.6/site-packages/pandas/core/indexing.py", line 1037, in _getitem_iterable
    keyarr, indexer = self._get_listlike_indexer(key, axis, raise_missing=False)
  File "/home/rlwang/m6a/lib/python3.6/site-packages/pandas/core/indexing.py", line 1240, in _get_listlike_indexer
    indexer, keyarr = ax._convert_listlike_indexer(key)
  File "/home/rlwang/m6a/lib/python3.6/site-packages/pandas/core/indexes/multi.py", line 2400, in _convert_listlike_indexer
    raise KeyError(f"{keyarr[mask]} not in index")
KeyError: "['08be73d2-2dcc-4c45-b572-8ce3b807c2a1'] not in index"

However, this record can be found in the wt_summary.txt file:

(base) rlwang@dell-tower-server:/data/fast5_data/mm_WT$ grep '08be73d2-2dcc-4c45-b572-8ce3b807c2a1' wt_summary.txt
6	08be73d2-2dcc-4c45-b572-8ce3b807c2a1	/data/fast5_data/mm_WT/single_fast5/all_fast5/08be73d2-2dcc-4c45-b572-8ce3b807c2a1.fast5		template	2467	1262	25	1179	15.35	1.159	0.939	0.000	1.289
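
The grep above confirms that one read name. A broader cross-check is sketched below, assuming both files are tab-separated with a header row, and that `read_index` is column 4 of the eventalign file and column 1 of the summary file (as in the headers shown earlier). `missing_read_indices` is a hypothetical helper, not part of xpore, and an empty result does not by itself rule out other causes of the KeyError.

```shell
# missing_read_indices EVENTALIGN SUMMARY
# Print each read_index that appears in EVENTALIGN (column 4) but has no
# row in SUMMARY (column 1). Assumes tab-separated files with one header line.
# Hypothetical helper for illustration only.
missing_read_indices() {
    eventalign=$1
    summary=$2
    awk -F'\t' '
        # First file (summary): remember every read_index, skipping the header.
        FILENAME == ARGV[1] { if (FNR > 1) seen[$1] = 1; next }
        # Second file (eventalign): report each unseen read_index once.
        FNR > 1 && !($4 in seen) && !reported[$4]++ { print $4 }
    ' "$summary" "$eventalign"
}
```

Running `missing_read_indices wt_eventalign.txt wt_summary.txt` would list any read indices that the event table references but the summary lacks.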

Moretta1 avatar Dec 22 '24 04:12 Moretta1

Hi @Seongmin-Jang-1165,

Sorry for the delayed reply! Just came back from vacation.

Can you update xpore, please? xpore-dataprep is deprecated.

Also, you will need to indicate the RNA004 kmer model when you get to the xpore diffmod step: https://github.com/GoekeLab/xpore/blob/RNA004_kmer_model/xpore/diffmod/RNA004_5mer_model.txt

Thanks!

Best wishes, Yuk Kei

yuukiiwa avatar Jan 03 '25 06:01 yuukiiwa

@yuukiiwa Sorry for the late reply.

I have attached the information that you requested.

Also, can you tell me how to indicate the RNA004 model when I run xpore diffmod? Is there a specific option or code for this?

Xpore_code.txt

Seongmin-Jang-1165 avatar Feb 15 '25 02:02 Seongmin-Jang-1165