tigerbeetle
tigerbeetle copied to clipboard
docs: add page about OLTP
- docs: link to safety and performance docs
- docs: mention using multiple LSM trees in performance section
- docs: link from transfer reference to data modeling about amounts
- docs: add built for OLTP page
done.
Hey,
Thanks for your script, it seems to be effectively less memory consuming.
However, i found some errors, so here is a corrected version to be used :
#!/usr/bin/env python3
import sys
import pandas as pd
import re
"""
split_cpg_groups.py
Author: Zaka Yuen, JCSMR, ANU
Created on Feb 18 2020
Modified by Shangzhe Zhang Mars 2 2022
Modified by Antoine Quoniam_Barre on July 11 2024
The output of the nanopolish calling procedure is a log-likelihood ratio,
where a positive log-likelihood ratio indicates evidence for methylation.
Nanopolish groups nearby CpG sites together and calls the group jointly,
assigning the same scores to each site in the group.
This script is to:
-use after running nanopolish call-methylation
-allow per-site comparison to other datasets
-split up the CpG group into its constituent CpG sites and assign the same log-likelihood ratio
-keeping strandedness and readID information
"""
target ="CG"
out = open(snakemake.output[0],'w')
out.write("\t".join(['Chr', 'Pos','Strand', 'Log.like.ratio', 'Read_ID']) + "\n")
with open(snakemake.input[0],'r') as fh:
next(fh)
for line in fh:
fields = line.rstrip().split()
chrom = str(fields[0])
strand = str(fields[1])
start = int(fields[2]) + 1 # fix the position
read_name = str(fields[4]).split('_', 1)[0]
logRatio = float(fields[5])
cpg_num = int(fields[9])
seq = str(fields[10])
if cpg_num == 1:
out.write(chrom + "\t" + str(start) + "\t" + strand + "\t" + str(logRatio) + "\t" + read_name + "\n")
elif cpg_num > 1:
index = []
for match in re.finditer(target, seq):
out.write(chrom + "\t" + str(start) + "\t" + strand + "\t" + str(logRatio) + "\t" + read_name + "\n")
for match in re.finditer(target, seq):
index.append(match.start())
length = len(index)
for i in range(1, length):
new_start = start + (index[i] - index[0])
out.write(chrom + "\t" + str(new_start) + "\t" + strand + "\t" + str(logRatio) + "\t" + read_name + "\n")