tskit
tskit copied to clipboard
Writing a RateMap to text (hapmap)
I frequently use msprime.RateMap and often need to write these out to hapmap format, e.g. for input to a standalone tool. I noticed that tskit-dev/msprime#1338 discusses adding RateMap.write_hapmap, but afaik this isn't in the works. Is there still interest in including a method along these lines? Something similar to,
def write_hapmap(ratemap: msprime.RateMap, contig_name: str) -> str:
"""
Write a recombination rate map into hapmap format.
"""
physical_position = ratemap.position.astype(np.int64)
scaled_rate = ratemap.rate * 1e8
map_position = ratemap.get_cumulative_mass(physical_position) * 100
hapmap = ["Chromosome\tPosition(bp)\tRate(cM/Mb)\tMap(cM)"]
if np.isnan(scaled_rate[-1]): # handle trailing NaN
scaled_rate[-1] = 0.0
else:
scaled_rate = np.append(scaled_rate, 0.0)
for rate, pos, map in zip(scaled_rate, physical_position, map_position):
if not np.isnan(rate):
hapmap.append(f"{contig_name}\t{pos}\t{rate:.10f}\t{map:.10f}")
hapmap = "\n".join(hapmap) + "\n"
return hapmap
# for tests
def assert_hapmap_equals_ratemap(ratemap: msprime.RateMap, hapmap_path: str, atol: float = 1e-12):
"""
Check that hapmap file matches rate map.
"""
hapmap_check = msprime.RateMap.read_hapmap(hapmap_path, sequence_length=ratemap.sequence_length, rate_col=2)
np.testing.assert_allclose(hapmap_check.rate, ratemap.rate, atol=atol)
np.testing.assert_allclose(hapmap_check.position, ratemap.position)
hapmap_check = msprime.RateMap.read_hapmap(hapmap_path, sequence_length=ratemap.sequence_length, map_col=3)
np.testing.assert_allclose(hapmap_check.rate, ratemap.rate, atol=atol)
np.testing.assert_allclose(hapmap_check.position, ratemap.position)
But perhaps with arguments controlling scaling of rates/map positions; precision of text output; and header -- so that non-recombination maps could be written out in meaningful units?