tskit icon indicating copy to clipboard operation
tskit copied to clipboard

Writing a RateMap to text (hapmap)

Open nspope opened this issue 4 months ago • 2 comments

I frequently use msprime.RateMap and often need to write these out to hapmap format, e.g. for input to a standalone tool. I noticed that tskit-dev/msprime#1338 discusses adding RateMap.write_hapmap, but afaik this isn't in the works. Is there still interest in including a method along these lines? Something similar to,

def write_hapmap(ratemap: msprime.RateMap, contig_name: str) -> str:
    """
    Write a recombination rate map into hapmap format.
    """
    physical_position = ratemap.position.astype(np.int64)
    scaled_rate = ratemap.rate * 1e8
    map_position = ratemap.get_cumulative_mass(physical_position) * 100
    hapmap = ["Chromosome\tPosition(bp)\tRate(cM/Mb)\tMap(cM)"]
    if np.isnan(scaled_rate[-1]):  # handle trailing NaN
        scaled_rate[-1] = 0.0
    else:
        scaled_rate = np.append(scaled_rate, 0.0)
    for rate, pos, map in zip(scaled_rate, physical_position, map_position):
        if not np.isnan(rate): 
            hapmap.append(f"{contig_name}\t{pos}\t{rate:.10f}\t{map:.10f}")
    hapmap = "\n".join(hapmap) + "\n"
    return hapmap
 

# for tests
def assert_hapmap_equals_ratemap(ratemap: msprime.RateMap, hapmap_path: str, atol: float = 1e-12):
    """ 
    Check that hapmap file matches rate map.
    """
    hapmap_check = msprime.RateMap.read_hapmap(hapmap_path, sequence_length=ratemap.sequence_length, rate_col=2)
    np.testing.assert_allclose(hapmap_check.rate, ratemap.rate, atol=atol)
    np.testing.assert_allclose(hapmap_check.position, ratemap.position)
    hapmap_check = msprime.RateMap.read_hapmap(hapmap_path, sequence_length=ratemap.sequence_length, map_col=3)
    np.testing.assert_allclose(hapmap_check.rate, ratemap.rate, atol=atol)
    np.testing.assert_allclose(hapmap_check.position, ratemap.position) 

But perhaps with arguments controlling scaling of rates/map positions; precision of text output; and header -- so that non-recombination maps could be written out in meaningful units?

nspope avatar Aug 07 '25 18:08 nspope