Reinstate check for chromosome name in read_hapmap
For simplicity, code to check for consistency of the chromosome name was removed from the new msprime.read_hapmap() code. However, it seems a good idea to check, by default, that if there is a column in the hapmap file with the chromosome name, it contains the same string. The reason I want this is that (a) it is common to munge multiple chromosomes together in the same file, and we should check that the user is not doing this by mistake (normally this would be caught because the new chromosome would start again at a low value, so break the ordering constraint, but this isn't guaranteed, and is fragile to rely upon for e.g. microchromosomes in birds) (b) it is useful to have the chromosome name, and we had thought that we might use it in future (although we are not doing so at the moment) and (c) @grahamgower suggests (and I agree) that we should try to be pretty strict about the format we are defaulting to in read_hapmap(), so that we don't get into the bind we had before of making files that will "work" in stdpopsim, but are not in any conventional format.
So that I don't lose the code, this was previously done in intervals.py: "read_hapmap" as follows
if 0 not in column:
column[0] = ("chrom", object)
... do the np.readtxt() thing
if "chrom" in data and len(np.unique(data["chrom"])) > 1:
warnings.warn(
f"Different chromosome names found in the first column of {fileobj}. "
"Check that the file contains data for a single chromosome."
)
and the test case was
def test_read_hapmap_string_warning(self):
hapfile = io.StringIO(
"""\
HEADER
chr1 0 x 0
chr1 1 x 0.000001 x
chr10 2 x 0.000006 x x x"""
)
with warnings.catch_warnings(record=True) as w:
warnings.simplefilter("always")
rm = msprime.read_hapmap(hapfile)
assert len(w) == 1
assert "chromosome names" in str(w[0].message)
np.testing.assert_array_equal(rm.position, [0, 1, 2])
assert np.allclose(rm.rate, [1e-8, 5e-8])