How do reference sequences deal with trimming etc.?
At the moment, I'm pretty sure that operations which change the coordinate system, primarily trim, rtrim, and ltrim, don't touch the reference sequence at all. I guess eventually they should (although how this happens when the ref seq is a URL, I have no idea). Alternatively, we should document that the TS after trimming may not have the reference sequence aligned correctly, so alignments will create bad data.
I thought it worth opening up an issue just to keep track of this. I guess another issue is how to ensure that the reference sequence is of the same length as the genome. ISTR there was some discussion of this, but I can't find the issue at a brief glance.
(NB probably worth dumping this issue in the reference sequence project)
Is it clear exactly what those operations should do with the reference? I'm tempted to say here that those methods should error in the presence of a reference, at least for now.
As for checking if the reference is the correct length - that needs to happen in methods that use the sequence rather than tskit checking when the sequence is added.
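To make the "check at point of use" idea concrete, here is a minimal sketch in plain Python. The function name `check_reference_length` and its arguments are hypothetical, not tskit API; it assumes the caller already has the reference as a string and the sequence length as a number.

```python
def check_reference_length(data, sequence_length):
    """Raise if the reference string does not span the whole genome.

    Hypothetical helper: meant to be called by a method that consumes
    the reference (e.g. when producing alignments), not at the time the
    reference is attached.
    """
    if len(data) != sequence_length:
        raise ValueError(
            f"reference has length {len(data)} but the sequence "
            f"length is {sequence_length}"
        )


# A 10-base reference matches a sequence length of 10.0.
check_reference_length("ACGTACGTAC", 10.0)
```

Checking lazily like this keeps adding a reference cheap, and only the code paths that actually read the reference pay for the check.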
I think erroring out is fine for the moment. This needs to be coded up, though. And we need to think about whether any other operation changes the coordinate space (I don't think so, though).
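A rough sketch of what the "error out for now" guard might look like. The helper `require_no_reference` is hypothetical; it only assumes the object exposes a `has_reference_sequence()` method, which I believe tskit tree sequences do. A stub class stands in for a real tree sequence so the example runs on its own.

```python
def require_no_reference(ts, operation):
    """Refuse a coordinate-changing operation if a reference is present.

    Hypothetical guard: would be called at the top of trim, ltrim and
    rtrim until trimming the reference itself is implemented.
    """
    if ts.has_reference_sequence():
        raise ValueError(
            f"{operation} changes the coordinate system and is not "
            "supported when a reference sequence is present"
        )


class StubTreeSequence:
    """Stand-in for a tree sequence, for illustration only."""

    def __init__(self, has_reference):
        self._has_reference = has_reference

    def has_reference_sequence(self):
        return self._has_reference


require_no_reference(StubTreeSequence(False), "trim")  # passes silently
```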
In the longer term, I think the operation is reasonably well defined, as long as the amount being trimmed is an integer value. Perhaps eventually we trim the refseq as required, but still error out if the reference is a URL. We could have a function to allow a URL-based refseq to be downloaded and fully embedded into the TS (swelling the size of the TS, of course). That might be useful anyway, e.g. for people who want to work on a TS without a decent internet connection.
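For the longer-term behaviour, something along these lines might work. This is a sketch only: `trim_reference` and its parameters are made up for illustration, the reference is treated as a plain string, and `left` and `right` are the half-open bounds of the region kept after trimming.

```python
def trim_reference(data, left, right, url=""):
    """Return the part of the reference covering [left, right).

    Hypothetical helper. Errors out for URL-based references and for
    non-integer trim coordinates, as suggested above.
    """
    if url:
        raise ValueError(
            "cannot trim a URL-based reference; embed it first"
        )
    for name, value in (("left", left), ("right", right)):
        if float(value) != int(value):
            raise ValueError(f"{name}={value} is not an integer coordinate")
    left, right = int(left), int(right)
    if not 0 <= left <= right <= len(data):
        raise ValueError("trim interval is outside the reference")
    return data[left:right]


trimmed = trim_reference("ACGTACGTAC", 2.0, 7.0)
```

With this shape, ltrim would correspond to `right = len(data)` and rtrim to `left = 0`, so one helper could cover all three operations.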