Adding the stream.async.cast op to fix a potential async correctness issue.
We had a potential correctness issue when using chained external fences and returning values (as opposed to writing to output arguments): we'd insert a stream.async.transfer that was effectively just a cast to external lifetime for returned tensors. The problem is that the stream.timepoint.barrier feeding the chain_external op came before the transfer, so a user who waited on the fence and then consumed the returned value could be consuming it before the transfer had executed. We're mostly saved today because most usage goes through the synchronous ABI or has torch placing results into outputs, and because most transfers are elided, but correctness was not guaranteed.
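To make the ordering hazard concrete, here is a toy Python model of a single timeline. It is only a sketch and not IREE code: the op names are plain strings, `value_ready_at_fence` is a hypothetical helper, and it assumes that ops run in list order and that the fence signals exactly when the barrier is reached. It also assumes the cast occupies no slot on the timeline, so the returned value is the dispatch result itself.

```python
def value_ready_at_fence(ops, producer):
    """Return True if `producer` has executed by the time the fence signals.

    Toy assumption: ops execute in list order and the external fence
    signals when the "barrier" entry is reached.
    """
    fence_pos = ops.index("barrier")
    return producer in ops[:fence_pos]


# Old lowering: the transfer that yields the returned value sits after
# the barrier that feeds the external fence.
old_ops = ["dispatch", "barrier", "transfer"]
old_ok = value_ready_at_fence(old_ops, producer="transfer")

# With a pure lifetime cast (modeled as no timeline op at all), the
# returned value is produced by the dispatch, which precedes the barrier.
new_ops = ["dispatch", "barrier"]
new_ok = value_ready_at_fence(new_ops, producer="dispatch")

print("old lowering safe:", old_ok)
print("cast lowering safe:", new_ok)
```

In this model the old ordering reports the value as not ready when the fence signals, while the cast-based ordering reports it as ready.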
The new stream.async.cast op just does lifetime assertions and pins values in usage refinement. This allows us to import/export and cast without any potential for copies to arise. Future changes will use this op in a timeline verification pass that checks that the resources produced by every StreamableOp are consumed on an appropriate timeline.