hhvm.com
What does Breaking Changes mean?
We commonly include a Breaking Changes section in release notes. What does the terminology actually mean?
I used to think it was similar to binary backward-compatibility breakage in C++ or Java. However, we don't distribute libraries in a binary form.
If it means breakage of source-level backward compatibility, that definition could be too sensitive, and we might have missed many breaking changes.
For example, we used to assume that changing a function from the default capabilities to the pure capability set is a backward-compatible change. However, the assumption does not hold once we take the type checker into account:
Given a library including the following function:
function method1(): void {
}
When changing it to
function method1()[]: void {
}
The following user code, which originally type checked, now fails to type check:
function method2(): void {
}
function infer<T>(
(function()[T]: void) $_,
): (function ((function()[T]: void)):void) {
return $_ ==> {};
}
function broken(): void {
$f = infer(method1<>);
// Typing[4403] Invalid argumentHack(4403)
// Expected a function that requires the capability set {}
// But got a function that requires the capability set {AccessGlobals, IO, ImplicitPolicyLocal, RxLocal, System, WriteProperty}
$f(method2<>);
}
Similarly, changing the return type of a function to a more specific type is also backward incompatible at the source level, even though similar changes are usually considered binary backward compatible in other languages.
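To make the narrowing case concrete, here is a sketch in the same shape as the coeffect example above (all names here - f, g, pin - are invented, and whether the error fires depends on how the checker solves T): a library narrows a return type from num to int, and user code that pinned the inferred type through a generic stops type checking.

```
// Library, before:
// function f(): num { return 1.0; }
// Library, after the narrowing change:
function f(): int {
  return 1;
}

function g(): num {
  return 1.5;
}

// Pins T to the return type of its argument, as in the coeffect example.
function pin<T>(
  (function(): T) $_,
): (function((function(): T)): void) {
  return $_ ==> {};
}

function broken2(): void {
  $p = pin(f<>); // T is now inferred as int instead of num
  // Error: g returns num, but a (function(): int) is expected.
  $p(g<>);
}
```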
Key takeaways
- We might want to change the structure of our release notes.
- I don't think this is currently a big deal, because we don't promise backward compatibility anyway. However, if we want to create a stable language in the future, we will have to define a stable ABI for distributing libraries, not just source code. Essentially, any statically typed language with type inference needs a stable ABI in order to provide reusable libraries that do not arbitrarily break their users.
Anything that's reasonably likely to cause practical problems for end users. This is a very fuzzy definition, but these notes are to help humans, so that's fine.
- the example you gave does not seem particularly likely. Additionally, by this definition, any change of a return type - even narrowing it - would be a breaking change.
- we also tend to list things like changed typechecker error codes; as we already consider code with FIXMEs invalid, this does not fit any standard definition of a compatibility break, but it does change static analysis results (for example, existing suppression comments reference the old error codes).
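As a contrived sketch of why renumbered error codes matter in practice (the specific code below is illustrative, not tied to any real renumbering):

```
function takes_int(int $_): void {}

function caller(): void {
  /* HH_FIXME[4110] Suppresses the "invalid argument" error today. If a
     release assigns this error a different code, the comment no longer
     matches and the error resurfaces at every such call site. */
  takes_int("not an int");
}
```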
If it means breakage of source-level backward compatibility, that definition could be too sensitive, and we might have missed many breaking changes.
I don't think we should be aiming for a formal definition, with a 100% completeness: this would be less useful than what we have now.
If changes like marking a function as pure are considered breaking changes, this will effectively be a copy of git log. That data is available outside of the release notes, and is much lower signal.
We might want to change the structure of our release notes.
No strong feelings, but given this is meant to be practical advice for humans to take in their projects, I don't really see a need.
we will have to define a stable ABI for distributing libraries, not just the source code
This seems extremely unlikely, ever; bytecode will continue to change as needed for performance.
the example you gave does not seem particularly likely. Additionally, by this definition, any change of a return type - even narrowing it - would be a breaking change.
That's exactly why binary backward compatibility (or the corresponding concept in Hack) is a thing. It is much easier to keep binary backward compatibility than source backward compatibility in other languages.
we will have to define a stable ABI for distributing libraries, not just the source code
This seems extremely unlikely, ever; bytecode will continue to change as needed for performance.
(copy from internal post)
The goal of keeping binary backward compatibility in other languages is to allow upgrading transitive dependencies. For example, given an application A depending on library B, which depends on library C: if library B is updated to a version that is binary backward compatible but source-level backward incompatible, application A must be migrated to pass the type check. However, if library C is updated to such a version, application A does not have to migrate, because B's compiled form still links against C.
In Hack, the maintainer of application A cannot simply update library C unless the developer of library B migrates first, because there is no binary form.
Something similar to binary backward compatibility in Hack is "runtime backward compatibility", if we could exclude the vendor directory from type checking.
This is possible in existing codebases by generating HHIs if desired, and using ignored_paths to hide the underlying sources.
It is not enough though: changes in internals (e.g. warnings in array operations) and builtin functions also change between releases.
That's exactly why binary backward compatibility (or the corresponding concept in Hack) is a thing. It is much easier to keep binary backward compatibility than source backward compatibility in other languages.
Narrowing the return type of a function (or of a method in a non-final class) is not usually considered a BC break by other languages; in the case of C++, the ABI is actually weaker than the API here - a narrowed return type is not an API-breaking change, a widened one is - but neither is an ABI-breaking change, since the return type is not part of the mangling.
Also, I am not arguing that an ABI would not be useful for Hack - just that it is impractical from an implementation side. It would require:
- a stable bytecode
- a ban on adding additional errors to existing bytecodes (e.g. we would not have been able to add 'raise error on array append if array is a dict')
- a language-level definition of ABI and API
The last is a hard problem; the first two are extremely unlikely to ever be practical unless Facebook stops needing to optimize HHVM for www performance and reliability.
I think "runtime backward compatibility" is more realistic than binary, because currently all inferred type parameters do not change the runtime behavior.
Reified type parameters do change the runtime behavior. Fortunately they cannot be inferred.
I'd go with variance rules when documenting signature changes. Non-final methods are invariant in their parameters, return type, and coeffects. All final or free-standing functions are contravariant in their parameters, covariant in their return type, and contravariant in their coeffects.
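A sketch of those rules applied to a free-standing function (names invented): widening a parameter and narrowing the return type move in the compatible directions; the opposite directions are breaking.

```
// Before:
// function parse(string $s): arraykey { return $s; }

// After: compatible under the variance rules above --
// the parameter widened (string -> arraykey) and the return type
// narrowed (arraykey -> int).
function parse(arraykey $s): int {
  return 1;
}

// A context written against the old signature still accepts the new one,
// because (function(arraykey): int) <: (function(string): arraykey).
function use_parser((function(string): arraykey) $p): void {}

function still_ok(): void {
  use_parser(parse<>);
}
```

Per the earlier discussion, even these variance-safe directions can still trip generic inference, so they are "compatible" only in the subtyping sense.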
Yes, there are indeed ways the typechecker can emit errors where it previously did not. These callsites can be restored by explicitly typing out the previously inferred types. If that type is not denotable, maybe mention it in the blog.
Yes, there are indeed ways the typechecker can emit errors where it previously did not. These callsites can be restored by explicitly typing out the previously inferred types. If that type is not denotable, maybe mention it in the blog.
It is trivial to fix errors like this in a monorepo, where all the source files of all dependencies live together. However, it is not easy to fix even a trivial typing error in composer-based projects, where errors are commonly found in the vendor directory, especially when some dependencies are maintained by third parties.
Would you be able to give an example where projects B and C are both independently compatible with HHVM version (next), project A depends on B, which depends on C, and the error only shows up in project A, but not in a source file controlled by A? From my understanding, if C type checks and B type checks, wouldn't all new type errors have their primary error location in A?
Edit: I misunderstood the example from the internal post. The issue is not, whose error is this to fix, but are there ways to allow incompatible changes to remain unfixed. If A doesn't know about the new signature, it can pretend to use the old one (provided they are compatible).
Yes. The problem is not only about changes introduced by HHVM, but also changes introduced by third party libraries, for users who don't have a monorepo.