Set implementation does not appropriately handle hash collisions
The set implementation in helper/schema/set.go produces incorrect results if elements in the set have colliding hash code values.
This should be relatively rare in practice because the size of the sets represented by this implementation would typically be relatively small, but the consequences of a collision would be very surprising to encounter.
The method used by, e.g., HashSet in Java is to also require an equality operation to be defined for elements in the set. I don't think that would be possible without changing the interface of TypeSet, so I'm not sure what the best path forward is.
Terraform Version
Terraform v0.6.15-dev (current master / 8cf13d9582309f45e4a04cd4cd36e717b5b60c75)
Affected Resource(s)
All resources using TypeSet.
Terraform Configuration Files
provider "aws" {
region = "us-east-1"
}
resource "aws_instance" "web" {
ami = "ami-408c7f28"
instance_type = "t1.micro"
tags {
Name = "HelloWorld"
}
// these security group IDs all have the same CRC32 hash code (1373619311)
// https://gist.github.com/mattmoyer/5565a1dd5795c0ff53daa8e73b06c37b
vpc_security_group_ids = [
"sg-8c0f398e",
"sg-3615fc73",
"sg-eaf01421",
]
}
Expected Behavior
The terraform plan output should show all three security group associations.
Actual Behavior
In this case, aws_instance uses the HashString helper which is a CRC32 checksum. The three security group IDs in this case were deliberately chosen as examples that have a colliding CRC32 value of 1373619311, so they collide and only one of them ends up in the terraform plan output:
[...]
+ aws_instance.web
ami: "" => "ami-408c7f28"
availability_zone: "" => "<computed>"
ebs_block_device.#: "" => "<computed>"
ephemeral_block_device.#: "" => "<computed>"
instance_state: "" => "<computed>"
instance_type: "" => "t1.micro"
key_name: "" => "<computed>"
placement_group: "" => "<computed>"
private_dns: "" => "<computed>"
private_ip: "" => "<computed>"
public_dns: "" => "<computed>"
public_ip: "" => "<computed>"
root_block_device.#: "" => "<computed>"
security_groups.#: "" => "<computed>"
source_dest_check: "" => "1"
subnet_id: "" => "<computed>"
tags.#: "" => "1"
tags.Name: "" => "HelloWorld"
tenancy: "" => "<computed>"
vpc_security_group_ids.#: "" => "1"
vpc_security_group_ids.1373619311: "" => "sg-eaf01421"
Plan: 1 to add, 0 to change, 0 to destroy.
A similar failure case exists for any other SchemaSetFunc implementation used with TypeSet, since they all output a 32 bit code that may have collisions.
Steps to Reproduce
- Copy the configuration pasted above into a
.tffile, and runterraform plan.
Hi @mattmoyer - you're right, this is particularly nasty if you happen to encounter it. I'll give it some thought with @phinze and @mitchellh and come back to this issue once we have a plan.
So I think the only thing we can do here is:
- For primitive types we can do equality automatically
- For non-primitive types we'd need a equality function provided.
I don't think there is any automatic way for us to determine equality for all set elements without that. Unfortunately, most set elements are not primitives...
@mitchellh we already have a bunch of code that generates a proprietary string serialization of complex structures for use in the default hashing implementation. For cases where the default hash implementation is being used (I think that everything that's been added since it's existed), could we just compare the strings that produces to get deep-equality?
Presuming that there aren't any issues with that which I'm not seeing, we could plan to shift all of the older stuff (carefully) onto the default set implementation, since AFAIK there's no good reason to override the hash function and plenty of confusing behavior if you don't get it right. (such was the motivation for adding the default implementation in the first place)
Thanks to @jvoorhis we have a simple repro of this that doesn't require creating any AWS resources:
data "archive_file" "icollide" {
type = "zip"
source {
content = ""
filename = "plumless"
}
source {
content = ""
filename = "buckeroo"
}
output_path = "buckeroo.zip"
}
Hi all! Sorry for the long silence here.
v0.12 includes some work towards a solution here, by switching to a new set implementation that is not subject to hash key collisions.
I've relabelled this issue as a provider-sdk one rather than a config one to reflect the fact that for the moment the provider SDK must unfortunately continue to use its own set implementation (converting from the new Terraform Core one on entry to the SDK) because the providers remain compatible with v0.11 and prior releases for the moment.
Fully addressing this problem will therefore require some revisions to the provider SDK which will come at a future point where we are comfortable ending support for Terraform v0.10 and v0.11 for new provider releases, which'll depend on how quickly the user community is able to adopt v0.12. We'll revisit this again later along with other work to introduce v0.12-specific features into the provider SDK.