problem-specifications
problem-specifications copied to clipboard
Make descriptions unique
Recently, there was some discussion on whether test case descriptions should be unique: https://github.com/exercism/problem-specifications/pull/2019#discussion_r855117856 IMO there is an unwritten rule that each description should be unique, but it is not documented anywhere I think. To me, it seems like having unique descriptions is useful for several reasons:
- It makes for easier identification of test cases (descriptions are easier to scan than UUIDs)
- It is often a prerequisite for test generators that a (full) description is unique
To be clear, when I'm referring to a description being unique, what I mean is that the concatenated description of a test case must be unique. In other words, the uniqueness also takes into account any description of parent test case goupings.
@wolf99 Would something like this be possible to encode in the JSON schema we have?
This clashes with the actually documented purpose of descriptions:
Descriptions should not simply explain what each case is (that is redundant information) but also why each case is there. For example, what kinds of implementation mistakes might this case help us find?
There are plenty of scenarios where more than one case has the same reason for existing.
It is often a prerequisite for test generators that a (full) description is unique
I don't think this should be considered. They can break test generators in many ways, e.g. if the test generator can't handle certain characters, numbers etc. and they're also mutable. Any kind of restriction for descriptions based on the needs of generators comes at the cost of the main audience of the canonical data: humans reading and interpreting it.
If uniqueness is required, the UUID or some kind of hash of the test case can trivially be used already by the generators to achieve it.
I don't think this should be considered. They can break test generators in many ways, e.g. if the test generator can't handle certain characters, numbers etc. and they're also mutable. Any kind of restriction for descriptions based on the needs of generators comes at the cost of the main audience of the canonical data: humans reading and interpreting it.
I think this should be considered. Most generators (or even humans copy-pasting like I did a bunch of times) will use the description as a test name, it's a natural fit, so why wouldn't we make the job easier by having a few person doing the thinking and interpreting here rather than forcing 50 people to do it down the line?
If the documented purpose of description clashes with what people use them for, I think the documentation should adapt.
I can inform you that according to the below, all instances of duplicate descriptions fall under one of the three categories:
- although the description by itself repeats another, the cases in question are within different groups, and those groups have different descriptions (
-sflag) - although the description repeats another, all but one of the cases in question are reimplemented (
-rflag). - state-of-tic-tac-toe
require 'json'
def cases(h)
# top-level won't have a description, but all lower levels will.
# therefore we have to unconditionally flat_map cases on the top level.
# on lower levels we either flat_map cases or just return individuals.
cases = ->(hh, path = [].freeze) {
hh['cases']&.flat_map { |c|
cases[c, (path + [hh['description'].freeze]).freeze]
} || [hh.merge(path: path)]
}
h['cases'].flat_map(&cases)
end
def mark_reimplemented(cases)
reimpls = cases.map { |c| c['reimplements'] }
cases.each { |c| c[:reimplemented] = reimpls.include?(c['uuid']) }
end
reimpl = ARGV.delete('-r')
short = ARGV.delete('-s')
maxes = []
freq = Hash.new(0)
Dir.glob("#{__dir__}/exercises/*/canonical-data.json") { |f|
exercise = File.basename(File.dirname(f))
descs = mark_reimplemented(cases(JSON.parse(File.read(f)))).filter_map { |c|
next if c.fetch(:reimplemented) && !reimpl
arbitrary = ?/
((short ? [] : c[:path]) + [c['description']]).join(arbitrary)
}
descs.tally.each { |k, v| puts "#{f} #{k} #{v}" if v > 1 }
}
I have absolutely no stake in whether duplicate descriptions should be allowed and my sincere opinion is that I'll be able to deal with it either way, so I won't be weighing in on that; that will have to be left to the interested parties who do have a stake in it.
Hi @ErikSchierboom TBH I'm still figuring out exactly how far I can push the "implies" features of the JSON schema spec with regard to the uniqueness requirements. Your recent comment on the availability (or lack thereof) of JSON schema handling in Nim has given me something to investigate a bit before continuing much more with the schema also. While there is still some benefit to having a schema, it is quite a reduced benefit if the tool cannot use it.
Okay, thanks!