
Ranking system

Open pedromigueladao opened this issue 6 years ago • 9 comments

I would like to go back to this topic and discuss the results of the current scoring system. Although the current system (2017) is much better than the one used in 2016, where one could simply brute-force the ranking, I would like to point out some glitches that in my opinion can be easily fixed, and I honestly believe fixing them would make the system better.

TLDR: Main points are:

  • Current weight does not reflect CTF difficulty. Is every CTF worth 23-25 points? I guess not.
  • Weight of a CTF should be known in advance.
  • The score for organizing a CTF is too high.

Proposal:

  1. Have 3-4 brackets to rank CTFs, in particular the new ones, e.g. weights of 5, 10, 25, 75.
  2. Have a small committee that ranks the CTFs up-front, and provides a promotion/demotion information for the year after.
  3. Reduce the organization score to Min(CTF-weight, Max_team_score).
  4. Use the team's 10-best scores to rank a team (or top-10 + N1 tier-2 + N2 tier-3 + N3 tier-4, if one believes that participating in smaller/new CTFs should be encouraged). A minimal sketch of points 3 and 4 follows this list.
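A minimal sketch of what points 3 and 4 could look like in code; the function names are hypothetical and the per-event scores are assumed to come from whatever rating formula CTFtime already uses:

```python
def organizer_score(ctf_weight, team_scores):
    """Proposed cap for organizing: min(CTF weight, best participating team's score)."""
    max_team_score = max(team_scores) if team_scores else 0
    return min(ctf_weight, max_team_score)


def team_rating(event_scores, top_n=10):
    """Rank a team by the sum of its top_n best event scores of the year."""
    return sum(sorted(event_scores, reverse=True)[:top_n])


# A team with 14 event scores is rated on its 10 best.
scores = [95.2, 80.1, 60.0, 55.5, 48.0, 40.3, 33.7, 25.0, 24.1, 20.9, 12.4, 9.8, 5.0, 1.2]
print(team_rating(scores))

# Organizing a weight-25 CTF where the best participating team earned only 18.2
# rating points would be worth min(25, 18.2) = 18.2 points.
print(organizer_score(25, [18.2, 9.4, 3.1]))
```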

What it does not solve:

  1. Grading a new CTF in a wrong bracket.
  2. Incorrectly assessing next year's quality of a CTF based on the current year's info.

Rationale

  1. Ranking of new CTFs: almost every "new" CTF was ranked 23-25 points, whereas "old" CTFs were ranked whatever they were ranked the year before. It is my impression that unless someone was mad at something that happened during the CTF, the "regular participant" would by default vote MAX.

I believe that we could have a small committee of 8-10 teams that would rank the CTFs in 3-4 different tiers, say 5, 10, 25, 75 points for: a new CTF; a new CTF from known organizers or an established nice intro CTF; an established and good CTF; and an established and outstanding CTF, respectively.

If CTFs are submitted in advance, say 3 weeks before the contest, this committee could grade them and everyone would know the weight of each CTF upfront. Moreover, this could be auditable, and there could be a post-CTF period where these teams could rate the event as "met expectations", "failed to meet expectations", "exceeded expectations", or "not enough info".

If we have 8-10 volunteers, someone would almost surely participate and could provide insightful feedback after the CTF. This would indicate whether a CTF should be promoted or demoted for next year.

Of course this would be coarse grading and subject to flaws, but at least one would not vote biased by one's own result, nor would one be caught by surprise by a CTF that was ranked MAX and where everyone solved all the challenges (I am taking as a premise that this does not happen in established CTFs).
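As a rough illustration of the promotion/demotion idea (the majority-vote rule, thresholds, and names below are my own assumptions, not part of the proposal):

```python
# Proposed brackets: new, new-from-known-organizers/intro, established+good, established+outstanding.
TIERS = [5, 10, 25, 75]


def next_year_tier(current_tier, feedback):
    """feedback: committee votes such as 'exceeded', 'met', 'failed', 'no-info'."""
    votes = [v for v in feedback if v != 'no-info']
    if not votes:
        return current_tier
    i = TIERS.index(current_tier)
    if votes.count('exceeded') > len(votes) / 2 and i < len(TIERS) - 1:
        return TIERS[i + 1]  # promote one bracket
    if votes.count('failed') > len(votes) / 2 and i > 0:
        return TIERS[i - 1]  # demote one bracket
    return current_tier


print(next_year_tier(10, ['exceeded', 'exceeded', 'met', 'no-info']))  # -> 25
```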

  2. Although organizing CTFs is an important task, much appreciated by the community, the scores for organizing CTFs are in my opinion excessive and in several cases account for a significant percentage of a team's score. I understand the rationale for rewarding a team that puts a lot of effort into organizing a CTF, and I am honestly thankful to them, but I believe that, together with the inflation of the scoring system, the current score is just too high.

PS: I did not provide examples on purpose, as the goal is not to bash anyone's team.

pedromigueladao commented on Dec 15 '17

Have a small committee that ranks the CTFs up-front, and provides a promotion/demotion information for the year after.

The issue here is basically "How do you grade a new CTF you don't know anything about?". This was the whole idea behind public voting: the admin was not comfortable with setting scores by hand any more.

Use the team's 10-best scores to rank a team

But that's already done!

Ranking of new CTFs: almost every "new" CTF was ranked 23-25 points, whereas "old" CTFs were ranked whatever they were ranked the year before. It is my impression that unless someone was mad at something that happened during the CTF, the "regular participant" would by default vote MAX.

The main reason for that is simply that we don't split the "difficulty" score from the "quality" score, and this causes confusion. People tend to vote by "quality": they downvote if there were technical issues, downtime, or a lack of IRC/admins, and vote max if no such problems arose. But the difficulty of the tasks is often not taken into consideration at all. It's more of a "fun factor" score - you had fun? Put max! It would already improve the situation a lot if the score were split into "difficulty" and "quality" parts ;)
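To illustrate what such a split could look like (the combination rule below is purely my own assumption; the comment only proposes collecting the two scores separately):

```python
from statistics import median


def event_weight(votes, max_points=100):
    """Each vote is a (difficulty, quality) pair, both on a 1..max_points scale.
    Difficulty drives the weight; quality only discounts it (downtime, missing admins, etc.)."""
    difficulty = median(d for d, _ in votes)
    quality = median(q for _, q in votes)
    return difficulty * (quality / max_points)


# Hard tasks but some infrastructure complaints -> weight below the raw difficulty score.
print(round(event_weight([(80, 95), (70, 90), (85, 60)]), 1))  # -> 72.0
```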

The idea of splitting events into "tiers" was already raised last year and I think it's actually a rather good idea.

Pharisaeus commented on Dec 18 '17

@Pharisaeus indeed it is not simple to rank a CTF a priori, and I understand why the admin started the voting process. But wouldn't a committee of 10-15 teams be a good enough approximation for distributed blame?

As for the 10-best/year, that is the current situation. I just wanted to lay out the whole proposal.

pedromigueladao commented on Dec 26 '17

Hey folks. I'm new to this community. I read all the comments in #40, so I have some background on last year's discussions. I have some ideas about ranking. What I would like to propose: let's use a normal distribution.

My idea about ranking:

  • Only teams taking part in a CTF can vote for it -- they know what they are voting for.
  • The team (as a whole, one vote) decides how many points to cast after finishing the CTF (alternative version: all votes from the team are summed and divided by the number of voters).
  • All teams' votes are taken into account, although they are filtered against a normal distribution.

The normal distribution protects against extreme votes. What this means:

  • a winning team giving the maximum number of points won't be counted in the final result
  • a losing team trying to lower the value of a CTF -- the same scenario. There is less chance to "game" the weights.

There is no way to decide a priori what "the weight" of a new CTF is. However, after the CTF finishes, there is the possibility to assign a weight to it. This value can also be used for next year's occurrence.

How about recurring CTFs? It works pretty much the same, with a slight change. We already have the weight from the past year, so we know how the CTF worked before, which also suggests what could be expected from the current year's edition. But we still need to gather votes after it has ended. The normal distribution works again, but this time we can take last year's number, add it to the current value and divide by 2 (or take the average value from all editions). If the organizers did a better job, the CTF is graded better (which also carries over to next year). If it was worse, the grade is slightly worse.
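A minimal sketch of this scheme as I read it (the one-standard-deviation cutoff and the 50/50 blend with last year's weight are assumptions for illustration):

```python
from statistics import mean, stdev


def filtered_weight(votes, previous_weight=None):
    """Drop votes more than one standard deviation from the mean, average the rest,
    and blend the result with last year's weight for a recurring event."""
    if len(votes) < 2:
        current = votes[0] if votes else 0
    else:
        m, s = mean(votes), stdev(votes)
        kept = [v for v in votes if abs(v - m) <= s] or votes
        current = mean(kept)
    if previous_weight is None:
        return current
    return (current + previous_weight) / 2


# A single very low vote does not drag the weight down much.
print(round(filtered_weight([25, 24, 25, 5, 23, 25], previous_weight=20), 1))  # -> 22.2
```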

Thoughts?

dasm commented on Apr 02 '18

Hi @dasm, I do not really understand what you mean by normal distribution, but just look, for instance, at all the CTFs that occurred in March 2018 and see what "normalization" would do to them. Also, if you played them, think about what your fair assessment would be in terms of scoring, and compare it to the outcome of the voting process. I believe, as @Pharisaeus has mentioned several times, that people tend to score the fun effect rather than difficulty. IMHO a posteriori voting is not the way to go. And if you consider it not ok/unfair to project a ranking for a CTF based on its past, i.e. a priori weighting, look at the status right now: for new CTFs you vote, but for existing ones you take the ranking from the previous year. That is in fact a priori weighting a CTF.

We are just following a path of hyper-inflation of CTF scoring with marginal differences among CTFs.

Others and I have pushed the "tiers" scoring (and the a priori weighting proposal), but it has not gained enough momentum to go forward.

PS: if you look at the list of upcoming CTFs, with multiple CTFs per weekend, there is another game to be played: defining a strategy to maximize your team's ranking.

pedromigueladao commented on Apr 02 '18

Hey @pedromigueladao. Sorry for the delayed response. Let me explain what I meant by "normal distribution": I was thinking about taking the bulk of the votes cast on a CTF, using the probability density function [1] of the standard normal distribution [2], and taking its mode [3]. I know it could create other problems. If a CTF was attended by a limited number of teams, the voting can be skewed. Let's assume that, for some reason, all the teams gave top marks even though the CTF wasn't good. Then this particular CTF would be graded as one of the best of the year, all multipliers would go through the roof, and teams playing in it would receive max values. Next year, because the aforementioned CTF was so well perceived, more teams would participate... and if it wasn't a good event, the votes would change, causing its degradation. It is a self-regulating way of counting votes. It doesn't require a committee, and no elections are needed.

Instead of "normal distribution" it could be counted like: ignore top/bottom 20% of votes, take 60% and get mean value of these.

I agree that there is no single way of scoring difficulty vs fun factor. But still, if someone enjoyed a CTF, it could be voted higher than a difficult CTF lacking a fun factor.

The biggest issue you've mentioned, which I hadn't thought about, is that there are too many CTFs. Idea: only count the weight of events where at least (x) teams participated.

[1] https://en.wikipedia.org/wiki/Probability_density_function
[2] https://en.wikipedia.org/wiki/Normal_distribution#Standard_normal_distribution
[3] https://en.wikipedia.org/wiki/Mode_(statistics)

dasm commented on Apr 04 '18

@dasm it won't work, simply because statistics are applicable only if enough teams vote, and vote reasonably. The tendency is different. Many CTFs get almost no votes, especially on-site finals where only a handful of teams participate. Others get upvotes from no-name teams who happened to score high, regardless of the level of the competition. Maybe if you were forced to vote, for example to "claim" points on CTFtime, some people would be encouraged to vote. But again, the majority of people would vote max or min, depending on whether they "liked it".

Basically, right now 25 is the new 0 when looking at some low-tier CTFs. And there is no one to "offset" the score, because top teams don't even bother playing there, and "weak" teams tend to vote easy events high, because they managed to solve something.

Not to mention that many teams can't really do relative scoring, because a 50p CTF is already too hard for them, and they won't see a difference between 50 and 100p. So either they will vote max because "it was super hard", or they will vote 1 because "I failed so I didn't like it".

I know it's not so nice, but I honestly think either CTFs should be split into tiers by admins/top teams, as someone suggested a while ago, or only teams from, let's say, the top 50 should have the power to vote. Otherwise we're facing points inflation on one side and flattening scores on the other.

Pharisaeus commented on Apr 04 '18

Totally agree with @Pharisaeus. Looking at the current voting system, not voting 25 (or 1, which is usually due to some technical issue rather than the challenges themselves) is considered an outlier. My proposal from a while ago to have CTFs ranked in tiers was meant to simplify the a priori grading, so we don't waste time discussing 20, 25, 30 or 35. The ATP ranking (tennis) has worked like this for ages (https://en.wikipedia.org/wiki/ATP_Rankings) and it seems to work. Also, having a rolling system (the last 12 months count) rather than a reset on January 1st would be interesting.

And @Pharisaeus, it does not need to be a committee of top teams. It just needs to be a committee of accountable teams. The non-committee teams could provide feedback after each CTF, which could be taken into consideration when deciding next year's bracketing.

pedromigueladao commented on Apr 04 '18

does not need to be a committee of top teams

As I pointed out above, the issue is with relative scoring between CTFs. If you can't solve the majority of tasks at some CTFs, you can't really compare the difficulty, because all of them will be considered "hard". There is also the fact that in order to have consistent scoring between events, you need to play a lot of CTFs. This implies active, "strong" teams, which is why I mentioned the top 50. Also, I think strong teams tend to score the difficulty/level of the challenges, and not whether they "liked it".

There was an issue in the past with some top teams "gaming" the system with voting, but right now, with votes applying to the next year and only the best 10 scores taken into account, I think it's no longer a serious issue.

Pharisaeus commented on Apr 04 '18

@Pharisaeus yeah. I forgot about the most important thing: there are not enough playing teams to get enough votes.

dasm commented on Apr 10 '18