Change to the way teams are ranked and matched in Champions Ladder.

Introduction

Currently teams are ranked according to their win rate and games played. They are matched according to their team value and a TV+ value that is hidden from the player but is higher for teams with a better record.

The proposal is to rank teams using a visible points based rating system and to switch the matchmaking system to using team value only to select matches.

How would the rating system work?

  1. Each team would start with a base level of rating points, for example this could be 100pts.
  2. The teams would then gain or lose some of their rating points when they play matches.
  3. The amount of rating that is won or lost is not constant but is calculated based on which team is expected to win the matchup. The chance to win a matchup would be calculated by comparing each team's rating and also adjusting for any team value differential.
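
The three steps above can be sketched in code. This is only an illustration under stated assumptions: the logistic expected-score curve is borrowed from chess-style Elo, and the constants `K`, `SCALE` and `TV_WEIGHT` (how many rating points one point of team value is worth) are hypothetical placeholders that would have to be fitted to real CCL data.

```python
BASE_RATING = 100.0   # starting rating from step 1 (example value from the post)
K = 10.0              # hypothetical: maximum points exchanged per match
SCALE = 40.0          # hypothetical: adjusted gap that gives 10:1 odds
TV_WEIGHT = 0.1       # hypothetical: rating points per point of team value


def expected_score(rating, tv, opp_rating, opp_tv):
    """Estimated chance of winning, from the rating gap adjusted for the TV gap."""
    gap = (rating - opp_rating) + TV_WEIGHT * (tv - opp_tv)
    return 1.0 / (1.0 + 10.0 ** (-gap / SCALE))


def update(rating, tv, opp_rating, opp_tv, result):
    """Return the new rating. result: 1.0 win, 0.5 draw, 0.0 loss."""
    return rating + K * (result - expected_score(rating, tv, opp_rating, opp_tv))


# Favourite (higher rating and higher TV) beats an underdog: small gain.
fav_gain = update(130.0, 1500, 100.0, 1400, 1.0) - 130.0
# The underdog pulls off the upset instead: large gain.
dog_gain = update(100.0, 1400, 130.0, 1500, 1.0) - 100.0
print(round(fav_gain, 2), round(dog_gain, 2))
```

With these made-up constants the favourite gains under one point for the expected win, while the underdog would gain about nine for the upset. Whatever one side gains, the other loses.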

What are the advantages to using this system compared to the current system for ranking?

Currently teams are ranked based on their win record, but this ranking system fails to take into account the quality of opposition faced, so the integrity of the competition is reliant on the matchmaking system being able to provide matches which test the teams fairly.

In theory, using TV+ in the matchmaking will result in teams that have strong records playing against each other more often. Alternatively, teams with strong records will find themselves with a TV disadvantage that they will need to overcome in order to prove their skill.

This relies on a large player population and requires a decent sized pool of developed teams. In Champions Ladder the number of teams queuing for a game can be very low and many of the teams are poorly developed, particularly at the beginning of each season. This results in a lot of bad matchups being created, yet the ranking is unaffected by the quality of matchup.

Under a points based system, a bad matchup will award far fewer points to the winner, assuming the stronger team actually wins, while the loser will lose fewer points. This means that the ranking system no longer requires good matchups to be found in order to maintain its integrity.

The current ranking system heavily encourages players who care about their record to remake teams repeatedly if they suffer a defeat in their first few matches. With a points rating system, these early games wouldn't be significant to a team's overall chances of qualification so while winning would still be advantageous, losing would not necessitate remaking a team. Similarly a team could never reach a point where its record is irrecoverably bad. This would encourage people to stick with teams more commonly than they do.

What are the advantages to using this system compared to the current system for matchmaking?

Matchmaking by team value is a more logical form of matchmaking as team value is supposed to be a measure of how powerful a team is. Of course the actual outcome of a match is determined by the strength of the teams and also the skill of their coaches (and the die rolls).

I believe it is better for games of Blood Bowl to be played with teams of equal value as far as this is possible, even if the skill of the coaches is different.

One aspect where this is better is attrition. A team is currently judged to be weaker by their record; however, attrition is not tied to record but to blocks, especially blocks made with "killer" skills like Mighty Blow, Claw and Piling On. A coach may be poor at winning games of Blood Bowl but be capable of making plenty of blocks with Mighty Blow. This can lead to matchups where the result is a 50/50 but the team with a stronger record and a team value disadvantage is likely to take many casualties.

If teams were matched only on team value then these frustrating attrition mismatches would occur far less often and weaker players wouldn't be rewarded for playing a brainless blocking game as much as they currently are.

Summary and points for possible further discussion

In summary I think these changes would be better because they:
-Would lessen the impact of early season matches
-Would lessen the impact of the matchmaker
-Would encourage people to keep teams, even after some lost games
-Would lessen the chance of receiving a cascade of casualties due to playing down TV
-Would make bash teams slightly less attractive
The exact mathematics behind the ratings would have to be designed.
I also feel like a "system within a system" of matching by rating within a much narrower team value margin would be good. For example a 1430 team with 210 points would match a 1470 team with 205 points instead of a 1440 team with 90 points.
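
The "system within a system" idea can be sketched as follows, using the three teams from the example. The `tv_margin` of 50 and the tuple layout are assumptions made purely for illustration; the post does not specify how narrow the margin should be.

```python
def pick_opponent(team, queue, tv_margin=50):
    """Among queued teams within tv_margin TV, return the one closest in rating.

    Teams are (name, tv, rating) tuples; returns None if nobody is in range.
    """
    _, tv, rating = team
    in_range = [t for t in queue if abs(t[1] - tv) <= tv_margin]
    if not in_range:
        return None
    return min(in_range, key=lambda t: abs(t[2] - rating))


searcher = ("A", 1430, 210)
queue = [("B", 1470, 205), ("C", 1440, 90), ("D", 1700, 208)]
match = pick_opponent(searcher, queue)
print(match)
```

Team C is closer in team value, but team B is picked because it is inside the margin and much closer in rating. Team D has a similar rating but is filtered out by the TV margin first.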

BB2 Champion Ladder Admin Team

Taking opposition quality into account as part of ranking is something I am looking at currently. What you're describing is basically an Elo-type system.

@woofbark said in Change to the way teams are ranked and matched in Champions Ladder.:

Each team would start with a base level of rating points, for example this could be 100pts.
The teams would then gain or lose some of their rating points when they play matches.
The amount of rating that is won or lost is not constant but is calculated based on which team is expected to win the matchup. The chance to win a matchup would be calculated by comparing each team's rating and also adjusting for any team value differential.

What you're describing is, more or less, ELO. While there are an infinite number of ELO setups based on screwing around with the internal variables, the basic premise of ELO is that the rating difference lets you predict who will win a given match and, based on the presumed accuracy of that prediction, your rating changes according to whether you won the match and how likely the prior ratings said that result was.

Now, this is great on paper, but ELO is a performance metric which, as the name suggests, requires performance in order to become an accurate measure of player (or team) ability. This means it relies on there being a sufficient number of matches played by any given participant to create a stable rating. The median number of games played by a team in BB2 is something painfully low... 5 or less, I believe. This has routinely been the case in open matchmaking for BB.

@woofbark said in Change to the way teams are ranked and matched in Champions Ladder.:

This relies on a large player population and requires a decent sized pool of developed teams. In Champions Ladder the number of teams queuing for a game can be very low and many of the teams are poorly developed, particularly at the beginning of each season. This results in a lot of bad matchups being created, yet the ranking is unaffected by the quality of matchup.

Small pooling sizes will always push things toward suboptimal matchups, regardless of which system is used. The question becomes "how well is it working within the context of the existing environment", and we've seen from the data that TVPlus matchmaking generated a pretty significant improvement regardless of the pooling limitations.

Under the type of system you're talking about, in the situations you're talking about (early season, low development) you run into exactly the same thing - with low amounts of information available to the system, it will be applying fairly uniform rating changes across the board simply because it has no additional info about either team to work with. Even the potential benefits are unlikely to be seen until later in the season, and primarily geared toward teams that start late and end up facing well-developed teams.

@woofbark said in Change to the way teams are ranked and matched in Champions Ladder.:

The current ranking system heavily encourages players who care about their record to remake teams repeatedly if they suffer a defeat in their first few matches. With a points rating system, these early games wouldn't be significant to a team's overall chances of qualification so while winning would still be advantageous, losing would not necessitate remaking a team. Similarly a team could never reach a point where its record is irrecoverably bad. This would encourage people to stick with teams more commonly than they do.

I think you're incorrect here. If you lose some games you're no less "down" under an ELO style system than under any other. If other people can go on long win streaks, their ELO rating will still be perpetually higher than yours. If other people can't maintain those streaks then it doesn't matter when you lose your games, only that you lose less. Until there is a wide variety of different levels of achievement in the environment, the changes in rating will not be significantly different, making such a system vulnerable to exactly the same things the current system is.

I think your logic relies on the idea that people are playing totally independent matches, as they would with chess... but they aren't. There is mechanical development involved, and you very much can hit a point where you are irrevocably down too far to qualify. Likewise, the complaint of late-starter teams is not that they can't qualify, it's that existing teams stomp them into the dust. The only way an ELO style system helps with that is if there's a large enough playerbase to create sufficiently large pool sizes to give better matches... and if there were, any of the systems would handle that problem already.

@woofbark said in Change to the way teams are ranked and matched in Champions Ladder.:

I believe it is better for games of Blood Bowl to be played with teams of equal value as far as this is possible, even if the skill of the coaches is different.

You'll have to qualify the term "better", since we know for a fact that TV based MM results in less balanced matches. We also know that TV is not a particularly good measure of a team's mechanical strength.

What you really need to do, rather than proposing abstract ideas, is to take the data and create a concrete formula for the ranking you're proposing, and use the data to show that there is reason to believe it accomplishes what you say it will in the context of the actual environment.

I think you're going to find what we found long ago... that while ELO sounds brilliant, it doesn't work all that well in practice for Blood Bowl. You can theorize there's a good way to incorporate mechanical strength into the mix, but theorizing doesn't even buy us a cup of coffee - find the way to mix them together and show that it creates a better ranking system and voila, you've got yourself a real case for change!

@voodoomike said in Change to the way teams are ranked and matched in Champions Ladder.:

Now, this is great on paper, but ELO is a performance metric which, as the name suggests, requires performance in order to become an accurate measure of player (or team) ability. This means it relies on there being a sufficient number of matches played by any given participant to create a stable rating. The median number of games played by a team in BB2 is something painfully low... 5 or less, I believe. This has routinely been the case in open matchmaking for BB.

Just to ask the obvious:

Why tie the ELO to a team or even a season? Wouldn't it be more natural/promising to tie it to the coach in the CCL environment? Of course, the amount of accumulated data (and therefore predictive quality) would still be low for new coaches to CCL (so probably an average-ELO should be assumed for new coaches), but it would be high for long-playing ones.

The original/starting ELO assigned to each coach, once switching to such a system, could probably even be calculated from past win-records of past seasons. That would even be more fair because it would include the records of all throw-away teams, as well, which, in turn, could encourage less throwing away of teams as it doesn't help you as much as it does now.

BB2 Champion Ladder Admin Team

@ugh said in Change to the way teams are ranked and matched in Champions Ladder.:

Wouldn't it be more natural/promising to tie it to the coach in the CCL environment?

If we are to take teams through to a final (and people seem to want to do that) then ranking the teams which are to qualify is natural. I have created a system which ranks coaches overall, or by race, but unless you throw away the team and have an entirely separate competition for the cup there's little meaning to doing so.

@dode74 said in Change to the way teams are ranked and matched in Champions Ladder.:

If we are to take teams through to a final (and people seem to want to do that) then ranking the teams which are to qualify is natural. I have created a system which ranks coaches overall, or by race, but unless you throw away the team and have an entirely separate competition for the cup there's little meaning to doing so.

Erm, can you tell me why? What does tying it to a team achieve?

Obviously, the record of the team should still have a (large) influence on the qualifying part, but the ELO would be another influence both on the match-making (hopefully generating more fair matches) and reducing the influence of still-occurring-but-unavoidable bad match-making (between coaches) on both players' records. Good coaches would have to either win more against other good coaches (rare occurrence due to low pool-sizes) or more against bad coaches than they do now (regular occurrence for good coaches) to rise to the top of the list. Basically, the luck-part of match-making could be reduced. I don't see the relation to a specific team being used at all.

That doesn't mean it's not there, but I don't see it, yet.

Tying it to a team has the obvious drawback of the low number to base the ELO on, so while I don't see an advantage, there's a clear disadvantage.

BB2 Champion Ladder Admin Team

Having something outside what the team has achieved influence the ranking of that team (even indirectly through matching) makes the ranking gameable. Imagine the player who wants to take his Chaos team through using a Halfling team to shed matching-Elo by losing matches with it, thus giving him easier matches with the Chaos team.

@dode74 said in Change to the way teams are ranked and matched in Champions Ladder.:

Having something outside what the team has achieved influence the ranking of that team (even indirectly through matching) makes the ranking gameable. Imagine the player who wants to take his Chaos team through using a Halfling team to shed matching-Elo by losing matches with it, thus giving him easier matches with the Chaos team.

Fair point.

How about computing the overall-starting-ELO for a season by looking only at the best-performing team of the coach from previous seasons and the current-season-ELO on additional performance with the team that is being matched/judged for qualification? To keep your overall-ELO low while still qualifying regularly, you'd have to play a whole season badly with all your teams.

Also, race-matchups would have to be considered for computing the ELO anyway, as I think it is known that some teams are harder than others to play, so losing with a Halfling team would have little negative impact on the ELO while winning with it would have a big impact (while the reverse would be true with Necros). Thus, keeping your overall ELO low would be more time-consuming using Halflings than using Necros and losing purposefully with Necros is more obvious.

Thus, using the overall-best-teams-ELO and comparing it with the current-seasonal/team-ELO could give an indication of consistency of a coach and whether they are trying to cheat in such a manner.

BB2 Champion Ladder Admin Team

To keep your overall-ELO low while still qualifying regularly, you'd have to play a whole season badly with all your teams.

People would do just that. I see no reason to give people an incentive to lose.

using the overall-best-teams-ELO and comparing it with the current-seasonal/team-ELO could give an indication of consistency of a coach and whether they are trying to cheat in such a manner.

None of which is any use when we're qualifying teams, not coaches.

@ugh said in Change to the way teams are ranked and matched in Champions Ladder.:

Also, race-matchups would have to be considered for computing the ELO anyway, as I think it is known that some teams are harder than others to play, so losing with a Halfling team would have little negative impact on the ELO while winning with it would have a big impact (while the reverse would be true with Necros).

You mean, differences in win rates based on which roster you play and which roster you play against... which you have already been told gets filtered out during regression analysis for being an insignificant contributor to outcome prediction? You can disbelieve it as hard as you want, but it won't make those numbers relevant in the real world. You can verify these facts yourself if you ever get out of your armchair and trade in your hand-waving for number crunching.

Elo systems are predicated on the idea that everything you don't want to give the player credit for is already balanced such that the performance rating can fully be attributed to the player. That's fine when what we're ranking is a team - we're looking at its performance, which is a combination of team's mechanical ability and coach's skill... but it takes time to find a stable rating, and the speed at which that is going to happen will depend on how stable the ratings of the other teams are. With the mean lifespan of teams in both COL and CCL being less than 10 games, that's not only unlikely, but also demonstrably crippling to any use of Elo (as people have seen when they try to apply it to the data).

Trying to use Elo on coaches, across multiple teams, is incompatible with the design of the game because there does not exist mechanical balance across rosters or individual teams... and thus, the rating is not going to be the coach's skill - it will be heavily influenced by that coach's choice of teams he plays. If he wanted a high rating he'd play a bunch of high performance rosters... and if we were using the coach's Elo rating for CCL, there'd be little reason for there to be seasons at all because the games played during the seasons would have less and less effect on the rating we were using to judge the final outcome of the season's play.

Here's a crazy idea - why don't you stop posting "what if" and "how about" over and over and go grab the data and investigate these concepts directly? It'd be a much better use of your time than pretending your criticisms of people's grammar on the Steam forums are philosophical debate.

I didn't specifically mention Elo in my idea because I don't know if I'd class it as an Elo system. Elo as I understand it is a Chess ranking system, and Chess is a very different game although there are similarities. Any change to a system has to have objectives and my objectives were to make the rankings a better reflection of how well a team had performed.

The basic premise being that different matchups are harder or easier and it isn't fair to rank teams purely on their win-loss record as that leads to teams that qualify being those that drew the weakest opposition. Essentially turning qualification into a glorified lottery.

I wanted wins against tough opponents or overcoming a big TV gap to be more rewarding.

BB2 Champion Ladder Admin Team

I wanted wins against tough opponents or overcoming a big TV gap to be more rewarding.

Yes, that's exactly what I've been looking at. The issue I have is that the current ranking system is, contrary to expectations, proving to be a better indicator of who we think should win a match than anything we've come up with so far.

@woofbark said in Change to the way teams are ranked and matched in Champions Ladder.:

I didn't specifically mention Elo in my idea because I don't know if I'd class it as an Elo system. Elo as I understand it is a Chess ranking system, and Chess is a very different game although there are similarities. Any change to a system has to have objectives and my objectives were to make the rankings a better reflection of how well a team had performed.

We would classify any system that alters your ranking metric by a variable amount based on your predicted outcome to be an Elo style system. Keep in mind that you haven't proposed a different system, you've proposed a different abstract concept for a system without any specifics. To date, no system of that sort has been found to be a more accurate ranking system than what we already have in place. If you want to try it for yourself then by all means give it a go... as I said to crash-helmet, the data for every season of CCL is available from mordrek's goblinSpy site.

@woofbark said in Change to the way teams are ranked and matched in Champions Ladder.:

The basic premise being that different matchups are harder or easier and it isn't fair to rank teams purely on their win-loss record as that leads to teams that qualify being those that drew the weakest opposition. Essentially turning qualification into a glorified lottery.

Nobody is confused by the premise, and you don't need to further "clarify" it... we all get it just fine. The problem is that the underlying assumptions have proven to be incorrect based on the data. That's why I say elo-style systems look great on paper, but just because they make sense to us... and we feel they should work better... they have not actually been better when tested using actual data.

Dode has tried several different ways to use match-prediction metrics to develop an elo-style ranking metric because, as I say, conceptually it makes plenty of sense... but the post-hoc predictive power of such ratings ends up being inferior to scaled win%, which is what we currently use. Even though it feels like differential rating rewards for different projected difficulties in winning should produce a more accurate ranking, it just doesn't seem to do that in CCL.

The fact that the current system is remarkably good with post-hoc prediction tells us that the current system is not "a glorified lottery"... it's actually pretty damned good for the environment it's being used in.

How do you go about deciding if a rating system is good or not @VoodooMike ?

@woofbark said in Change to the way teams are ranked and matched in Champions Ladder.:

How do you go about deciding if a rating system is good or not @VoodooMike ?

Our present method is to assess the rating's post-hoc predictive power. What this means is we take the data for an entire season and take the final rating that each team achieved using a given rating system... and we look at every match played by every team, comparing their final ratings to see which team those ratings would predict to win that match (higher rating being the predicted winner). A given ranking metric will correctly predict the winner in a certain percentage of the matches that resulted in a victory.
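
As a rough sketch of the check described above: the function below counts how often the team with the higher final rating actually won. The team names, ratings and results are made up for illustration, and skipping pairings with equal final ratings is an assumption of this sketch, not something stated in the thread.

```python
def post_hoc_accuracy(final_rating, matches):
    """Share of decided matches won by the team with the higher final rating.

    final_rating: dict of team -> end-of-season rating.
    matches: list of (team_a, team_b, winner), with winner None for a draw.
    Draws and equal-rating pairings are skipped (an assumption of this sketch).
    """
    correct = total = 0
    for a, b, winner in matches:
        if winner is None or final_rating[a] == final_rating[b]:
            continue
        predicted = a if final_rating[a] > final_rating[b] else b
        total += 1
        correct += predicted == winner
    return correct / total if total else float("nan")


# Invented example data, not real CCL results.
ratings = {"Orcs": 72.0, "Elves": 65.0, "Halflings": 31.0}
season = [
    ("Orcs", "Elves", "Orcs"),
    ("Elves", "Halflings", "Elves"),
    ("Halflings", "Orcs", "Halflings"),  # upset: the prediction misses
    ("Orcs", "Elves", None),             # draw: skipped
]
accuracy = post_hoc_accuracy(ratings, season)
print(accuracy)
```

Any candidate rating system can be run through the same function by swapping in its own final ratings, which makes the systems directly comparable on one number.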

The present system generally has around an 85% success rate for post-hoc prediction. Nobody has found a better system than that yet, and we throw a lot of alternatives at the data... and no few of those are "elo style" systems that base rating changes on a sliding scale based on difficulty of opponent.

Are there other ways we might assess a ranking metric? Absolutely... but the one we're using is the most obvious and easy to justify - it represents a measure of the accuracy of the final ranking scores for teams and thus, assesses how good the metric is at determining who was "the best" during a season.

Isn't it just intrinsic to the way champs ladder works that you would get those high rates of success for prediction? A team with a high rating is obviously going to have won most of its matches because that is quite literally how that rating was calculated.

Aren't you just saying that teams with better records win more often than teams with losing records?

@woofbark said in Change to the way teams are ranked and matched in Champions Ladder.:

Isn't it just intrinsic to the way champs ladder works that you would get those high rates of success for prediction? A team with a high rating is obviously going to have won most of its matches because that is quite literally how that rating was calculated.

Absolutely. While your tone may be incredulous, it shouldn't surprise you that the final ranking in the competitive ladder favours teams that won more games than other teams. The ranking metric is, in fact, win% adjusted by games played in order to minimize the chances of getting a "lucky streak" of games that sets you at a high win% early on, and just sitting on that.
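
The thread does not give the actual CCL formula, so the following is only a hypothetical illustration of how a win percentage can be tempered by games played. It pads every record with `PRIOR_GAMES` imaginary games at a 50% rate (a generic shrinkage trick, not the real ramp-up math) and counts draws as half a win, which is also an assumption.

```python
PRIOR_GAMES = 10  # hypothetical padding; NOT the real CCL constant


def adjusted_win_pct(wins, draws, losses):
    """Win% (draws count half) pulled toward 50% when few games are played."""
    games = wins + draws + losses
    points = wins + 0.5 * draws
    return (points + 0.5 * PRIOR_GAMES) / (games + PRIOR_GAMES)


streak = adjusted_win_pct(3, 0, 0)     # 3-0-0: raw win% is 100%
veteran = adjusted_win_pct(24, 2, 4)   # 24-2-4: raw win% is about 83%
print(round(streak, 3), round(veteran, 3))
```

Under this made-up adjustment the 3-0-0 team ranks below the 24-2-4 team even though its raw win% is higher, which is the "lucky streak" protection described above.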

@woofbark said in Change to the way teams are ranked and matched in Champions Ladder.:

Aren't you just saying that teams with better records win more often than teams with losing records?

Yes, with the additional bit favouring those who can maintain their record for longer (longer meaning more games played, as each game has the potential to lower your win rate).

As I explained above, if we control for the relationship between starting date and games played (people who start earlier have more time to play more games) we find there is NO statistically significant relationship between the day you start your team and your final rating. This means there's no evidence in the data that people who start their teams later have a harder time achieving a high ranking so long as they can play enough games in the time they have left in the season.

Right from the start of CCL, promoting the playing of more games has been a design intent, so I don't imagine they'll be interested in removing that aspect. With there being no evidence that latecomers are actually facing a harder time achieving a high ranking assuming similar number of games played, I'm not sure there's much reason to mess with what's in place.

If you have a non-abstract idea on alternative measurements to look at in the data then I'm sure we're all ears. I'm always looking at different facets of the data and sometimes finding new things in it... but right now I haven't seen anything that supports changing the ranking system.

I really like your suggestion. Anything, and I mean anything is better than the current system.

BB2 Champion Ladder Admin Team

@hotdogchef said in Change to the way teams are ranked and matched in Champions Ladder.:

I really like your suggestion. Anything, and I mean anything is better than the current system.

Actually the current system is objectively the best measure to have been found yet. And I have been looking through a LOT of different methods, including the Elo-style systems the OP suggests. It's counter-intuitive, but it's true. I'm still looking at methods, though, because a system which is at least as good and is more intuitively satisfying would be a step up, imo.

@dode74 said in Change to the way teams are ranked and matched in Champions Ladder.:

I'm still looking at methods, though, because a system which is at least as good and is more intuitively satisfying would be a step up, imo.

I think the current system is about as intuitive as you can get for the average coach. People understand win% as a measure of achievement for a team, and most people can understand that the metric needs to be tempered by the number of games played to ensure reliability.

Right now we only have abstract objections to the current system... meaning, we don't have any solid metrics for measuring any flaws that might exist in the current system. It's probably a good idea to develop such metrics prior to attempting to develop systems that correct for them, since there's no way to tell if new systems do.

I suspect it will be very difficult, if not impossible, to create a non-win% based system that has better post-hoc predictive power than a win% based system specifically because win% systems are straight-up based on win%s, and we're measuring them by looking back at what % of wins they predict. Obviously it's deeper than that (since we're comparing final ratings for each match, not just seeing what % of matches a person won vs. their final win% which would be a goofy bit of identity math) but still, the relationship between the system and the math we use to assess the system is there. Even if you find something that has better predictive power than the current system I suspect we can then beat it by adjusting the current system's ramp-up math.
