@dode74 said in BB2 2018 - A List of Game Balance and Team Diversity suggestions:

It's been tried. It's too good.

"Too good" as determined by the same metric that built the rest of the system, which is the BBRC's eyeballing, correct? I'd be interested to know the *actual* effect as opposed to what a couple of people felt about it.

@kaintxu said in Coping with this game:

Those are my rolls for the last game, my opponents were also quite bad, but he did not have a single turnover over double 1's

The last two games recorded on goblinSpy at the moment are concession matches for you, so here are the analyses of the last non-concession match available for you on goblinSpy:

## (Aurrezky Dantzarys)

d6 rolls: n = 52, χ2 = 4.54, p = 0.4748

d6 ac1: r = 0.1546, p = 0.1350

d6 mean: 3.2692

d6 mean t = -0.9508, p = 0.3462

Block rolls: n = 16, χ2 = 1.25, p = 0.8698

## (Anotha One)

d6 rolls: n = 76, χ2 = 4.37, p = 0.4977

d6 ac1: r = 0.1169, p = 0.1558

d6 mean: 3.6053

d6 mean t = 0.5045, p = 0.6154

Block rolls: n = 44, χ2 = 0.66, p = 0.9563

## Total Match

d6 rolls: n = 128, χ2 = 4.19, p = 0.5227

d6 ac1: r = 0.0972, p = 0.1367

d6 mean: 3.4688

d6 mean t = -0.1973, p = 0.8439

Block rolls: n = 60, χ2 = 0.35, p = 0.9864

Nothing out of the ordinary there. Even if we ignored family-wise error problems (which we shouldn't), we still don't see anything reaching the level of statistical significance. Not a single p value is below 0.05, much less the 0.004 we'd need to guarantee no more than a 5% false positive rate across the whole battery of tests.
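The 0.004 figure is consistent with a Bonferroni correction, assuming the battery counts four tests per group (d6 χ², d6 ac1, d6 t-test, block χ²) across the three groups, i.e. 12 tests. Both the test count and the choice of Bonferroni are assumptions on my part, not something stated in the analysis. A minimal sketch of that arithmetic, using the p values reported above:

```python
# Bonferroni correction: divide the desired family-wise error rate by the
# number of tests. Assumption: 4 tests per group x 3 groups = 12 tests.

def bonferroni_threshold(alpha, num_tests):
    """Per-test threshold that keeps the family-wise false positive rate <= alpha."""
    return alpha / num_tests

# p values from the match above: (d6 chi2, d6 ac1, d6 t, block chi2) per group.
reported_p = {
    "Aurrezky Dantzarys": [0.4748, 0.1350, 0.3462, 0.8698],
    "Anotha One": [0.4977, 0.1558, 0.6154, 0.9563],
    "Total Match": [0.5227, 0.1367, 0.8439, 0.9864],
}

all_p = [p for values in reported_p.values() for p in values]
threshold = bonferroni_threshold(0.05, len(all_p))
significant = [p for p in all_p if p < threshold]

print(f"tests: {len(all_p)}, threshold: {threshold:.5f}")  # tests: 12, threshold: 0.00417
print(f"smallest p: {min(all_p)}, significant: {significant}")  # smallest p: 0.135, significant: []
```

Bonferroni is the simplest (and most conservative) family-wise correction; a Šidák correction would give a very slightly larger threshold, but the conclusion here is the same either way.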

@kaintxu said in Coping with this game:

It's not about complaining about my bad luck, it's about the game

No, you *think* it's about the game, but the analysis shows it's not.

@kaintxu said in Coping with this game:

Mike, all that sounds good, but it does not take into account the context of how the plays were made. Double 1's happen too often, which pretty much breaks everything for agility teams, which has been my point from the beginning.

The chi-square test looks at the distribution of the rolls to see how far off they are from the expected perfect distribution. This means that, for example, if we rolled the d6 60 times during a game our expectation is that we'll see roughly 10 of each dice value. Randomness rarely gives us *exactly* that, so the chi-square goodness of fit test looks at how far off from that expectation the observed values are.
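A self-contained sketch of the goodness-of-fit idea described above. It computes Pearson's χ² statistic against a uniform d6 and converts it to a p value using the standard closed-form chi-square tail for integer degrees of freedom: 5 for a d6, while the block-dice rows above are consistent with 4 (five outcomes, with push on two faces). The function names are hypothetical, and this is not a claim about how goblinSpy computes its figures.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        half = x / 2.0
        term = total = 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: 2 * (1 - Phi(sqrt(x))) plus a finite correction series.
    chi = math.sqrt(x)
    tail = math.erfc(chi / math.sqrt(2.0))  # equals 2 * (1 - Phi(chi))
    density = math.exp(-x / 2.0) / math.sqrt(2.0 * math.pi)  # phi(chi)
    term, series = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)  # chi^(2r-1) / (1 * 3 * ... * (2r-1))
        series += term
    return tail + 2.0 * density * series

def chi_square_gof(observed, probs):
    """Pearson chi-square goodness-of-fit statistic and its p value."""
    n = sum(observed)
    stat = sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, probs))
    return stat, chi2_sf(stat, len(observed) - 1)

# 60 d6 rolls: the expectation is 10 of each face.
fair = [1 / 6] * 6
print(chi_square_gof([10, 10, 10, 10, 10, 10], fair))  # (0.0, 1.0)
print(chi_square_gof([16, 9, 8, 10, 8, 9], fair))      # statistic about 4.6, p about 0.47
```

Note that six extra 1s out of 60 rolls still produce an unremarkable p value; a real excess of 1s has to be larger or more persistent before the test flags it.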

If you were getting an abnormal number of double 1s, we'd expect that to show up in that test as an abnormal distribution of values... unless it's giving a commensurate number of additional 2's, 3's, 4's, 5's, and 6's. At that point the "dice cheating" is getting pretty complex, since it has to keep track of the number of times it alters the value it gives you, and compensate with other values in order to avoid being noticed in statistical analysis.

Even if it did that (tinfoil hats on), we look at the pattern of the values using lag-1 autocorrelation (ac1 in the results), which is basic signal analysis meant to differentiate between random noise and numbers that have a non-random pattern to them.
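A minimal sketch of the lag-1 autocorrelation check. The p value uses a common large-sample approximation in which r·√(n−1) is compared with a standard normal distribution, one-tailed. That choice is my assumption, though it lands close to the ac1 p values reported above (r = 0.1546 with n = 52 gives roughly 0.135). A die that tended to repeat its previous value, for example by handing out runs of 1s, would push r upward.

```python
import math

def lag1_autocorrelation(rolls):
    """Correlation between each roll and the roll that immediately follows it."""
    n = len(rolls)
    mean = sum(rolls) / n
    dev = [x - mean for x in rolls]
    denom = sum(d * d for d in dev)
    if denom == 0:
        return 0.0  # every roll identical: correlation undefined, report 0
    num = sum(dev[i] * dev[i + 1] for i in range(n - 1))
    return num / denom

def ac1_p_value(r, n):
    """One-tailed p value, assuming r * sqrt(n - 1) is approximately N(0, 1)."""
    z = abs(r) * math.sqrt(n - 1)
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # 1 - Phi(z)

streaky = [1, 1, 1, 6, 6, 6]   # values tend to repeat
alternating = [1, 6] * 10      # values tend to flip
print(lag1_autocorrelation(streaky))      # 0.5
print(lag1_autocorrelation(alternating))  # -0.95
print(ac1_p_value(0.1546, 52))            # close to the 0.1350 reported for the first team
```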

Likewise, we look at the mean value of the dice rolls (which has an "expected" value of 3.5) and see how far the observed mean is from that expected value. Again, if it were throwing you more 1's than anything else, that would push your mean d6 value downward, and a t-test of the observed values against the expected value would almost certainly show an abnormality... unless, again, it's keeping track and compensating with enough values above 1 to push the mean back into the expected range, while simultaneously tracking which numbers it uses in order to avoid running afoul of the chi-square test on the number of each value we see.
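The mean check can be sketched as a one-sample t statistic against the expected 3.5. To keep the example dependency-free, the p value below uses a normal approximation instead of the Student t distribution. That is an assumption, and it slightly understates p for small samples: for the first team above, t = −0.9508 gives about 0.342 rather than the reported 0.3462.

```python
import math
import statistics

def mean_t_test(rolls, expected=3.5):
    """One-sample t statistic for the mean of rolls vs. an expected value.

    The two-tailed p value uses a normal approximation (an assumption);
    an exact test would use the Student t distribution with n - 1 df.
    """
    n = len(rolls)
    sd = statistics.stdev(rolls)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("all rolls identical; t statistic undefined")
    t = (statistics.mean(rolls) - expected) / (sd / math.sqrt(n))
    p = math.erfc(abs(t) / math.sqrt(2.0))  # 2 * (1 - Phi(|t|))
    return t, p

print(mean_t_test([1, 2, 3, 4, 5, 6]))  # (0.0, 1.0)
print(mean_t_test([1, 1, 2, 2, 3, 3]))  # t about -4.11: far too many low rolls
```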

@kaintxu said in Coping with this game:

I also commented that the roll number are usually the same, but my doubt comes in how has people tested how often double ones, or quadruple ones come up?

The ac1 test I mentioned would notice if there were an abnormal amount of sequential values.

@kaintxu said in Coping with this game:

Cyanide, you are fucking retards, I Challenge you to come and see this games and tell me the rolls are normal

While I may not be Cyanide, I'm happy to run your replay through a quick battery of analyses:

## (Aurrezky Dantzarys)

d6 rolls: n = 153, χ2 = 5.63, p = 0.3442

d6 ac1: r = 0.0188, p = 0.4080

d6 mean: 3.4575

d6 mean t = -0.3192, p = 0.7500

Block rolls: n = 64, χ2 = 2.52, p = 0.6418

## (Fraugs)

d6 rolls: n = 180, χ2 = 3.73, p = 0.5884

d6 ac1: r = -0.0361, p = 0.3143

d6 mean: 3.4667

d6 mean t = -0.2679, p = 0.7891

**Block rolls: n = 131, χ2 = 10.92, p = 0.0275**

## Total Match

d6 rolls: n = 333, χ2 = 2.66, p = 0.7526

d6 ac1: r = 0.0116, p = 0.4162

d6 mean: 3.4625

d6 mean t = -0.4136, p = 0.6795

**Block rolls: n = 195, χ2 = 11.80, p = 0.0189**

The bolded lines are the only ones that even touch upon being abnormal, and even then only if we ignore family-wise error. If we want to keep the family-wise false positive rate at 5% across the entire battery of tests, the lowest p value would need to be below 0.004, which it isn't.

Most importantly, though, you're not even complaining about block dice; you're complaining about d6 results, and those fall squarely within normal expectations as far as value distributions go. You're not getting "way more 1s than 6s" or anything of that sort.

The easiest way to explain the tests is to look at the p value and think of it as a percentage (1 being 100%, 0.5 being 50%, and so on) of fair games in which we'd expect to see a distribution of values *at least as far from normal* as the one in this game. So, if a p value is 0.75, it means that 75% of all matches can be expected to look as abnormal as this one, or more so.
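That reading of the p value can be checked by simulation: generate many fair "matches" with the same number of d6 rolls and count how often their χ² statistic is at least as large as the observed one. A sketch, assuming fair rolls from Python's `random` module and using the first team's row from the first match (n = 52, χ² = 4.54) as the observed value. The function names are hypothetical.

```python
import random

def d6_chi_square(rolls):
    """Pearson chi-square statistic of d6 rolls against a uniform die."""
    expected = len(rolls) / 6
    counts = [rolls.count(face) for face in range(1, 7)]
    return sum((c - expected) ** 2 / expected for c in counts)

def simulated_p_value(observed_stat, n_rolls, trials=10000, seed=1):
    """Fraction of simulated fair matches at least as far from uniform as observed."""
    rng = random.Random(seed)
    at_least_as_extreme = 0
    for _ in range(trials):
        rolls = [rng.randint(1, 6) for _ in range(n_rolls)]
        if d6_chi_square(rolls) >= observed_stat:
            at_least_as_extreme += 1
    return at_least_as_extreme / trials

print(simulated_p_value(4.54, 52))
```

The simulated fraction should come out in the same general range as the reported p = 0.4748, though not exactly on it: with only 52 rolls the χ² statistic can take only a limited set of values, so the formula-based p value is an approximation of what the simulation estimates directly.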

@ugh said in Matchmaking apparently doesn't take TV or TV+ difference into account.:

For each tuple in the cartesian product, it is a quadratic algorithm to find out (in this special case) if there is a maximal matching and even to find the maximal number of matched coaches that fulfill the secondary exclusion/preference criteria. This is due to the linear total order of coaches on the TV scale. Thus, you would only get the worst case scenario if for all possible team-configurations in a single league, no maximal matching can be found - even though one team for each coach can be matched with at least one team of another coach (if not, that coach would of course be removed prior to the actual algorithm as it can have no possible partner).

While it turns out not to be n!, not even in the worst case, as you claim, it is also nothing like what you're describing: it grows out of control fairly quickly. Even with massive optimization I can't keep it from bogging down much past the 20-coaches-in-a-pool mark. Given that in CCL alone there have been instances of 16+ coaches in a single pool, that remains insufficient for a single pool across all leagues.

Now, granted, I'm implementing them in AS3 in these cases, simply to make doing so fast and easy... and AS3 is far from a processing language, but once the pairing comparisons get much past the 1,000,000 mark there's going to be an exponential performance hit regardless of the system.
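To put rough numbers on that growth, here is a back-of-the-envelope sketch of the worst case described in the quote above: every combination of one queued team per coach (the cartesian product) is tried, and each combination costs on the order of n² pair checks for n coaches. Both the cost model and the function name are assumptions for illustration, not the actual matchmaking code.

```python
import math

def worst_case_pair_checks(teams_per_coach):
    """Hypothetical worst case: try every one-team-per-coach combination,
    paying roughly n^2 pair checks (n = number of coaches) for each one."""
    n = len(teams_per_coach)
    combinations = math.prod(teams_per_coach)  # size of the cartesian product
    return combinations, combinations * n * n

for coaches in (10, 16, 20):
    combos, checks = worst_case_pair_checks([2] * coaches)  # two teams queued each
    print(f"{coaches} coaches: {combos:,} combinations, {checks:,} pair checks")
# 10 coaches: 1,024 combinations, 102,400 pair checks
# 16 coaches: 65,536 combinations, 16,777,216 pair checks
# 20 coaches: 1,048,576 combinations, 419,430,400 pair checks
```

Twenty coaches with two teams each already pass a million combinations. This counts the worst case only; it ignores stopping early once a maximal matching is found, which is the effect the quoted argument relies on.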

@ugh said in Matchmaking apparently doesn't take TV or TV+ difference into account.:

Also, I've never aspired to find the 'best' pool, just maximal pools without these weird disparities.

Actually, you've aspired to piss and moan until other people find those best pools as I have yet to see you devise or implement any solutions. Critics always think they provide an important service to the world... but what the world needs is people who solve problems, not simply point them out.

@ugh said in Matchmaking apparently doesn't take TV or TV+ difference into account.:

Actually the resulting pools were smaller in COL than in CCL (last season, but that might be due to that COL CCL mixup).

We know for a fact that it's because of the error related to which league is the default, because the same thing happened the last time the default changed. So thanks for that nugget of wisdom and all, but it's a nugget of squat.

@ugh said in Matchmaking apparently doesn't take TV or TV+ difference into account.:

Also, just think about the special BB case. The more teams are spinning, the fewer large TV gaps exist (statistically). The more people are multi-spinning (obviously with different TV-values to reduce TV-differences, as seems to be their goal), the higher the number of possible matches, ergo the higher the probability of a maximal match being found very quickly if the number of coaches/teams is more dense instead of sparse which can happen more in case of low spinning traffic. And the more matches are being made, the fewer people need to be prioritized.

More teams yes, more coaches no. If everyone were multi-queuing it'd speed things up, but having more teams per coach is what does it, not more coaches per pool. More coaches slows it all down. So does having orphaned coaches if you don't find algorithmic work-arounds.

@crystal_hunter said in Idea about season champion in champions cup:

to keep qualification interesting for all races nothing would change in the qualification system (same top race/wildcard system), so the top team of the race that the champion team of last season has still qualifies, you will just only have 1 less wildcard spot (although you could possibly remove the wildcard option for the champion race during qualification if preferred).

While I can see the appeal from a sporting standpoint, the CCL seasons are less about finding the best coach in the world than they are about encouraging people to put in an effort, play a bunch of games without conceding all the time, and aim for an actual target rather than just focusing on defensive play to build up a team.

To that end, reducing the available slots people can aim for and completely taking one of the better coaches out of circulation for a season seems a bit counterproductive. The winner of a season already gets 500 euros... that's a pretty decent prize for a game like this.

Changing which rosters qualify for the wildcard slots every season, either randomly or by using the least-played rosters from the previous season, is the best use of those slots in my opinion... I like the idea of variety!

@ugh said in Maybe 'games played' or 'average opponent rank' should have more influence on rank:

No one debated that the ranking formula takes games played into account. But, it doesn't take average opponent rank into account, does it?

No, and *it shouldn't*. "Rank" is ordinal in nature, not scalar, and as such has no business being used in such calculations. Rank*points*, the metric that determines the rank order, is scalar to a sufficient degree, but just how legitimately scalar those points are depends on how many games the rated team has played and how many games its opponents have played.

...

@ugh said in Maybe 'games played' or 'average opponent rank' should have more influence on rank:

I know this ties a bit into the whole ELO-ranking discussion, but...

It is literally a repeat of the ELO ranking discussion, wearing a false nose and mustache.

@ugh said in Maybe 'games played' or 'average opponent rank' should have more influence on rank:

In that case, that is factually untrue (well, 'almost').

Almost facts... which are eyeballed things you've decided are facts... seem to be your bread and butter. That's an almost fact as well.

@ugh said in Maybe 'games played' or 'average opponent rank' should have more influence on rank:

Doesn't seem to be true in my specific case, but statistically probably true.

Awesome. Issue settled, then.

@ugh said in Maybe 'games played' or 'average opponent rank' should have more influence on rank:

My take-away is, that - at least for less played races - restarting and hoping for a good starting record/worse opponents seems to be more rewarding than playing more games with one team in the long run.

But more importantly, what is your "go-away"?

As usual, you are making plenty of *declarations* about how things work, but are providing no *evidence* to support your rampant suppositions. We do know that a relationship exists between the number of games played and the magnitude of final rankPoints... which makes sense, since that's how the ranking system is designed. Starting over sends you back to zero games played. That's true because... well... it's part of the definition. That's where we're at. *Go find some evidence to support the idea* that it's better to start over than to continue playing the same team in cases where you've lost <x> games.

@ugh said in Maybe 'games played' or 'average opponent rank' should have more influence on rank:

A minimum of games to be played to be able to qualify would possibly ameliorate that problem.

...that you haven't demonstrated exists outside of your anecdotes. I think people who solve problems should focus on the ones that conclusively exist before worrying too much about your imagined dragons.