
Rating the changes: Finding Standard Deviation

Alolan_Apples

“Assorted” Collector
Joined
Sep 9, 2014
Posts
26,554
Bells
4,323
Tickets
0
Throwback Tickets
0
Confetti
2
Switch
1624-3778-0694
Island
Palm City
Flower Glow Wand
Cool Balloon
Ghostly Kitty Plush
Yule Log
Disco Ball Easter Egg
Tetris Grid
Chocolate Cake
Pumpkin Cupcake
Apple (Fruit)
Ice Cream Swirl (TBT Beach Party)
As I continue talking about rating the changes, I’m beginning to tackle another concept related to it - Standard Deviation.

Yesterday, I mentioned how Rating the Changes is completely centered on comparison/contrast, a form of text structure (an English-related subject). A math-related subject tied to it is statistics. The reason is that once I rate every change (and similarity), I find the average score by adding all of the scores and dividing by the number of changes. That is the final score. For your information, here’s what each range of the average means:

Changes:

  • Higher than +2.500 - mostly inconceivable, but if this is the average, I strictly recommend playing the latter game over the former game, as the developers did a very good job on the new game or update.
  • +1.500 to +2.500 - the developers overall did a better job on the latter entry. I believe the latter game (or the game after the update) is better than the former, and I hope the developers stick to the same features for the next game or update. I would also recommend that gamers play the newer entry.
  • +0.500 to +1.500 - the changes are more positive than negative, but when there is a negative change, it’s a serious one. Otherwise, it could be that I didn’t care too much about the newer entry, but it is still slightly better than the older one.
  • -0.500 to +0.500 - neither the former nor the latter entry is better; each has its merits in what I liked and what I didn’t like.
  • -1.500 to -0.500 - the changes are more negative than positive, but there are some aspects the developers definitely got right. Or maybe I only see bad changes, but none that are really bad. The former is better than the latter.
  • -2.500 to -1.500 - the newer version sucks. The developers did a terrible job on it, and I prefer the older entry. I would not recommend playing the latter version.
  • Lower than -2.500 - mostly inconceivable, but this is a perfect example that proves that old beats new. Nostalgia wins again.
Similarities:

  • More than +1.500 - the developers basically changed everything that I wanted to see improved, as nearly every similarity (or every similarity) is something I didn’t want changed.
  • +0.500 to +1.500 - same as above, but there is still some work to be done. Still, most of the stuff that stayed the same is stuff I didn’t want changed.
  • -0.500 to +0.500 - I’m mostly indifferent to the similarities: either there are too many similarities I wouldn’t care about if they changed, or there’s a balance between stuff I wanted changed and stuff I’m glad was kept the same.
  • -1.500 to -0.500 - whether the latter entry is good or bad, it still has many of the flaws the former entry has, and the developers should’ve made improvements. There are also some similarities I’m glad they kept.
  • Less than -1.500 - there is a whole lot of work that needs to be done, and I am disappointed in the developers for not solving the problems.
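The change bands above can be sketched as a small lookup function. This is a hypothetical helper of my own naming, and note one assumption: boundary scores like exactly +1.500 appear in two bands above, so I’ve arbitrarily assigned each boundary to the lower band.

```python
def classify_change_average(avg):
    """Map an average change score to the verdict bands described above.

    Boundary values (e.g. exactly +1.500) are assigned to the lower band,
    which is an arbitrary choice since the original bands overlap there.
    """
    if avg > 2.5:
        return "mostly inconceivable; strongly recommend the newer game"
    elif avg > 1.5:
        return "the newer entry is better; recommend it"
    elif avg > 0.5:
        return "more positive than negative changes"
    elif avg >= -0.5:
        return "neither entry is better"
    elif avg >= -1.5:
        return "more negative than positive; the older entry is better"
    elif avg >= -2.5:
        return "the newer version is much worse"
    else:
        return "mostly inconceivable; old beats new"
```

For instance, the +1.250 average computed later in this post falls into the "+0.500 to +1.500" band.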
Now let’s go over how to calculate the standard deviation. Once you’ve found the average, you subtract it from every data point. If you’re rating changes, subtract the average score from the score of every change. Then you square every difference (i.e., multiply each difference by itself). Find the sum of the squares, and divide it by the number of points (for an entire population) or the number of points minus one (for a sample). Since the change rating system uses the entire population of changes, you divide the sum of the squares by the number of changes. Finally, you take the square root.
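The steps above can be sketched in a few lines of Python (a minimal sketch; the function name is my own):

```python
import math

def population_std_dev(scores):
    """Standard deviation over an entire population of scores."""
    n = len(scores)
    mean = sum(scores) / n                             # step 1: find the average
    squared_diffs = [(s - mean) ** 2 for s in scores]  # steps 2-3: subtract, then square
    variance = sum(squared_diffs) / n                  # step 4: divide by N (population, not sample)
    return math.sqrt(variance)                         # step 5: take the square root
```

Dividing by N rather than N - 1 is the key choice here: it treats the list of changes as the whole population, exactly as described above.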

Example: Last Friday, I rated every change from Pokémon Sun and Moon to Ultra Sun and Ultra Moon. 7 of the changes USUM made to SM had a score of +3. 3 of the changes had a score of +2. 2 of the changes had a score of +1. 3 of the changes had a score of -2. And one change had a score of -3.

Sum: 3+3+3+3+3+3+3+2+2+2+1+1-2-2-2-3 = 20
Average: 20/16 = 1.25

The average score is +1.250. While I did like how Game Freak improved on Pokémon Sun and Moon, I would have preferred Professor Kukui as the final boss instead of Hau, and I liked collecting Zygarde cells and cores more than totem stickers.

Now we subtract 1.25 from every score, then square each difference.

(3 - 1.25)² = 3.0625
(2 - 1.25)² = 0.5625
(1 - 1.25)² = 0.0625
(-2 - 1.25)² = 10.5625
(-3 - 1.25)² = 18.0625

Then we add all of the squares.

Sum: 3.0625 + 3.0625 + 3.0625 + 3.0625 + 3.0625 + 3.0625 + 3.0625 + 0.5625 + 0.5625 + 0.5625 + 0.0625 + 0.0625 + 10.5625 + 10.5625 + 10.5625 + 18.0625 = 73
Variance: 73/16 = 4.5625
Standard Deviation: 2.136
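As a quick check, the whole calculation above can be reproduced in a few lines of Python:

```python
import math

# The 16 change scores from the USUM example above:
scores = [3] * 7 + [2] * 3 + [1] * 2 + [-2] * 3 + [-3]

mean = sum(scores) / len(scores)                              # 20 / 16 = 1.25
variance = sum((s - mean) ** 2 for s in scores) / len(scores)  # 73 / 16 = 4.5625
std_dev = math.sqrt(variance)

print(round(std_dev, 3))  # 2.136
```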

The Standard Deviation of all of the Pokémon Ultra Sun/Ultra Moon changes is 2.136. What does that mean? That’s what I’m about to talk about:

While the average score is my overall opinion on the latter game, the standard deviation measures how accurate that overall score is. By the nature of my rating system, it can’t be less than 0, and it can’t exceed 3 for changes or 2 for similarities. The lower the standard deviation, the more accurate the average score. I already said that Pokémon USUM has an average score of +1.250. If the standard deviation were less than 1, it would mean Game Freak didn’t improve all that much - few changes were a big step forward, but there were also few to no negative changes. But since the standard deviation is greater than 2, it means there were a lot of great changes, but the bad changes were bad enough to drag down the average, which makes the score of +1.250 quite inaccurate.
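The upper bound of 3 for changes comes from the most extreme split possible: half the scores at +3 and half at -3 average out to 0, and every squared difference is 9. A quick sketch of this check (using the same population formula as above):

```python
import math

def population_std_dev(scores):
    """Standard deviation over an entire population of scores."""
    n = len(scores)
    mean = sum(scores) / n
    return math.sqrt(sum((s - mean) ** 2 for s in scores) / n)

# Extreme case: half the changes at +3, half at -3.
# Mean is 0, every squared difference is 9, so the variance is 9.
extreme = [3] * 8 + [-3] * 8
print(population_std_dev(extreme))  # 3.0
```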

For the similarities I rated between Pokémon Sun and Moon and Ultra Sun/Ultra Moon, my average score is 0, which means that while there is some stuff I’m glad they didn’t change, there is also just as much stuff they should’ve fixed. The standard deviation is 1.500, which means my score of 0 isn’t very accurate.

If the average score is accurate, then my opinion of how they changed features is firm. If it’s inaccurate, I can’t properly explain why I rated the game with that score: either there are too many negative changes in a game I find better than the previous one, or too many similarities I liked in a game that needed a lot of changing.

Here’s the full scoring on Standard Deviation:

Changes:

  • 2.500 to 3.000 - the average score is very inaccurate. The average is mostly neutral, but only because so many very positive changes and very negative changes cancel each other out.
  • 2.000 to 2.500 - the average score is inaccurate. There is a large variety of scores, and they are mostly in balance.
  • 1.500 to 2.000 - the average score is more inaccurate than accurate, as the variety of scores is out of balance.
  • 1.000 to 1.500 - the average score is more accurate than inaccurate, though there’s still a variety of scores, albeit a very imbalanced one.
  • 0.500 to 1.000 - the average score is accurate, as my opinion is more solid based on a particular score. This basically indicates whether or not I would recommend playing the latter.
  • 0.000 to 0.500 - mostly inconceivable, but my score is very accurate, enough to make my opinion solid. If the average score is less than -2.000, and this is the standard deviation, you know what that means.
Similarities:

  • 1.500 to 2.000 - the average score is inaccurate. I can’t explain what needs to be changed and what shouldn’t be changed very well.
  • 1.000 to 1.500 - the average score is more inaccurate than accurate, as there are a lot of similarities I’m positive about and a lot of similarities I’m negative about.
  • 0.500 to 1.000 - the average score is more accurate than inaccurate, and the score better describes whether or not the game needs more changing.
  • 0.000 to 0.500 - the average score is accurate. It basically tells you that either every similarity should be retained in the next entry, or every similarity is a major downfall of the series and should be changed.
Previous Scores:

I already talked about Pokémon Ultra Sun/Ultra Moon and Pokémon Sun and Moon and how accurate their average scores are. The average score of changes is +1.250, but the standard deviation is 2.136. The average score of similarities is 0.000, but the standard deviation is 1.500. Here are the other games I rated:

  • Let’s Go Pikachu/Let’s Go Eevee vs Pokémon Red/Blue/Yellow: Change Avg: +1.286; Change Stdev: 1.868; Similar Avg: +0.286; Similar Stdev: 1.578
  • Animal Crossing GameCube vs Wild World: Change Avg: +0.580; Change Stdev: 2.333; Similar Avg: -0.160; Similar Stdev: 1.736
  • Mario Kart 8 vs Mario Kart 8 Deluxe: Change Avg: +2.125; Change Stdev: 0.850; Similar Avg: -0.875; Similar Stdev: 1.690
And that’s all. Next time I rate the changes, I’m gonna start adding standard deviation to the game’s score, as well as the mode of significance.
 