Designing an ELO System That Feels Fair
Why standard ELO formulas fail for casual mobile games, and how I adapted the algorithm for Number Strike Baseball's ranked play.

ELO ranking systems are deceptively simple: win against a higher-rated opponent, gain more points. Lose against a lower-rated opponent, lose more points. The formula fits in one line of code. But making it feel fair in a casual mobile game required significant modifications to the standard algorithm.
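For reference, here is a minimal sketch of the textbook update rule that the rest of this article modifies. The function names are illustrative, and the 400-point scale and K-factor of 32 are the conventional defaults rather than anything specific to Number Strike Baseball.

```python
def expected_score(rating: float, opponent: float) -> float:
    """Expected score (between 0 and 1) for `rating` against `opponent`."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400))


def updated_rating(rating: float, opponent: float, score: float, k: float = 32) -> float:
    """Return the new rating; `score` is 1 for a win, 0 for a loss, 0.5 for a draw."""
    return rating + k * (score - expected_score(rating, opponent))


# Beating a stronger opponent pays more than beating an equal one.
print(round(updated_rating(1200, 1400, 1) - 1200, 1))  # 24.3
print(round(updated_rating(1200, 1200, 1) - 1200, 1))  # 16.0
```

The whole algorithm really is the single expression inside `updated_rating`; everything below is about choosing `k`, choosing opponents, and deciding what counts as a result.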
Mobile games present challenges that traditional ELO implementations never anticipated. Players disconnect mid-game due to poor cellular connections, play with wildly varying levels of attention, and may abandon matches without formally conceding. Each of these scenarios needs explicit handling in the rating system. We defined disconnection as a loss after 30 seconds of inactivity, but with a reduced K-factor to limit the penalty — because blaming a player's subway tunnel for their rating drop feels unfair.
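The disconnection rule can be sketched roughly as follows. The 30-second timeout comes from the description above; the 50% K-factor reduction is an assumed placeholder, since the exact reduction is a tuning choice, and the function names are hypothetical.

```python
DISCONNECT_TIMEOUT_S = 30   # from the rule above: 30 seconds of inactivity
STANDARD_K = 32             # conventional K-factor
DISCONNECT_K_SCALE = 0.5    # ASSUMPTION: the real reduction is not stated here


def expected_score(rating: float, opponent: float) -> float:
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400))


def is_disconnected(seconds_inactive: float) -> bool:
    """A player counts as disconnected once the inactivity timeout is reached."""
    return seconds_inactive >= DISCONNECT_TIMEOUT_S


def loss_delta(rating: float, opponent: float, disconnected: bool) -> float:
    """Rating change for a loss; disconnect losses use a scaled-down K-factor."""
    k = STANDARD_K * (DISCONNECT_K_SCALE if disconnected else 1.0)
    return k * (0 - expected_score(rating, opponent))


print(loss_delta(1200, 1200, disconnected=False))  # -16.0
print(loss_delta(1200, 1200, disconnected=True))   # -8.0
```

This sketch only covers the disconnecting player's side. Whether the opponent receives a full or reduced gain is a separate decision that the sketch leaves open.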
The first problem was new player volatility. Standard ELO starts everyone at 1200 and adjusts slowly. In Number Strike Baseball, a new player could lose their first five games and drop to 1000, which feels devastating. We implemented a provisional period for the first 20 games with a higher K-factor, allowing ratings to adjust quickly to a player's true skill level.
The placement match system runs for a player's first 20 games and uses an accelerated calibration algorithm. During placement, the K-factor is set to 64 instead of the standard 32, meaning each game has double the rating impact. We also use the placement period to seed the player's hidden confidence score, a measure of how certain the system is about their true rating. A player who goes 15-5 in placement will usually exit with a higher rating than someone who goes 10-10, and with a higher confidence score even when the two happen to end at similar ELO numbers.
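The K-factor schedule amounts to a simple lookup on games played. A minimal sketch, using the 20-game window and the 64/32 values given above (the function and constant names are illustrative, and the confidence score is left out because its formula is not described here):

```python
PLACEMENT_GAMES = 20   # length of the placement period
PLACEMENT_K = 64       # K-factor during placement
STANDARD_K = 32        # K-factor afterwards


def k_factor(games_played: int) -> int:
    """Return the K-factor for a player's next game.

    `games_played` is the number of rated games already completed,
    so games 1 through 20 (indices 0 through 19) use the placement value.
    """
    return PLACEMENT_K if games_played < PLACEMENT_GAMES else STANDARD_K


print(k_factor(0), k_factor(19), k_factor(20))  # 64 64 32
```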
The second problem was matchmaking at the extremes. Top players had 10-minute wait times because there weren't enough similarly-rated opponents. Our solution was a dynamic search range that widens over time — start looking for opponents within 200 ELO, expand by 100 every 5 seconds, cap at 500. This keeps wait times under 30 seconds for 95% of matches.
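The widening search window is easy to express as a function of time spent in the queue. This sketch uses the numbers quoted above (200 to start, plus 100 every 5 seconds, capped at 500); the function names are illustrative.

```python
START_RANGE = 200     # initial ELO window
STEP = 100            # widening per interval
STEP_SECONDS = 5      # interval length
MAX_RANGE = 500       # hard cap


def search_range(seconds_waiting: float) -> int:
    """ELO window (plus or minus) allowed after waiting this long."""
    steps = int(seconds_waiting // STEP_SECONDS)
    return min(START_RANGE + STEP * steps, MAX_RANGE)


def is_acceptable(rating: float, candidate: float, seconds_waiting: float) -> bool:
    """True if `candidate` falls inside the current search window."""
    return abs(rating - candidate) <= search_range(seconds_waiting)


print([search_range(t) for t in (0, 5, 10, 15, 60)])  # [200, 300, 400, 500, 500]
```

With these values the window reaches its cap after 15 seconds, so any further waiting comes from a lack of opponents inside the 500-point band rather than from the window itself.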
Network latency required careful consideration in a game where response time matters. Number Strike Baseball has timed rounds, and a player with 200ms latency has a measurable disadvantage against someone with 20ms latency. Rather than adjusting ELO calculations for latency — which would be gameable — we addressed this at the gameplay layer. The timer adjusts based on measured round-trip time, giving high-latency players proportionally more time. This keeps the ELO system pure while ensuring the underlying game is fair regardless of network conditions.
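One plausible way to implement the latency-compensated timer is to extend each player's round by their measured round-trip time, up to a cap. This is a sketch under assumptions: the cap and the example round length are hypothetical, and the exact compensation formula used in the game may differ.

```python
def round_time_ms(base_ms: int, rtt_ms: float, max_bonus_ms: int = 500) -> float:
    """Round length for one player, extended by their measured round-trip time.

    The bonus is clamped to [0, max_bonus_ms] so that an artificially
    inflated latency cannot buy unlimited extra time (the cap is an ASSUMPTION).
    """
    bonus = min(max(rtt_ms, 0.0), max_bonus_ms)
    return base_ms + bonus


# With a hypothetical 10-second round:
print(round_time_ms(10_000, 20))    # 10020
print(round_time_ms(10_000, 200))   # 10200
print(round_time_ms(10_000, 5000))  # 10500
```

Because the adjustment lives entirely in the timer, the rating update itself never needs to know about network conditions.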
The third problem was rating inflation. In a game where the player base is growing, new players enter at 1200, lose to established players, and their lost points get distributed upward. Over time, the average rating of active players drifts above 1200, and new players face increasingly tough initial matches. We countered this with a periodic soft reset that pulls everyone 20% toward 1200.
We implemented a seasonal structure with soft resets every three months. At season boundaries, all ratings are compressed toward 1200 by 20% — a player at 1600 becomes 1520, a player at 800 becomes 880. This compression serves multiple purposes: it re-engages lapsed players who return to find their rating hasn't decayed to irrelevance, it prevents rating inflation over time, and it creates a natural competitive cycle where climbing back feels fresh rather than grinding against a static ladder.
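The compression step is a linear pull toward the baseline. A minimal sketch with the 1200 baseline and 20% factor from above (names are illustrative):

```python
BASELINE = 1200
COMPRESSION = 0.20   # fraction of the distance to the baseline removed each season


def soft_reset(rating: float) -> float:
    """Move a rating 20% of the way toward the 1200 baseline."""
    return rating - COMPRESSION * (rating - BASELINE)


print(round(soft_reset(1600)))  # 1520
print(round(soft_reset(800)))   # 880
print(round(soft_reset(1200)))  # 1200
```

Because the pull is proportional to the distance from 1200, the ordering of players is preserved; only the gaps between them shrink.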
Win streaks needed special handling. Five wins in a row suggests the player is underrated, not lucky. We boost the K-factor during win streaks to accelerate rating convergence. The inverse applies to losing streaks — the system recognizes that something is off and corrects faster.
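A streak-aware K-factor could look roughly like this. The five-game threshold echoes the example above, but the 1.5x multiplier is an assumed value for illustration only, and the function name is hypothetical.

```python
def streak_k(base_k: float, streak: int, threshold: int = 5, boost: float = 1.5) -> float:
    """Return the K-factor adjusted for the player's current streak.

    `streak` is positive for consecutive wins and negative for consecutive
    losses. The boost multiplier is an ASSUMPTION, not a value from the game.
    """
    if abs(streak) >= threshold:
        return base_k * boost
    return base_k


print(streak_k(32, 4))    # 32
print(streak_k(32, 5))    # 48.0
print(streak_k(32, -6))   # 48.0
```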
Anti-manipulation measures protect the integrity of the ranking system. Win trading — where two players collude to inflate one's rating — is detected by analyzing match pair frequency and outcome patterns. If two accounts play each other more than three times in a week with a suspiciously lopsided record, both are flagged for review. We also detect smurf accounts — experienced players creating new accounts to stomp beginners — by monitoring new accounts whose performance in placement matches significantly exceeds expected patterns for genuinely new players.
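The win-trading check can be sketched as a pass over one week of results. The "more than three meetings" rule comes from the description above; the 80% threshold for "suspiciously lopsided" and the `Match` record shape are assumptions made for the example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    winner: str
    loser: str


def flag_win_trading(matches_this_week, max_meetings=3, lopsided_share=0.8):
    """Return the set of account pairs that met too often with a one-sided record.

    `lopsided_share` is an ASSUMED threshold: the share of games won by the
    more successful account in the pair.
    """
    meetings = Counter()
    wins = Counter()
    for m in matches_this_week:
        pair = tuple(sorted((m.winner, m.loser)))
        meetings[pair] += 1
        wins[(pair, m.winner)] += 1

    flagged = set()
    for pair, n in meetings.items():
        if n <= max_meetings:
            continue  # three or fewer meetings in a week is not suspicious
        top_share = max(wins[(pair, player)] for player in pair) / n
        if top_share >= lopsided_share:
            flagged.add(pair)
    return flagged


week = [Match("alice", "bob")] * 4 + [Match("carol", "dave"), Match("dave", "carol")]
print(flag_win_trading(week))  # {('alice', 'bob')}
```

Flagged pairs go to review rather than being penalized automatically, which matches the process described above.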
The most important design decision was making the rating visible. Some games hide the number behind a league or tier system. We show the raw ELO number because our players are logic puzzle enthusiasts — they appreciate the mathematical precision, and the visible number creates a clear, motivating progression system.
The analytics dashboard for rating system health became an essential internal tool. It tracks the distribution of ratings across the player base, match quality metrics like average ELO difference between opponents, queue time percentiles, and the rate of disconnections per rating bracket. When we noticed that players in the 800-900 range had a disconnection rate three times higher than average, it revealed a frustration threshold — these players were losing frequently and rage-quitting. We responded by adjusting the matchmaking algorithm to prioritize closer matches at lower rating brackets.
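The per-bracket disconnection metric is a simple grouped ratio. A minimal sketch, assuming each record is a `(rating, disconnected)` pair and that brackets are 100 points wide (both are illustrative choices, not the dashboard's actual schema):

```python
from collections import defaultdict


def disconnect_rate_by_bracket(records, bracket_size=100):
    """Map each rating bracket (by its lower bound) to its disconnection rate.

    `records` is an iterable of (rating, disconnected) pairs, one per
    player-match; this record shape is an ASSUMPTION for the example.
    """
    totals = defaultdict(int)
    drops = defaultdict(int)
    for rating, disconnected in records:
        bracket = int(rating // bracket_size) * bracket_size
        totals[bracket] += 1
        drops[bracket] += int(disconnected)
    return {b: drops[b] / totals[b] for b in sorted(totals)}


sample = [(850, True), (820, False), (1210, False), (1250, False)]
print(disconnect_rate_by_bracket(sample))  # {800: 0.5, 1200: 0.0}
```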
Tags: Game Dev, Algorithms, Firebase, Matchmaking