Design For Scale

Internal and External Load balancing At Scale

Thoughts on Meta’s Hyperscale Infrastructure Insights Paper

Jin Peng
Feb 05, 2025

Remote Procedure Calls (RPCs) between internal services behave differently from external customer requests: internal communication is optimized for performance and efficiency, while external interaction is optimized for stability and user experience. I’ve been pondering this for a while now: should load balancing strategies be tailored separately to meet internal and external requirements? The Meta Hyperscale Infrastructure Insights paper is a goldmine of data on this subject.

Let’s see how Meta’s hyperscale infrastructure manages load balancing for internal and external traffic.

Internal RPC Handling - Client Side Load Balancing

Internal infrastructure prioritizes speed and efficiency. Instead of relying heavily on traditional service meshes with sidecar proxies, Meta uses a routing library linked directly into service executables for 99% of RPC requests. This direct client-to-server routing bypasses the overhead of intermediate proxies. While the routing of RPCs is decentralized, Meta's service mesh, ServiceRouter, relies on a centralized control plane. The control plane manages and updates a Routing Information Base (RIB) that contains service discovery information, routing configurations, and cross-region routing data. The RIB is replicated across multiple servers for scalability and availability, while the RPC requests themselves are routed by a distributed data plane, that is, by the routing libraries embedded in each service.

This approach avoids the need for a large number of servers dedicated to routing proxies, saving costs and resources. The centralized control plane also ensures consistent routing and configuration across Meta's services.
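To make the idea of an in-process routing library concrete, here is a minimal Python sketch. It is not Meta's implementation: the names `Backend`, `RibSnapshot`, and `RoutingLibrary` are hypothetical, and both the same-region preference and the "power of two choices" selection are assumptions I picked for illustration. The only point is that the caller chooses a backend from a locally cached copy of the routing data, so no proxy sits on the request path.

```python
from __future__ import annotations

import random
from dataclasses import dataclass, field


@dataclass
class Backend:
    address: str
    region: str
    in_flight: int = 0  # requests this client currently has outstanding


@dataclass
class RibSnapshot:
    """Hypothetical local copy of routing data pushed by a control plane."""
    services: dict[str, list[Backend]] = field(default_factory=dict)


class RoutingLibrary:
    """Runs inside the calling service, so no proxy hop is needed."""

    def __init__(self, local_region: str, rib: RibSnapshot,
                 rng: random.Random | None = None) -> None:
        self.local_region = local_region
        self.rib = rib
        self.rng = rng or random.Random()

    def update_rib(self, rib: RibSnapshot) -> None:
        # In a real system the control plane would push updates;
        # here the caller swaps in a new snapshot by hand.
        self.rib = rib

    def pick(self, service: str) -> Backend:
        backends = self.rib.services.get(service, [])
        if not backends:
            raise LookupError(f"no backends known for {service!r}")
        # Assumption: prefer same-region backends, fall back to any region.
        local = [b for b in backends if b.region == self.local_region]
        candidates = local or backends
        if len(candidates) == 1:
            return candidates[0]
        # Power of two choices: sample two candidates, keep the less loaded.
        a, b = self.rng.sample(candidates, 2)
        return a if a.in_flight <= b.in_flight else b


if __name__ == "__main__":
    rib = RibSnapshot(services={"feed": [
        Backend("10.0.0.1:9090", "us-east", in_flight=5),
        Backend("10.0.0.2:9090", "us-east", in_flight=1),
        Backend("10.1.0.1:9090", "eu-west"),
    ]})
    router = RoutingLibrary("us-east", rib)
    print(router.pick("feed").address)  # 10.0.0.2:9090
```

With only two same-region candidates, both are always sampled, so the less loaded one wins deterministically. With larger pools the choice is randomized, which helps prevent many clients from piling onto the same backend at once.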

External API Handling

External APIs, on the other hand, prioritize stability, predictability, and standardization. They are the public-facing interfaces used by customers, partners, and third-party developers.

Meta’s external user requests are first routed to a geographically close edge datacenter, also known as a Point of Presence (PoP). These requests are then forwarded to a main datacenter region through Meta's private Wide Area Network (WAN). Dynamic DNS mapping and traffic engineering tools are employed to distribute load across PoPs and datacenters. The traffic is load-balanced by L7 proxies, which route requests to backend servers and also perform functions such as request inspection.
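The two-stage path described above (user to PoP, then PoP to datacenter region) can be sketched in a few lines of Python. This is purely illustrative and not how Meta's DNS mapping or traffic engineering tools work internally: the `Pop` type, the `choose_pop` and `choose_region` functions, the 0.8 utilization threshold, and the hash-based weighted split are all assumptions of mine.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass


@dataclass
class Pop:
    name: str
    rtt_ms: float       # assumed input: measured round-trip time to the user
    utilization: float  # fraction of capacity in use, 0.0 to 1.0


def choose_pop(pops: list[Pop], max_utilization: float = 0.8) -> Pop:
    """Return the closest PoP with headroom, else the least loaded one."""
    if not pops:
        raise ValueError("no PoPs available")
    with_headroom = [p for p in pops if p.utilization < max_utilization]
    if with_headroom:
        return min(with_headroom, key=lambda p: p.rtt_ms)
    return min(pops, key=lambda p: p.utilization)


def choose_region(weights: dict[str, float], request_key: str) -> str:
    """Map a request key onto regions in proportion to their weights."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Hash the key to a stable point in [0, total).
    digest = hashlib.sha256(request_key.encode()).digest()
    point = int.from_bytes(digest[:8], "big") / 2**64 * total
    cumulative = 0.0
    for region, weight in weights.items():
        cumulative += weight
        if point < cumulative:
            return region
    return region  # guards against floating-point rounding at the upper edge


if __name__ == "__main__":
    pops = [Pop("pop-a", 12.0, 0.95), Pop("pop-b", 20.0, 0.40)]
    print(choose_pop(pops).name)  # pop-b: pop-a is closer but over the threshold
    print(choose_region({"region-1": 3.0, "region-2": 1.0}, "user-42"))
```

Hashing the request key keeps a given user pinned to the same region for as long as the weights stay fixed, while changing the weights shifts a proportional share of traffic, which is roughly the knob a traffic engineering system would turn.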
