Scaling into infinity… Part I

The digital revolution has brought myriad benefits, but the one I’d like to focus on is the unifying force of shared services: the idea that anyone, anywhere, with access to the internet can enjoy the same standard of service and, most critically, the same utility of service, no matter their means or circumstances.

Whether you’re a multimillionaire or working to make ends meet, the experience is the same – Amazon doesn’t look any different and Google doesn’t run any faster.

This is especially true for public services, which are designed for every type of user. When you design something for everyone, it must be robust and performant.

In this blog series, I explore how modern delivery teams can build environments that scale to meet extreme public demand and continue to perform far into the future.


Let’s quickly establish the objective of performance within a digital context:

  • The objective of web performance is to increase the rate at which information and services are consumed across the web
  • This is done by reducing latency, processing time and variance across all interactions in a user’s session, so that the system returns to an interactive or informative state more consistently
  • To accomplish this, we work to improve the tail of the distribution, since that is how we make systems accessible, reliable and equitable, i.e., make the 99th percentile just as performant as the 50th.
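To make “improving the tail” concrete, the sketch below computes the 50th and 99th percentiles of a set of response times and the ratio between them. The sample latencies are invented for illustration, and the nearest-rank method is only one of several common percentile definitions; monitoring tools may interpolate differently.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))  # 1-based rank
    return ordered[rank - 1]

def tail_ratio(samples):
    """How many times slower p99 is than the median; 1.0 would mean a perfectly flat distribution."""
    return percentile(samples, 99) / percentile(samples, 50)

# Invented response times in milliseconds: mostly fast, with a slow tail.
latencies_ms = [120] * 90 + [150] * 8 + [900, 2400]

print("p50:", percentile(latencies_ms, 50), "ms")
print("p99:", percentile(latencies_ms, 99), "ms")
print("p99/p50:", tail_ratio(latencies_ms))
```

Here the median looks healthy at 120 ms, but the 99th percentile is 7.5 times slower; shrinking that ratio towards 1 is what improving the tail means in practice.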

Less is more with performance. The more we can reduce, the more we can provide. It sounds counterintuitive, but it’s actually quite logical.

Performance is about quality just as much as it is about speed: quality of outcome, quality of experience and, for many companies, quality of conversion, i.e., sales.

So how do we scale for user volumes in the multi-millions, running ever more complex workloads? There are also some other important questions we must ask ourselves:

  1. When is a user not only a user?
  2. What are the prime examples of why we might need to scale for these kinds of load profiles? Is it more than just an ego session on whose cloud is bigger?
  3. What really is the critical path to having a performant environment, and should we really care about how we get there?

First, it’s important to understand the idea of ‘when is a user not only a user’. What exactly do I mean by this? A user who visits the front page of a website, viewing or accessing data served from a CDN (Content Delivery Network), has a minimal impact on any service.

A CDN, for those not in the know, is essentially a distributed cache: it stores copies of the resources users request on servers as close to them as possible, which significantly reduces latency and processing times. Pretty handy when you’re dealing with a lot of users making the same request for the same service. Government information services and landing pages are great examples of services that benefit from this type of component.
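As a rough illustration of why cached pages barely touch the origin, here is a toy model of a single CDN edge node with a fixed time-to-live (TTL). The class and function names are hypothetical; a real CDN also honours cache-control headers, supports invalidation and spreads content across many edge locations. A simulated clock keeps the example deterministic.

```python
import time

class EdgeCache:
    """Toy model of one CDN edge node: serve from cache while fresh, otherwise go to the origin."""

    def __init__(self, fetch_from_origin, ttl_seconds, clock=time.monotonic):
        self.fetch_from_origin = fetch_from_origin  # hypothetical origin callback
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._store = {}  # path -> (expires_at, body)
        self.origin_hits = 0

    def get(self, path):
        now = self.clock()
        entry = self._store.get(path)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: the origin is not touched
        self.origin_hits += 1  # miss or stale entry: one trip to the origin
        body = self.fetch_from_origin(path)
        self._store[path] = (now + self.ttl_seconds, body)
        return body

def origin(path):
    return f"<html>content for {path}</html>"

fake_now = [0.0]  # simulated clock, in seconds
edge = EdgeCache(origin, ttl_seconds=60, clock=lambda: fake_now[0])
for second in range(300):  # five simulated minutes
    fake_now[0] = float(second)
    for _ in range(1000):  # 1,000 requests per second for the same landing page
        edge.get("/")
print(edge.origin_hits)
```

Over five simulated minutes, 300,000 requests for the same page reach the origin only five times, once per TTL window.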

A user that is streaming video content or entering long-form data content, such as a survey, will have a much larger profile than a user accessing a cached web page, this much is obvious. What’s less obvious is what a user leaves behind, especially in an event-driven solution – something swiftly becoming the norm. Some examples of user baggage:

  • A chain of synchronous and asynchronous calls and processing events – common in event-driven architecture. In many instances these can take hours, days or even weeks to process, long after a user has completed their front-end action, because of the processing times of data ingestion services.
  • Network Address Translation (NAT) processing – NAT is awesome: it lets hosts on private networks reach the public internet through shared public IP addresses, and it enables fancy things such as IP masquerading. However, under high volumes a NAT gateway can quickly become “backed up”. It doesn’t stop working, but the traffic passing through it does start to take longer to process.
  • Data and ETL pipelines have finite ingestion rates, especially when they are scanning or transforming data. The mismatch between the rate at which inputs generate data and the rate at which pipelines process it frequently creates bottlenecks in production environments that can take measurable time to clear.

Why do we care so much about User Baggage?

Good question. We care because if we plan to support a large volume of users continuously, and those users are going to create backlogs in our solution, we need to be certain those backlogs are processed at least as fast as they are produced; otherwise they only grow. We also need to make sure our event chains and data in transit are backed up, secure and robust.
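Whether a backlog clears can be checked with a simple hour-by-hour queue model. This is a minimal sketch assuming a fixed processing capacity per hour and invented load figures; real pipelines have variable throughput, retries and dependencies between stages.

```python
def simulate_backlog(arrivals_per_hour, capacity_per_hour):
    """Hour-by-hour queue: work arrives, up to capacity_per_hour is processed, the rest waits."""
    backlog = 0
    history = []
    for arrived in arrivals_per_hour:
        backlog += arrived
        backlog -= min(backlog, capacity_per_hour)
        history.append(backlog)
    return history

def hours_to_drain(backlog, capacity_per_hour, arrival_rate_per_hour):
    """Time to clear a backlog at steady load; infinite if capacity never exceeds arrivals."""
    spare = capacity_per_hour - arrival_rate_per_hour
    if spare <= 0:
        return float("inf")
    return backlog / spare

# Invented day: a 4-hour peak at 50,000 events/hour, then a quieter 10,000 events/hour.
load = [50_000] * 4 + [10_000] * 8
history = simulate_backlog(load, capacity_per_hour=30_000)
print(history)
print(hours_to_drain(80_000, capacity_per_hour=30_000, arrival_rate_per_hour=10_000))
```

During the peak the backlog grows by 20,000 events an hour; once arrivals drop to 10,000 an hour, the 20,000 of spare capacity clears the 80,000-event backlog in four hours. If capacity never exceeds arrivals, the backlog never drains.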

Example projects that feature high volumes and have large user footprints include:

  • Population censuses. Most countries carry out population-wide counts and data-gathering exercises every 10 years. These exercises put enormous write loads on a digital service and are also region-locked by data legislation, which in turn prevents the global distribution of infrastructure. China, for example, has 494.16 million households; building a platform that could consume such a vast amount of personal data in a relatively short time would be a huge task and could well exceed the capacity of any cloud provider. Because of this, China decided to carry out a mostly physical census. The UK, however, carried out a digital-first census with remarkable effectiveness, efficiently scaling for the whole population.
  • Major political change. Think of the UK exiting the EU: this required the registration and processing of over 3.5 million EU citizens and the updating of an import and export economy worth $610B, all of which needed to be managed by brand new digital infrastructure.
  • Major sporting events such as the Commonwealth Games, the Olympic Games and the English Premier League. These all require large-scale IT that only needs to exist for the duration of the event.
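A back-of-envelope calculation shows the kind of sustained write rate a census implies. The two-week collection window and the 10x peak factor below are illustrative assumptions, not figures from any real census; only the household count comes from the text above.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def required_rate(submissions, window_days, peak_factor=1.0):
    """Submissions per second needed to ingest everything within the window,
    scaled by a peak factor to allow for traffic bunching at busy times."""
    average = submissions / (window_days * SECONDS_PER_DAY)
    return average * peak_factor

households = 494_160_000  # household count quoted in the text
# Assumptions for illustration only: a 14-day window, with peaks at 10x the average rate.
avg = required_rate(households, window_days=14)
peak = required_rate(households, window_days=14, peak_factor=10)
print(f"average: {avg:,.0f}/s, peak: {peak:,.0f}/s")
```

Under those assumptions the average is roughly 409 submissions per second and the peak roughly 4,085. Each submission may also fan out into several downstream events, so the write load on back-end services can be a multiple of this.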

All these examples feed into the growing trend of needing enterprise-grade, high-performing, and temporary cloud environments. Getting these events up and running within a short timeframe requires relatively high-velocity delivery.

Large retail services like Amazon, eBay and Alibaba have regionally distributed networks and infrastructure, making it much easier to distribute load across multi-regional systems. However, this approach to scaling doesn’t apply to services that must be hosted within a single region, for example because of:

  • Data legislation such as GDPR, UK data regulations or Chinese government restrictions
  • Data requiring a transformation process that needs centralised orchestration
  • Private cloud requirements for health data, defence and national security.

How to scale with the best of them

The cloud is still, ultimately, physical infrastructure, especially for the emerging services that cloud providers are starting to offer with limited regional availability. The cloud is further limited by legislation and data restrictions that impede the global distribution of services across multiple regions. Artificial limits such as cloud quotas also need to be considered, i.e., the limits cloud providers place on the resources you’re allowed to provision in your account.
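Quotas are worth checking against a capacity estimate well before launch. The sketch below converts a peak request rate into a node count and a vCPU total, then compares that with a quota. The per-node throughput, node size, 30% headroom and quota are all hypothetical inputs that you would replace with your own load-test results and your provider’s actual limits.

```python
import math

def nodes_needed(peak_rps, rps_per_node, headroom=0.3):
    """Nodes needed to serve peak_rps while keeping a fraction `headroom` of each node spare."""
    usable_rps = rps_per_node * (1 - headroom)
    return math.ceil(peak_rps / usable_rps)

def check_quota(peak_rps, rps_per_node, vcpus_per_node, vcpu_quota, headroom=0.3):
    """Return (nodes, vcpus_required, fits_within_quota)."""
    nodes = nodes_needed(peak_rps, rps_per_node, headroom)
    vcpus = nodes * vcpus_per_node
    return nodes, vcpus, vcpus <= vcpu_quota

# Hypothetical inputs: 4,100 req/s at peak, 50 req/s per 8-vCPU node, a 500-vCPU quota.
nodes, vcpus, fits = check_quota(peak_rps=4_100, rps_per_node=50, vcpus_per_node=8, vcpu_quota=500)
print(nodes, vcpus, fits)
```

Here 118 eight-vCPU nodes need 944 vCPUs, which would breach the hypothetical 500-vCPU quota. Quota increases may take time to be approved, so this check belongs early in planning rather than on launch day.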

As workloads grow, so does the complexity of meeting their demands. Simply throwing resources at a scaling problem quickly leads to a severe drop in efficiency and diminishing returns.
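One well-known way to model these diminishing returns is Neil Gunther’s Universal Scalability Law (USL), used here as an illustration rather than as the author’s method. It adds a contention term (σ, work that must be serialised) and a coherency term (κ, the cost of keeping nodes consistent with each other) to linear scaling: X(N) = λN / (1 + σ(N − 1) + κN(N − 1)). The coefficients below are invented; in practice they are fitted to load-test measurements.

```python
def usl_throughput(n, single_node_throughput, contention, coherency):
    """Universal Scalability Law: throughput of n nodes given a contention cost
    (sigma, serialised work) and a coherency cost (kappa, cross-node coordination)."""
    return (single_node_throughput * n) / (1 + contention * (n - 1) + coherency * n * (n - 1))

# Invented coefficients; real values come from fitting measured throughput.
for n in (1, 8, 32, 64, 128):
    x = usl_throughput(n, single_node_throughput=100, contention=0.05, coherency=0.001)
    print(f"{n:>4} nodes -> {x:8.0f} req/s ({x / (100 * n):.0%} efficiency)")
```

With these values throughput peaks at around 31 nodes (the square root of (1 − σ)/κ) and then falls, because coordination costs grow with the square of the node count; past that point, adding nodes makes the system slower.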

When dealing with scaling, we must consider the two core types:

  • Horizontal scaling – the ability to spread our workloads elastically across multiple nodes, compute resources, pods, etc.
  • Vertical scaling – the process of increasing the capacity of a single service or a static set of services.

It’s also important to identify which resources are truly elastic and which are not. When scaling horizontally, you’re going to run into networking and latency issues, i.e., more resources are now needed just to manage the extra resources. If you run listeners, agents, exporters or logging on every node, their overhead compounds as you scale horizontally, far more so than when you scale vertically. Additionally, communication between horizontally scaled services, and keeping them synchronised, adds further overhead.
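The synchronisation overhead grows faster than the node count. Assuming, purely for illustration, a full mesh in which every node exchanges state with every other node, the number of communication channels for n nodes is:

```latex
C(n) = \frac{n(n-1)}{2}, \qquad C(10) = 45, \qquad C(100) = 4950
```

Ten times the nodes means roughly a hundred times the channels. Real systems usually avoid a full mesh by using coordinators, partitioning or gossip protocols, but some coordination cost always remains.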

That said, it’s generally wise to scale horizontally, as it’s more efficient, more fault-tolerant and far more adaptable to changing workloads. It’s cheaper and quicker to create many small things than one big thing.

Vertical scaling introduces single points of failure, lower availability and a far lower overall capacity for handling workloads. You can only make a single server or service so large; eventually you’ll just end up moving your bottleneck around rather than getting rid of it.
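The availability argument can be quantified with a simple model. The sketch assumes replica failures are independent, which is optimistic: replicas sharing a zone, a dependency or a bad deployment can fail together. The availability figures used are hypothetical.

```python
HOURS_PER_YEAR = 24 * 365

def replicated_availability(node_availability, replicas):
    """Probability that at least one replica is up, assuming independent failures."""
    return 1 - (1 - node_availability) ** replicas

def downtime_hours_per_year(availability):
    """Expected hours per year with no replica available."""
    return (1 - availability) * HOURS_PER_YEAR

# Hypothetical: one large vertically scaled server vs three smaller horizontally scaled replicas.
big_server = replicated_availability(0.995, replicas=1)
three_small = replicated_availability(0.99, replicas=3)
print(f"single server: {downtime_hours_per_year(big_server):.1f} hours/year down")
print(f"three replicas: {downtime_hours_per_year(three_small) * 3600:.0f} seconds/year down")
```

Even though each small replica is individually less reliable, the group is far more available, provided the independence assumption roughly holds and the remaining replicas can absorb the load when one is lost.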

Rightsizing! Rightsizing! Rightsizing!

There is so much value in rightsizing. Not only does it reduce costs and improve performance, but it also helps your engineers really understand what they are building and why each component is needed.

Picking the right data types, the right operating system and the right programming language, and then combining them in the right way, has an enormous impact on the performance of your service and how lean you can run it. Running optimised containers can be between 75% and 90% cheaper than operating VMs. So, when your cloud bill is running into the millions, that level of saving becomes very attractive.
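Rightsizing usually starts from measured usage rather than guesses. This sketch picks the cheapest size from a hypothetical catalogue that covers a service’s observed 95th-percentile CPU and memory plus a safety margin. The sizes, prices and 20% headroom are invented for illustration and do not reflect any provider’s pricing.

```python
from typing import NamedTuple, Optional

class Size(NamedTuple):
    name: str
    vcpus: int
    mem_gib: int
    hourly_usd: float

# Hypothetical catalogue, not real provider pricing.
CATALOGUE = [
    Size("small", 2, 4, 0.05),
    Size("medium", 4, 8, 0.10),
    Size("large", 8, 16, 0.20),
    Size("xlarge", 16, 32, 0.40),
]

HOURS_PER_MONTH = 730

def rightsize(p95_vcpus: float, p95_mem_gib: float, headroom: float = 0.2) -> Optional[Size]:
    """Cheapest size covering observed p95 usage plus a headroom margin, or None if nothing fits."""
    need_cpu = p95_vcpus * (1 + headroom)
    need_mem = p95_mem_gib * (1 + headroom)
    fits = [s for s in CATALOGUE if s.vcpus >= need_cpu and s.mem_gib >= need_mem]
    return min(fits, key=lambda s: s.hourly_usd) if fits else None

# A service provisioned on "xlarge" whose p95 usage is 2.5 vCPUs and 5 GiB.
current = CATALOGUE[3]
suggested = rightsize(2.5, 5)
saving = (current.hourly_usd - suggested.hourly_usd) * HOURS_PER_MONTH
print(f"{current.name} -> {suggested.name}, saving about ${saving:,.0f} per node per month")
```

When nothing in the catalogue fits, that is often a signal to split the workload across more nodes rather than reach for ever larger machines.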

