Vonmo's Blog - Building distributed apps: zero approximation

These days, the world is moving fast. Progress constantly creates new technological challenges. Information systems architecture should be flexible enough to respond to new requirements and evolve. In this article, we are going to talk about event-driven architecture, concurrency, parallelism, asynchrony, and how to live in peace with all that in Erlang.

Introduction

As developers, we choose one type of information exchange or another depending on the system’s size and requirements. In most cases, services can be coordinated with a broker-based pattern, for example one built on RabbitMQ or Kafka. Sometimes, though, the event stream, the SLA, and the required level of control over the system are such that ready-made messaging doesn’t suit us. Of course, you can complicate the system a little and use ZeroMQ or nanomsg. By doing so, you take responsibility for the transport layer and cluster formation. But if the system lacks throughput or the capacity of a standard Erlang cluster, then adding an extra entity requires careful examination and an economic rationale.

The topic of reactive distributed applications is vast. To fit the format of an article, today we will focus on homogeneous environments built on Erlang/Elixir. The Erlang/OTP ecosystem lets us implement a reactive architecture with the least time and effort. Either way, we will need a messaging layer.

Theoretical framework

Engineering starts with defining goals and limitations. The goal is not development for the sake of development. What we need is a safe and scalable tool that lets us create and evolve modern applications at various scales, from single-server applications that serve a small audience to 50–60 node clusters or cluster federations. Thus, our main aim is to maximize profit by reducing the costs of developing and owning the system.

Let’s highlight four requirements for the target system:

Scalability. Individual units can be scaled both vertically and horizontally. It should also be possible to scale the whole system horizontally without limits;
Event-driven nature. The system is always ready to accept a flow of events and process it;
Latency that is guaranteed. Time is money, so users shouldn’t have to wait too long;
Fault tolerance. All levels and services have to be restored automatically in case of a failure.

The system can move beyond the MVP stage and keep evolving if its foundation satisfies the minimum requirements of SELF (Scalability, Event-driven nature, Latency, Fault tolerance). Messaging, as an infrastructure tool and the basis for all services, must also be convenient for programmers to use.

Event-driven nature

For an app to grow from one server to a cluster, its architecture must support loose coupling. An asynchronous model meets this requirement. Within such a model, both the sender and the receiver care only about the payload of a message and do not worry about how it is passed or routed.
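To make the asynchronous model a bit more tangible, here is a minimal sketch in plain Erlang. The module name async_demo and its functions are hypothetical and exist only for illustration. The sender calls publish/2 and gets control back immediately; it knows nothing about how or when the receiver handles the event.

```erlang
-module(async_demo).
-export([start/0, publish/2, collected/1]).

%% Start a receiver process that accumulates every event it gets.
start() ->
    spawn(fun() -> loop([]) end).

%% Asynchronous send: returns immediately, the sender never waits
%% for the receiver to process the event.
publish(Receiver, Event) ->
    Receiver ! {event, Event},
    ok.

%% Synchronous helper, used only to inspect what has arrived so far.
collected(Receiver) ->
    Ref = make_ref(),
    Receiver ! {get, self(), Ref},
    receive
        {Ref, Events} -> Events
    after 1000 ->
        timeout
    end.

%% Receiver loop: newest event is kept at the head of the list.
loop(Acc) ->
    receive
        {event, Event} ->
            loop([Event | Acc]);
        {get, From, Ref} ->
            From ! {Ref, lists:reverse(Acc)},
            loop(Acc)
    end.
```

The only thing the two sides share is the shape of the message, {event, Event}. In a real messaging layer the direct pid would presumably be replaced by some routing abstraction, but the sender’s code would look much the same.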

Scalability

Scalability and system performance often go hand in hand. App components must be able to utilize all resources available. The more effectively we can use capacities and the more optimal our processing methods are, the less money we spend on equipment.

Within a single machine, Erlang creates a highly concurrent environment. The balance between parallelism and concurrency can be tuned by choosing the number of OS threads available to the Erlang VM, as well as the number of schedulers that use these threads.
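As a rough illustration, the scheduler settings of a running VM can be inspected with erlang:system_info/1. The module name sched_info below is hypothetical. The number of schedulers is normally chosen at VM start, for example with the +S flag (erl +S 4:2 asks for four scheduler threads with two of them online).

```erlang
-module(sched_info).
-export([report/0]).

%% Collect the scheduler-related settings of the running VM.
%% logical_processors may be the atom 'unknown' on some platforms.
report() ->
    #{schedulers         => erlang:system_info(schedulers),
      schedulers_online  => erlang:system_info(schedulers_online),
      logical_processors => erlang:system_info(logical_processors)}.
```

The number of online schedulers can also be changed at runtime with erlang:system_flag(schedulers_online, N), as long as N does not exceed the number of schedulers the VM was started with.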

Erlang processes don’t share state and work in non-blocking mode. This provides relatively low latency and higher throughput than traditional blocking, synchronous apps achieve. The Erlang scheduler takes care of fair CPU and IO allocation. The absence of blocking lets the app keep responding even under peak load or during failures.

At the cluster level, the problem of utilization also exists. It’s important to distribute the load evenly between the machines in a cluster and not to overload the network. Imagine: user traffic lands on load balancers (e.g. HAProxy, nginx). These balancers distribute requests between the available handlers in a server pool. Within the app infrastructure, the service that implements the required interface is just the last mile. It has to call a number of other services to respond to the initial query. Internal requests require routing and balancing too.

To manage data flows properly, messaging should provide developers with an interface for controlling routing and load distribution. As a result of this, developers will be able to use microservice patterns and solve both routine tasks and non-standard ones.
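One of the simplest load distribution strategies such an interface could expose is round-robin. The sketch below is only an illustration; the module rr_balancer and its functions are hypothetical names, and the workers can be any terms (pids, node names, and so on).

```erlang
-module(rr_balancer).
-export([new/1, next/1]).

%% Build the balancer state from a non-empty list of workers.
new(Workers) when is_list(Workers), Workers =/= [] ->
    queue:from_list(Workers).

%% Pick the next worker and move it to the back of the queue,
%% so that consecutive calls cycle through all workers in order.
next(Q0) ->
    {{value, Worker}, Q1} = queue:out(Q0),
    {Worker, queue:in(Worker, Q1)}.
```

A caller keeps the returned queue and passes it to the next call, for example {Worker, Q1} = rr_balancer:next(Q0), Worker ! Request. Other strategies (hashing by key, least loaded worker) would fit behind the same two functions.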

From a business perspective, scalability is one of the tools for risk management. The key thing here is to meet the client’s needs and use the hardware efficiently:

As hardware performance improves over time, the hardware won’t stand idle because of software imperfections. Erlang scales vertically very well and is able to utilize all the CPU cores and memory available;
Within cloud environments, we can manage the amount of hardware depending on current or predicted load and ensure SLA.

Fault tolerance

Let’s consider two axioms: ‘Failures are unacceptable.’ and ‘There will always be failures.’ For any business, a software failure means lost money and, what’s much worse, lost reputation. Even so, it’s usually possible to reach a compromise between potential losses and the cost of developing fault-tolerant software.

In the short term, fault-tolerant architecture saves money on purchasing ready-made clustering solutions. They might cost you an arm and a leg, and contain errors.

In the long term, fault-tolerant architecture pays for itself many times over at all stages of development.

Messaging inside the code base lets you work out the relationships between the components of a system at the development stage. This makes faults easier to manage: every responsible component handles its own faults, and the target system, with automatic fail-over, knows by design how to restore itself to a normal state after a fault.
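In Erlang/OTP, automatic restoration is usually expressed with supervisors. Below is a minimal sketch, assuming a single worker that exits when it receives the atom crash. The module demo_sup and the worker are hypothetical and stand in for real services.

```erlang
-module(demo_sup).
-behaviour(supervisor).
-export([start_link/0, init/1, start_worker/0, worker_pid/0]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

%% one_for_one: only the failed child is restarted.
%% At most 5 restarts within 10 seconds are tolerated.
init([]) ->
    SupFlags = #{strategy => one_for_one, intensity => 5, period => 10},
    Child = #{id => worker,
              start => {?MODULE, start_worker, []},
              restart => permanent},
    {ok, {SupFlags, [Child]}}.

%% Called by the supervisor; the worker is linked to it.
start_worker() ->
    {ok, proc_lib:spawn_link(fun worker_loop/0)}.

%% Look up the current worker pid (illustration only; during a restart
%% the supervisor may briefly report a non-pid value here).
worker_pid() ->
    [{worker, Pid, _Type, _Mods}] = supervisor:which_children(?MODULE),
    Pid.

%% The worker fails on 'crash' and ignores everything else.
worker_loop() ->
    receive
        crash -> exit(crashed);
        _Other -> worker_loop()
    end.
```

After demo_sup:start_link(), sending crash to demo_sup:worker_pid() kills the worker, and shortly afterwards worker_pid/0 returns a new pid: the supervisor has brought the component back without any code in the caller.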

Responsiveness

Regardless of faults, your app should respond to queries and comply with SLA. The reality is that people nowadays are not ready to wait. Consequently, business must adjust. More and more apps are expected to be highly responsive.

Responsive applications work in near real-time mode, and the Erlang VM operates in soft real-time mode. For some fields, however, such as stock exchanges, medicine, or industrial equipment, hard real-time mode is crucial. Responsive systems improve UX and help the business.
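One common way to keep response time bounded in Erlang is to put an explicit deadline on every request. The sketch below is an assumption about how that might look, not a prescribed API: the module deadline, the message shapes, and the echo server are all hypothetical.

```erlang
-module(deadline).
-export([call/3, echo_server/0]).

%% Send a request and wait for the reply no longer than TimeoutMs.
%% The monitor lets us fail fast if the server is dead or dies.
call(Pid, Request, TimeoutMs) ->
    Ref = erlang:monitor(process, Pid),
    Pid ! {request, self(), Ref, Request},
    receive
        {reply, Ref, Reply} ->
            erlang:demonitor(Ref, [flush]),
            {ok, Reply};
        {'DOWN', Ref, process, _Pid, Reason} ->
            {error, {down, Reason}}
    after TimeoutMs ->
        %% A late reply may still arrive in the caller's mailbox.
        erlang:demonitor(Ref, [flush]),
        {error, timeout}
    end.

%% Toy server: echoes requests, or sleeps first when asked to.
echo_server() ->
    spawn(fun echo_loop/0).

echo_loop() ->
    receive
        {request, From, Ref, {sleep, Ms}} ->
            timer:sleep(Ms),
            From ! {reply, Ref, slept},
            echo_loop();
        {request, From, Ref, Msg} ->
            From ! {reply, Ref, Msg},
            echo_loop()
    end.
```

The caller always gets an answer within the deadline, either the reply or an explicit error, and can then decide what to do: retry, degrade, or report the failure.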

Preliminary conclusion

When I was planning this article, I wanted to share my experience of creating a message broker with further building of complex systems based on it. However, the theoretical and motivational part turned out to be too extensive.

In the second part of the article we’ll talk about the nuances of exchange points implementation, and message exchange patterns.

The third part will be devoted to some general issues of arrangement of services, routing, and balancing. Also, we’ll talk about the practical side of scalability and fault tolerance.

The end of the first part.
