Circuit Breaker in Microservice Architecture

Table of Contents

Introduction
Circuit Breaker Pattern
Different States of Circuit Breaker
Conclusion

Introduction

In a microservices-based architecture, it's common for a microservice to make remote calls to another microservice or process, often running on a different machine across the network. Although a microservice architecture has advantages such as low coupling, reusability, and business agility, it also makes the overall architecture brittle, because each user action results in multiple service calls.

One of the key differences between in-memory calls and remote calls is that remote calls can fail, or hang without a response until some timeout limit is reached. If many callers are stuck waiting on an unresponsive service, the calling service can run out of critical resources such as threads, and the failure can cascade to other services.

Let's assume one of your microservices, service B, is down. A client makes a call to service A, which needs the results from service B to complete the request, so service A makes an internal call to service B. Because service B is down, service A waits for it until the call times out and then returns the dreaded HTTP 500 (Internal Server Error) response. Making the client wait a long time only to show this error message is the most irritating thing a developer can do to a user.

Circuit Breaker Pattern

The circuit breaker pattern helps prevent such catastrophic cascading failures across multiple systems. It allows you to build a **fault-tolerant and resilient system** that can survive gracefully when key services are either unavailable or have high latency.

(Figure: circuit breaker states)

The basic idea behind the circuit breaker is very simple. You wrap a protected function call in a circuit breaker object, which monitors for failures. Once the failures reach a certain threshold, the circuit breaker trips, and all further calls to the circuit breaker return with an error, without the protected call being made at all. Usually you'll also want some kind of monitoring alert when the circuit breaker trips.
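
The basic idea fits in a few lines. Below is a minimal Python sketch, not the API of any particular library. `SimpleCircuitBreaker` and `CircuitOpenError` are hypothetical names. The sketch counts consecutive failures and trips once they reach a threshold. From then on, it raises an error without invoking the protected function. Recovery (the HALF-OPEN state) is deliberately left out, and the class is not thread-safe.

```python
class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without making it."""


class SimpleCircuitBreaker:
    """Hypothetical minimal breaker: trips after `failure_threshold`
    consecutive failures and then fails fast. It never recovers on its
    own (no HALF-OPEN state) and is not thread-safe."""

    def __init__(self, failure_threshold=3):
        self.failure_threshold = failure_threshold
        self.failure_count = 0
        self.is_open = False

    def call(self, func, *args, **kwargs):
        if self.is_open:
            # Fail fast: the protected function is never invoked.
            raise CircuitOpenError("circuit is open")
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failure_count += 1
            if self.failure_count >= self.failure_threshold:
                self.is_open = True  # trip; this is where a monitoring alert would fire
            raise
        self.failure_count = 0  # a success resets the consecutive-failure count
        return result


# Usage: a remote call that always fails (simulating a service that is down).
remote_calls = 0

def call_service_b():
    global remote_calls
    remote_calls += 1
    raise ConnectionError("service B is down")

breaker = SimpleCircuitBreaker(failure_threshold=3)
for _ in range(5):
    try:
        breaker.call(call_service_b)
    except (ConnectionError, CircuitOpenError) as exc:
        print(type(exc).__name__)  # ConnectionError three times, then CircuitOpenError twice
print(remote_calls)  # 3: the last two attempts never reached service B
```

Resetting the count on every success is only one possible counting policy. With it, only an uninterrupted run of failures trips the breaker.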

There are many open-source circuit breaker libraries available today; Netflix's Hystrix and Resilience4j are two popular ones. Hystrix is now in maintenance mode, and its maintainers point users toward Resilience4j for new projects.

Different States of Circuit Breaker

Closed

(Figure: closed state sequence)

When everything is normal, the circuit breaker remains in the CLOSED state and all calls pass through to the service; this state implies the service is up and running. While closed, the breaker counts failures, and when the number of failures exceeds a predetermined threshold, it trips and moves to the OPEN state.
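
How failures are counted in the CLOSED state is a design choice. Some implementations count consecutive failures. Another common option is a count-based sliding window over the last N calls, sketched below. `FailureWindow` is a hypothetical helper, not a library class, and the window size and threshold are illustrative assumptions.

```python
from collections import deque


class FailureWindow:
    """Hypothetical CLOSED-state bookkeeping: remembers the outcome of the
    last `window_size` calls and reports when failures reach the threshold."""

    def __init__(self, window_size=10, failure_threshold=5):
        # deque(maxlen=...) discards the oldest outcome automatically.
        self.outcomes = deque(maxlen=window_size)  # True = failed call
        self.failure_threshold = failure_threshold

    def record(self, failed):
        """Record one call outcome; return True if the breaker should trip to OPEN."""
        self.outcomes.append(failed)
        return sum(self.outcomes) >= self.failure_threshold


window = FailureWindow(window_size=5, failure_threshold=3)
print([window.record(f) for f in (False, True, False, True, True)])
# [False, False, False, False, True]
```

A sliding window lets old failures age out, so an isolated burst long ago does not count against the service forever. A single success does not wipe the slate clean either, unlike a consecutive-failure counter.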

Open

(Figure: open state sequence)

If the Supplier Microservice (the downstream service being called) is experiencing slowness, the circuit breaker receives timeouts or exceptions for requests to that service. Once the number of timeouts/exceptions reaches a predetermined threshold, the circuit breaker trips to the OPEN state. In the OPEN state, the circuit breaker returns an error for every call to the service without calling the Supplier Microservice at all. This gives the Supplier Microservice a chance to recover by reducing its load.
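
To make the OPEN state concrete, here is a hypothetical Python sketch. The names `OpenState`, `guarded_call`, and `reset_timeout` are assumptions, not taken from the article or any library. While the breaker is open, calls are rejected without touching the supplier. Once a reset timeout has elapsed, the breaker is ready to try the service again. The clock is injectable so the example runs without sleeping.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected without contacting the supplier."""


class OpenState:
    """Hypothetical sketch of OPEN-state timing. `reset_timeout` (seconds)
    and the injectable `clock` are assumptions for illustration."""

    def __init__(self, reset_timeout=30.0, clock=time.monotonic):
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.opened_at = clock()  # when the breaker tripped

    def allow_request(self):
        """False while OPEN; True once the timeout has elapsed, which is the
        point where a full breaker would move to HALF-OPEN."""
        return self.clock() - self.opened_at >= self.reset_timeout


def guarded_call(state, func):
    if not state.allow_request():
        # The Supplier Microservice is not called at all, reducing its load.
        raise CircuitOpenError("circuit is open; failing fast")
    return func()


now = [100.0]  # fake clock value in seconds, so the example needs no sleeping
state = OpenState(reset_timeout=30.0, clock=lambda: now[0])
supplier_calls = []
try:
    guarded_call(state, lambda: supplier_calls.append("called"))
except CircuitOpenError as exc:
    print(exc)  # circuit is open; failing fast
now[0] += 30.0
guarded_call(state, lambda: supplier_calls.append("called"))
print(supplier_calls)  # ['called']
```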

Half-Open

(Figure: half-open state sequence)

The circuit breaker uses a monitoring and feedback mechanism called the HALF-OPEN state to find out if and when the Supplier Microservice has recovered. After the breaker has been OPEN for a configured period, it moves to HALF-OPEN and makes a trial call to the Supplier Microservice. If the trial call fails or times out, the circuit breaker returns to the OPEN state and waits again. If the call succeeds, the circuit switches to the CLOSED state. While the breaker is HALF-OPEN, all other calls to the service are still rejected with an error.
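
The full cycle, including the HALF-OPEN trial call, can be sketched as a small state machine. This is an illustrative Python sketch under stated assumptions, not the behaviour of Hystrix or Resilience4j. All class and parameter names are hypothetical. The breaker trips after `failure_threshold` consecutive failures and waits `reset_timeout` seconds in OPEN. It then lets exactly one trial call through while rejecting everything else. A fake clock stands in for `time.monotonic` in the usage example.

```python
import threading
import time
from enum import Enum


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half-open"


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without contacting the supplier."""


class CircuitBreaker:
    """Hypothetical sketch of the CLOSED -> OPEN -> HALF-OPEN cycle.
    Parameter names and defaults are illustrative assumptions."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.state = State.CLOSED
        self.failures = 0
        self.opened_at = 0.0
        self._trial_in_flight = False
        self._lock = threading.Lock()

    def call(self, func, *args, **kwargs):
        with self._lock:
            if self.state is State.OPEN:
                if self.clock() - self.opened_at < self.reset_timeout:
                    raise CircuitOpenError("circuit is OPEN")
                self.state = State.HALF_OPEN  # timeout elapsed: allow a trial call
            is_trial = self.state is State.HALF_OPEN
            if is_trial:
                if self._trial_in_flight:
                    # Only one trial at a time; other calls still fail fast.
                    raise CircuitOpenError("circuit is HALF-OPEN; trial in progress")
                self._trial_in_flight = True
        try:
            result = func(*args, **kwargs)
        except Exception:
            with self._lock:
                if is_trial:
                    self._trip()  # trial failed: back to OPEN, restart the timer
                elif self.state is State.CLOSED:
                    self.failures += 1
                    if self.failures >= self.failure_threshold:
                        self._trip()
            raise
        with self._lock:
            # Late results from calls that started before a trip are ignored.
            if is_trial or self.state is State.CLOSED:
                self.state = State.CLOSED  # a successful trial closes the circuit
                self.failures = 0
                self._trial_in_flight = False
        return result

    def _trip(self):
        self.state = State.OPEN
        self.opened_at = self.clock()
        self._trial_in_flight = False


now = [0.0]  # fake clock in seconds
breaker = CircuitBreaker(failure_threshold=2, reset_timeout=10.0, clock=lambda: now[0])

def supplier_timeout():
    raise TimeoutError("supplier timed out")

for _ in range(2):
    try:
        breaker.call(supplier_timeout)
    except TimeoutError:
        pass
print(breaker.state)               # State.OPEN
now[0] = 10.0                      # reset timeout has elapsed
print(breaker.call(lambda: "ok"))  # ok (the trial call succeeded)
print(breaker.state)               # State.CLOSED
```

Allowing a single trial call is the simplest policy. A breaker could instead admit a small number of trial calls and decide based on their combined results.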

Conclusion

The circuit breaker pattern handles faults gracefully. Instead of making the client wait for an internal server error, it fails fast, which provides a better user experience.
It also protects the struggling service. If a server is down because of excessive load, continuing to send requests to it only makes things worse; by rejecting calls while the circuit is open, the pattern reduces the load on the service that needs to recover.
With lots of traffic, you can have problems with many calls just waiting for the initial timeout. Since remote calls are often slow, it’s often a good idea to put each call on a different thread using a future or promise to handle the results when they come back. By drawing these threads from a thread pool, you can arrange for the circuit to break when the thread pool is exhausted.
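
One way to implement this idea is sketched below in Python, assuming a fixed-size pool guarded by a semaphore. `ThreadPoolGuard` and `PoolExhaustedError` are hypothetical names. Each remote call runs on a worker thread and returns a `concurrent.futures.Future`. When every worker is busy, new calls are rejected immediately instead of piling up behind the slow ones.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class PoolExhaustedError(Exception):
    """Raised instead of queueing work when every worker thread is busy."""


class ThreadPoolGuard:
    """Hypothetical sketch: run remote calls on a bounded pool and reject new
    calls ("break the circuit") once all worker threads are occupied."""

    def __init__(self, max_workers=4):
        self._executor = ThreadPoolExecutor(max_workers=max_workers)
        self._slots = threading.BoundedSemaphore(max_workers)

    def _run(self, func, args, kwargs):
        try:
            return func(*args, **kwargs)
        finally:
            # Free the slot as soon as the call ends, successful or not.
            self._slots.release()

    def submit(self, func, *args, **kwargs):
        if not self._slots.acquire(blocking=False):
            raise PoolExhaustedError("all worker threads are busy")
        try:
            return self._executor.submit(self._run, func, args, kwargs)
        except BaseException:
            self._slots.release()  # submission itself failed; give the slot back
            raise

    def shutdown(self):
        self._executor.shutdown(wait=True)


guard = ThreadPoolGuard(max_workers=2)
release = threading.Event()
# Two slow "remote calls" occupy both workers.
slow_calls = [guard.submit(release.wait) for _ in range(2)]
try:
    guard.submit(lambda: "never runs")
except PoolExhaustedError as exc:
    print(exc)  # all worker threads are busy
release.set()
for future in slow_calls:
    future.result(timeout=1)
print(guard.submit(lambda: "ok").result(timeout=1))  # ok
guard.shutdown()
```

Callers would normally wait with `future.result(timeout=...)`. A timed-out wait does not stop the worker, so the slot stays occupied until the remote call actually returns. That is exactly how a slow supplier exhausts the pool and trips this kind of breaker.
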
Circuit breakers are also useful for asynchronous communication. A common technique here is to put all requests on a queue, which the supplier consumes at its own pace; this is a useful way to avoid overloading servers. In this case, the circuit breaks when the queue fills up.
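
Here is a minimal Python sketch of the queue-based variant. It assumes a bounded in-memory `queue.Queue` stands in for a real message queue, and `BoundedRequestQueue` is a hypothetical name. Producers never block: when the queue is full, the request is rejected straight away.

```python
import queue


class BoundedRequestQueue:
    """Hypothetical sketch: producers enqueue requests, the supplier drains
    them at its own pace, and new requests are rejected immediately (the
    circuit "breaks") while the queue is full."""

    def __init__(self, capacity=100):
        self._requests = queue.Queue(maxsize=capacity)

    def send(self, request):
        """Return True if accepted, False if rejected because the queue is full."""
        try:
            self._requests.put_nowait(request)
            return True
        except queue.Full:
            return False  # caller should fall back or report an error instead of blocking

    def next_request(self):
        """Called by the supplier's consumer loop; returns None when idle."""
        try:
            return self._requests.get_nowait()
        except queue.Empty:
            return None


inbox = BoundedRequestQueue(capacity=2)
print([inbox.send(r) for r in ("a", "b", "c")])  # [True, True, False]
print(inbox.next_request())                      # a
print(inbox.send("c"))                           # True
```
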
Circuit breakers are a valuable place for monitoring. Any change in breaker state should be logged and breakers should reveal details of their state for deeper monitoring. Breaker behavior is often a good source of warnings about deeper troubles in the environment.
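
On the monitoring side, the hypothetical `BreakerMonitor` below logs every state change with Python's standard `logging` module, at WARNING level when the breaker trips open. It also exposes a snapshot that a health check or metrics exporter could publish. The names and the snapshot format are assumptions for illustration.

```python
import logging
from enum import Enum

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("circuit_breaker")


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half-open"


class BreakerMonitor:
    """Hypothetical sketch: log every state change and expose a snapshot
    that a health endpoint or metrics exporter could publish."""

    def __init__(self, name):
        self.name = name
        self.state = State.CLOSED
        self.history = []  # (old_state, new_state) pairs, oldest first

    def transition(self, new_state):
        if new_state is self.state:
            return  # not a change, nothing to log
        old_state, self.state = self.state, new_state
        self.history.append((old_state, new_state))
        # Tripping open is the event most worth alerting on.
        level = logging.WARNING if new_state is State.OPEN else logging.INFO
        logger.log(level, "breaker %s: %s -> %s", self.name, old_state.name, new_state.name)

    def snapshot(self):
        return {"name": self.name, "state": self.state.name, "transitions": len(self.history)}


monitor = BreakerMonitor("supplier-service")
for s in (State.OPEN, State.HALF_OPEN, State.CLOSED):
    monitor.transition(s)
print(monitor.snapshot())
# {'name': 'supplier-service', 'state': 'CLOSED', 'transitions': 3}
```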