Transitioning from Monolithic to Microservices Architecture: Pros, Cons, and Segment’s Journey
Segment is a customer data start-up whose customers include Time, Crate & Barrel, Gap, IBM, New Relic, and Levi's. It enables businesses to view all of their customer data in one place and then use it for sales, analysis, support, and more.
When they struggled with an old and complex infrastructure, they decided to modernize using microservices. However, they soon ran into a series of problems that could only be solved by returning to a monolithic architecture.
Segment’s platform routes event data from the point where it is created to the API of the destination chosen by the client, such as a platform like Google Analytics. In practice, this means Segment sends out thousands of events every second.
Most of Segment’s infrastructure runs on AWS: 16,000 containers managed by ECS (Elastic Container Service), comprising 250 different microservices.
Microservices are small applications with a single function. When combined, they form larger and more complex applications. The advantage of a microservice-based architecture is that small components can be worked on individually without significantly affecting the larger application.
When Segment was founded, their architecture was monolithic. The API processed incoming data and forwarded it to a single queue. A worker read events from that queue and sent each one, in a linear chain, to every desired server-side destination, that is, to the associated partner APIs.
The team soon realized that the queue became blocked, causing performance issues, whenever a destination returned an error. That was why they decided to move away from the monolith to microservices, explained Alexandra Noonan, the software engineer who led the project, in her blog.
Calvin French-Owen, co-founder and CTO at Segment, said that “increased visibility” was another important reason for the move.
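The blocking problem, and the isolation that separate queues are meant to provide, can be illustrated with a small simulation. This is a minimal sketch and not Segment's code: the destination names, the delivery costs, and the idea of modeling retries as one slow delivery are all assumptions made for illustration.

```python
from collections import deque

# Hypothetical destinations with a simulated per-delivery cost in milliseconds.
# "flaky_partner" stands in for a destination API that errors and is retried.
COST_MS = {"analytics": 10, "crm": 10, "flaky_partner": 500}


def drain_single_queue(events, cost_ms):
    """One queue, one worker: each event goes to every destination in turn.

    Returns the simulated time at which each event finished all deliveries.
    """
    queue = deque(events)
    clock_ms = 0
    finished_at = {}
    while queue:
        event = queue.popleft()
        for destination in cost_ms:
            clock_ms += cost_ms[destination]  # a slow destination stalls the chain
        finished_at[event] = clock_ms
    return finished_at


def drain_per_destination(events, cost_ms):
    """One queue and one worker per destination, each with its own clock.

    Returns the simulated time at which each destination emptied its queue.
    """
    finished_at = {}
    for destination, cost in cost_ms.items():
        queue = deque(events)
        clock_ms = 0
        while queue:
            queue.popleft()
            clock_ms += cost
        finished_at[destination] = clock_ms
    return finished_at


events = ["e1", "e2", "e3"]
shared = drain_single_queue(events, COST_MS)
isolated = drain_per_destination(events, COST_MS)
print("single queue:", shared)
print("per destination:", isolated)
```

In the single-queue run, every event waits behind the slow destination, so the third event is not fully delivered until 1,560 simulated milliseconds have passed. With one queue per destination, the two healthy destinations finish in 30 milliseconds and only the flaky one lags. The price of that isolation is one more queue and worker to operate for every destination added.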
The problem with microservices
As Segment became more successful and incorporated additional external services, the operating costs of supporting microservices became difficult to bear.
The team found that a routing connection failure caused requests to accumulate while the microservice-based application attempted to resolve the problem. The system tried to compensate with automatic scaling, which forwarded requests to connection ports that were already being used by other clients. These ports became clogged, resulting in poor system performance.
“On the output side we are sending out up to 200 or more APIs to our customers,” French-Owen explained. “If each is moderately well behaved, we might see them having one bad day per year. But with 200 or more APIs we are seeing an outage every day and a half.”
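The arithmetic behind that estimate can be checked with a back-of-the-envelope calculation. The sketch below takes the two figures from the quote (200 APIs, one bad day per API per year) and assumes, purely for illustration, a 365-day year and failures that are independent of one another.

```python
# Figures taken from the quote above.
DESTINATION_APIS = 200
BAD_DAYS_PER_API_PER_YEAR = 1

# Assumption for illustration: a 365-day year.
DAYS_PER_YEAR = 365

# Expected number of "some API is having a bad day" incidents per year.
expected_bad_days = DESTINATION_APIS * BAD_DAYS_PER_API_PER_YEAR

# Average spacing between incidents, in days.
mean_days_between_incidents = DAYS_PER_YEAR / expected_bad_days

# Probability that at least one API is having a bad day on a given day,
# assuming the APIs fail independently of each other.
p_api_bad_today = BAD_DAYS_PER_API_PER_YEAR / DAYS_PER_YEAR
p_any_bad_today = 1 - (1 - p_api_bad_today) ** DESTINATION_APIS

print(f"mean days between incidents: {mean_days_between_incidents:.2f}")
print(f"chance of at least one bad API today: {p_any_bad_today:.0%}")
```

Under these assumptions an incident occurs a little under every two days on average, which is in the same range as the figure French-Owen quotes. With more than 200 APIs, or more than one bad day each, the interval shrinks further.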
To make matters worse, requests exceeded the storage capacity of the system, causing some of the data to be lost. This was unacceptable.
Alexandra Noonan also explained: “Our system automatically evolves to manage the increased load but the sudden increase in queue depth is beyond our scalability. This delays the newest events and delivery times are affected because a certain destination is temporarily blocked. We don't have the right tools to test and use microservices when major updates are needed. As a result, the productivity of our developers is declining rapidly.”
For Segment, the only answer was to switch back to a monolith. The decision came with a new architecture that took into account the vulnerabilities of scaling up as the business grew.
Return to monolithic
Monolithic architectures consist of a single unit in which all parts are connected. Theoretically, this offers superior performance and efficiency because all the necessary parts are optimized to work together. However, the tight coupling makes updates and maintenance more difficult.
As part of the transition, Segment's team developed an aggregator, Centrifuge, which replaces the individual queues of the microservice-based platform and sends events to a single, monolithic service.
“There is now a single code repository and all destination workers are using the same version of the shared library. Large groups of developers can manage the spike in loads, so we are no longer looking for destinations where small amounts of loads are processed,” says Noonan.
The most important thing is that they can start building new products again. The company recognizes some of the weaknesses of monolithic architectures: fault isolation is more difficult, in-memory caching is less efficient, and dependency updates, if not managed, can have a great impact on operations.