
Optimizing the performance of a business-critical data processing platform

A leading US-based pharma company leverages a data platform to handle logistics and distribution. They wanted to optimize its performance for data processing jobs of all sizes.


Key achievements

  • Reduction in execution time of small processing jobs by more than 50%
  • Increased scalability and reliability of the data processing platform
  • Significant decrease in resource usage, resulting in financial savings

The challenge

The Client leverages an advanced data processing platform for everything related to logistics and distribution. Individual requests therefore vary widely in size and in the resources needed to handle them. Originally, the platform was optimized only for larger processing jobs, which caused inefficiencies when executing smaller tasks. This was a significant issue, as out of ~20,000 processing jobs executed daily, 50-75% were performed on smaller datasets.

The C&F team identified this area for optimization and proactively offered a solution. The experts proposed changes to the data processing platform’s architecture that would enable it to dynamically scale up and down, matching the necessary resources to the task at hand.

The solution

The first step was implementing an automated process that separates large processing jobs from small ones. To accurately estimate job size, the tool takes into account both the dataset size and the transformation complexity.
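As a rough illustration of that idea, the sketch below classifies a job by combining its input size with a simple complexity score. The thresholds, field names, and the `Job` structure are illustrative assumptions, not the Client's actual implementation.

```python
from dataclasses import dataclass

# Illustrative thresholds; the real cut-off values are an assumption.
SMALL_DATASET_BYTES = 5 * 1024**3   # ~5 GB of input data
SMALL_COMPLEXITY_SCORE = 10         # e.g. number of joins/aggregations in the plan


@dataclass
class Job:
    dataset_bytes: int       # size of the input dataset
    complexity_score: int    # rough measure of transformation complexity


def classify_job(job: Job) -> str:
    """Route a job to the 'small' or 'large' execution path.

    A job is treated as small only when both its input size and its
    transformation complexity fall under the configured thresholds.
    """
    if (job.dataset_bytes <= SMALL_DATASET_BYTES
            and job.complexity_score <= SMALL_COMPLEXITY_SCORE):
        return "small"
    return "large"


# Example: a 2 GB job with three joins would be routed to the small path.
print(classify_job(Job(dataset_bytes=2 * 1024**3, complexity_score=3)))  # -> "small"
```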

Next, the team implemented architecture changes that allow the data platform to execute smaller tasks more efficiently. Originally, Apache Spark in cluster mode was used for all processing jobs. This approach remained unchanged for larger tasks requiring more resources. Smaller jobs, on the other hand, are now executed on single-node Apache Spark instances. The resources assigned to this group scale up and down dynamically based on the Client's needs.
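The split between the two execution paths can be sketched with standard PySpark session configuration. The master URLs and resource settings below are placeholders for illustration, not the Client's actual setup.

```python
from pyspark.sql import SparkSession


def build_session(job_class: str) -> SparkSession:
    """Create a Spark session suited to the job class.

    Small jobs run on a single-node (local) Spark instance; large jobs
    keep the original cluster-mode execution.
    """
    if job_class == "small":
        return (SparkSession.builder
                .master("local[*]")                    # single-node execution
                .config("spark.driver.memory", "8g")   # placeholder sizing
                .getOrCreate())
    return (SparkSession.builder
            .master("yarn")                            # cluster mode, as before
            .config("spark.executor.instances", "20")  # placeholder sizing
            .getOrCreate())
```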

After the changes, the data processing jobs are automatically split into two groups—large and small—and assigned the best way of execution. This improved process efficiency and optimized the utilization of the Client’s cloud infrastructure.

The result

The architecture changes achieved the desired objective. After the project, the processing time of a small job is 50-66% shorter. This, in turn, increases the throughput of the platform and reduces cloud infrastructure costs.

At the same time, the platform became more scalable, capable of handling more processing jobs daily if required. The architecture changes also increased platform reliability.

In addition to reduced costs and improved scalability, the Client’s logistics team now receives the necessary information faster and with greater confidence in the platform’s outputs.
