The Challenge
The customer deployed a complex, modern, cloud-based data platform on the Azure stack: Data Lake (ADLS) and SQL for data storage; Azure Data Factory, Databricks, and Azure SQL for data processing; and Power BI for reporting.
Problems to solve
- Rapid deployment by multiple teams: Multiple data products were rapidly deployed on the platform by different cross-functional teams, resulting in a lack of unified visibility
- Lack of visibility: The data platform owner lacked the visibility into platform usage and operations needed to ensure proper tool usage, optimal resource allocation, and increased platform adoption
The Solution
- We architected and deployed an Observability Solution to monitor all major data platform components:
  - Data Processing Engines: Databricks, SQL, ADF (ETL)
  - Data Storage: Data Lake, SQL Databases, file stores
  - User activities / queries
- The solution uses open standards and APIs (OpenTelemetry), so it can be extended to other engines.
- An Observability Dashboard shows platform usage, alerts, and trends, including:
  - Data object volumes and usage
  - Processing job details with trends such as errors, processing times, and resource usage
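
As a rough illustration of the job-level trends mentioned above (errors and processing times per engine), the following minimal sketch aggregates job-run records into per-engine figures. It is not the deployed solution and does not use OpenTelemetry; the `JobRun` record shape, the `summarize_by_engine` function, and the sample data are all hypothetical and exist only to show the kind of calculation a dashboard panel might rely on.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class JobRun:
    """One processing-job execution (hypothetical record shape)."""
    engine: str        # e.g. "databricks", "adf", "sql"
    job_name: str
    duration_s: float  # wall-clock processing time in seconds
    succeeded: bool


def summarize_by_engine(runs):
    """Group job runs by engine and compute run count, error rate, and mean duration."""
    grouped = defaultdict(list)
    for run in runs:
        grouped[run.engine].append(run)

    summary = {}
    for engine, engine_runs in grouped.items():
        failures = sum(1 for r in engine_runs if not r.succeeded)
        summary[engine] = {
            "runs": len(engine_runs),
            "error_rate": failures / len(engine_runs),
            "mean_duration_s": mean(r.duration_s for r in engine_runs),
        }
    return summary


# Illustrative data only; not taken from the customer's platform.
sample_runs = [
    JobRun("databricks", "daily_sales", 120.0, True),
    JobRun("databricks", "daily_sales", 180.0, False),
    JobRun("adf", "ingest_orders", 60.0, True),
]

summary = summarize_by_engine(sample_runs)
for engine, stats in sorted(summary.items()):
    print(engine, stats)
```

In a real deployment the records would come from the telemetry collected from each engine rather than from an in-memory list; the grouping key could equally be a job name or a time window to produce trend lines.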
Results
- Improved data quality: Anomalies in data pipelines are monitored and detected, reducing errors
- Rapid problem resolution: Real-time monitoring of data pipelines makes it possible to identify and resolve problems quickly, minimize downtime, and ensure data availability within established SLAs
- Improved operational efficiency: Data operations are optimized and streamlined, reducing costs and improving resource utilization
- Optimized resource allocation: Resources, including personnel and infrastructure, are allocated more effectively, resulting in cost savings
- Scalability and growth support: Performance goals are maintained as data volumes grow
Observability – Phase I