Role Summary:
The Observability Architect is responsible for designing and implementing a holistic observability architecture that enables full-stack visibility, real-time insights, and predictive intelligence across the CCC ecosystem. This role plays a pivotal part in the AIOps transformation, driving unified telemetry, data correlation, and integration with automation workflows.
The architect must bring hands-on experience with AIOps-centric observability solutions and deep expertise in both infrastructure and application performance monitoring.
Key Responsibilities:
- Design the observability reference architecture covering logs, metrics, traces, events, and synthetic and real-user monitoring (RUM) across hybrid infrastructure.
- Architect and deploy observability pipelines that integrate key platforms:
  - Dynatrace (full-stack monitoring and AIOps analytics)
  - Splunk (infrastructure monitoring, log aggregation, anomaly detection)
  - BMC TSOM (TrueSight Operations Management)
  - BMC MainView (mainframe telemetry)
  - SolarWinds (network and system-level visibility)
- Enable event correlation and noise reduction by integrating telemetry streams into ServiceNow Event Management or other manager-of-managers platforms.
- Support deployment of AIOps features such as automatic root cause analysis, predictive alerting, and intelligent incident triage.
- Define and implement dashboards, KPIs, and SLOs/SLIs in tools like Grafana, Dynatrace, and Splunk ITSI.
- Collaborate with the ServiceNow Architect to ensure observability data feeds are aligned with CMDB and service models.
- Ensure data normalization and enrichment strategies enable meaningful business-impact insights.
Requirements:
- 8–12 years of experience in observability engineering, enterprise monitoring, or AIOps platform architecture.
- Proven delivery experience in AIOps transformation projects across large-scale, hybrid environments.
- Deep technical knowledge and implementation experience in:
  - Dynatrace (full-stack, synthetic, AIOps)
  - Splunk Infrastructure Monitoring & Log Management
  - BMC TSOM & MainView
  - SolarWinds (NPM, SAM)
  - Grafana / Prometheus
- Strong understanding of OpenTelemetry, data correlation patterns, and AI/ML-driven observability.
- Familiarity with integrating observability stacks with automation engines and ITSM platforms.
- ITIL and Dynatrace/Splunk/BMC certifications are preferred.