Key Responsibilities
- Toolchain Evaluation & Modernization
- Evaluate legacy monitoring and alerting tools (e.g., BMC MainView, SolarWinds).
- Recommend and integrate a unified observability stack using Splunk, Dynatrace, Grafana, and Elastic Stack.
- Ensure end-to-end visibility across infrastructure, applications, and user experience.
- Deploy AIOps capabilities (event correlation, noise reduction, predictive analytics) using Dynatrace and Splunk.
- Enable intelligent alerting and root cause analysis using ML-based models.
- Integrate ServiceNow ITOM for automated incident creation and enrichment.
- Develop automation playbooks and runbooks (Python, PowerShell, Ansible) for common incident types.
- Enable auto-remediation pipelines triggered by AIOps events.
- Support auto-scaling, service restarts, and configuration drift correction.
- Deploy log, metric, and trace collection using Elastic Stack and Dynatrace.
- Define and implement Service Level Objectives (SLOs), error budgets, and MTTR/MTTD benchmarks.
- Build dashboards in Grafana, Dynatrace, and ServiceNow Performance Analytics.
- Redesign and automate event, incident, change, and problem management processes.
- Align monitoring workflows with the ServiceNow CMDB and configuration item (CI) health status.
- Shift operations from reactive to proactive, leveraging predictive insights.
Qualifications
- Education:
  - Bachelor's degree in Information Technology, Engineering, or Computer Science
  - Master's degree (optional but preferred)
- Experience:
  - 8–12 years in IT operations, observability, or monitoring architecture
  - 3–5 years of hands-on AIOps and automation experience
  - Strong background in Dynatrace, Splunk, SolarWinds, ServiceNow, Elastic, and BMC tools
- Core Competencies:
  - Observability architecture and integration
  - AIOps platforms and automation frameworks
  - ITOM/ITSM best practices (especially ServiceNow ITOM modules)
  - Scripting and tool orchestration
  - Metrics design: MTTR, uptime, Alert Fatigue Index
- Certifications (Preferred):
  - ITIL 4 Managing Professional
  - Dynatrace Associate/Professional
  - Splunk Enterprise Certified Admin
  - DevOps Foundation / SRE Foundation