Job Information
Burns & McDonnell Department Manager, Observability Operations - Corporate IT (Kansas City) in Kansas City, Missouri
Description
Burns & McDonnell is seeking an Observability Leader to join the IT Infrastructure & Operations team. The ideal candidate will enhance the visibility of our infrastructure's performance and availability by leading efforts in advanced monitoring, logging, and telemetry. You will develop and manage Service Level Indicators (SLIs) and Service Level Objectives (SLOs) to ensure optimal performance metrics. Collaborating closely with our team in India, you will generate comprehensive insights into infrastructure performance, ensuring services are consistently operational and visible. The ideal candidate possesses a deep understanding of observability principles, monitoring tools, and site reliability engineering (SRE) practices, and has a proven track record of developing and executing successful observability strategies.
We are seeking a highly motivated and experienced Observability & Monitoring Lead to build and lead a small but impactful team responsible for ensuring the performance, health, and availability of our cloud-based services and network infrastructure. This role is critical to maintaining a world-class customer experience and supporting our continued growth.
Responsibilities:
Operations & Integrations: Integrate monitoring tools with ticketing systems and PagerDuty, and develop and maintain tooling that provides visibility into infrastructure performance.
Leadership & Team Development: Lead, mentor, and develop a small team of observability and monitoring engineers. Foster a collaborative and high-performing team environment.
Strategy & Roadmap Development: Define and execute a comprehensive observability and monitoring strategy that aligns with business objectives and supports proactive identification and resolution of performance issues. Create and maintain a strategic roadmap for evolving our observability capabilities.
Tooling & Implementation: Evaluate, select, and implement appropriate monitoring and observability tools to provide comprehensive visibility into our cloud services, network infrastructure, and applications. This includes experience with network monitoring tools, cloud monitoring platforms (e.g., AWS CloudWatch, Azure Monitor, Google Cloud Monitoring), and APM solutions.
Performance Monitoring & Alerting: Establish robust monitoring and alerting systems to proactively detect and diagnose performance bottlenecks, outages, and other critical issues. Define meaningful metrics and thresholds. This includes creating and maintaining real-time dashboards and leveraging operations center expertise to ensure seamless monitoring and troubleshooting of infrastructure issues.
Incident Response & Root Cause Analysis: Collaborate with engineering and operations teams to investigate and resolve incidents, perform root cause analysis, and implement preventative measures.
SRE Advocacy & Implementation: Champion SRE principles and practices within the organization. Contribute to the development and implementation of SLOs, SLAs, and error budgets.
Automation & Optimization: Identify opportunities to automate monitoring tasks, improve alerting accuracy, and optimize the performance of our systems.
Reporting & Communication: Develop and deliver regular reports on system performance, availability, and key metrics to stakeholders. Communicate effectively with technical and non-technical audiences.
Vendor Management: Manage relationships with vendors of monitoring and observability tools.
Qualifications
Bachelor's degree in information technology, computer science, or related degree from an accredited program and 8 years related professional experience in information security required or
High School Diploma/GED and 12 years related professional experience in information technology required
Previous leadership and/or management experience is preferable.
Must demonstrate excellent oral and written communication skills, strong interpersonal skills, and the ability to clearly and effectively present complex information to all levels of employees, management, and clients.
Position requires the ability to thoughtfully and positively influence, lead, and manage change.
Must possess strong project management skills and a strategic perspective.
Strongly preferred:
8 years of experience in a related role, with a focus on observability, monitoring, and performance management.
2 years of experience leading and mentoring technical teams.
Deep understanding of observability principles (metrics, logs, traces) and their application in cloud-based environments.
Hands-on experience with a variety of monitoring and observability tools (e.g., Prometheus, Grafana, Datadog, New Relic, Dynatrace, etc.).
Strong understanding of network monitoring concepts and tools (e.g., ping, traceroute, SNMP, network flow analysis).
Experience with cloud platforms (AWS, Azure, GCP) and their native monitoring services.
Familiarity with SRE principles and practices.
Excellent problem-solving, analytical, and communication skills.
Ability to work effectively in a fast-paced, collaborative environment.
Experience with infrastructure-as-code (IaC) tools (e.g., Terraform, CloudFormation).
Experience with containerization and orchestration technologies (e.g., Docker, Kubernetes).
Experience with scripting languages (e.g., Python, Bash, Ansible).
This job posting will remain open for a minimum of 72 hours and on an ongoing basis until filled.
EEO/Minorities/Females/Disabled/Veterans
Job: Information Technology
Primary Location: US-MO-Kansas City
Schedule: Full-time
Travel: No
Req ID: 250705
Job Hire Type: Experienced