Job Information

TEKsystems Observability Monitoring Engineer in Atlanta, Georgia

Description:

We are seeking a highly skilled AI/ML Telemetry Engineer with expert knowledge of Prometheus, Grafana, and Git. This role involves developing and managing telemetry for large-scale datasets, implementing strategies to improve AI system reliability and performance, and supporting capacity management.

Required skills are below

Primary Skills: Telemetry/Observability, Monitoring and Alerting, Data Collection and Analysis, Automation

• Prometheus and Grafana

• JSON/YAML

• Kubernetes and Docker/Container Technologies

• DCGM/DCGM Exporter (NVIDIA Stack)

• Solid understanding of telemetry concepts, metrics, logs, and tracing.

Skills:

Linux, Cloud, Python, Automation, Administration, Terraform, Kubernetes, Troubleshooting, JSON, YAML, Prometheus, Grafana, monitoring tools, Splunk, network monitoring, engineering

Additional Skills & Qualifications:

Desired skills are below

• Understanding of the desired outcomes of AI platforms required to support the data scientist community

• Familiarity with AI/ML frameworks (e.g., TensorFlow, PyTorch) and pipeline orchestration tools (e.g., Kubeflow)

• Linux Resource Management Tools – SLURM

• NVIDIA Software Stack – CUDA, TensorRT, Triton Inference Server

• Jupyter Notebook

• Splunk

Experience Level:

Expert Level

Benefits

Eligibility requirements apply to some benefits and may depend on your job classification and length of employment. Benefits are subject to change and may be subject to specific elections, plan, or program terms. If eligible, the benefits available for this temporary role may include the following:

  • Medical, dental & vision

  • Critical Illness, Accident, and Hospital

  • 401(k) Retirement Plan – Pre-tax and Roth post-tax contributions available

  • Life Insurance (Voluntary Life & AD&D for the employee and dependents)

  • Short and long-term disability

  • Health Spending Account (HSA)

  • Transportation benefits

  • Employee Assistance Program

  • Time Off/Leave (PTO, Vacation or Sick Leave)

About TEKsystems:

We're partners in transformation. We help clients activate ideas and solutions to take advantage of a new world of opportunity. We are a team of 80,000 strong, working with over 6,000 clients, including 80% of the Fortune 500, across North America, Europe and Asia. As an industry leader in Full-Stack Technology Services, Talent Services, and real-world application, we work with progressive leaders to drive change. That's the power of true partnership. TEKsystems is an Allegis Group company.

The company is an equal opportunity employer and will consider all applications without regard to race, sex, age, color, religion, national origin, veteran status, disability, sexual orientation, gender identity, genetic information or any characteristic protected by law.
