Senior DevOps Engineer (San Francisco) at Sysdig
San Francisco, CA, US

Sysdig is looking for creative and ambitious DevOps engineers to help us lead the container revolution. In its four years of existence, Sysdig has become the leading monitoring and visibility company in the red-hot container space. Sysdig maintains several open source tools, including the Sysdig Linux troubleshooting tool and the Falco security monitoring tool.

As a Senior Engineer on our DevOps Engineering team, you'll build solutions that enhance the availability, performance, and stability of Sysdig services and automate away repetitive work. You'll also respond to pings, pages, and alerts, investigating issues in our products that you can really sink your teeth into. You'll work on internal and production environments, monitoring, data collection, and configuration management, as well as disaster recovery planning, capacity engineering, reliability improvement initiatives, and platform automation. The best person for this role is someone who has a collaborative spirit. In our world, it's not about being a hero and having all the answers; it's about sometimes saying "I don't know" and working toward a solution rather than starting from an assumption. The team needs someone who can ask questions, learn from others, and turn chaos into order.

This role would be a great fit for someone with creative and innovative problem-solving skills and a willingness to take responsibility for the code they write all the way to production. You will develop and implement solutions that operate at scale and see your own technology efforts directly improve the reliability of our products. Our teams are empowered and expected to improve our products so that they deliver a truly reliable experience to customers. You will own development efforts from planning to delivery to realize this goal.

One thing we promise: you’ll never be bored.


  • Expertise with two or more of: Bash, Python, Golang, Java

  • Software development experience using Git

  • Deep understanding of Linux systems

  • Experience with monitoring cloud services using tools like Sysdig, Datadog, or Prometheus

  • Diagnosing and troubleshooting user-facing service outages

  • Diagnosing and resolving problems in high-throughput web applications and network services

  • Building, automating, and maintaining infrastructure in Amazon Web Services using CloudFormation or Terraform

We'd be super excited if you have:

  • Experience with container management tools such as Docker and with microservices architectures

  • Experience managing Kubernetes clusters

  • Experience managing Cassandra clusters

  • Experience with Kafka

  • Experience maintaining or contributing to open source projects

  • Experience managing and using log aggregation services like Elasticsearch or Splunk

  • Experience leading teams of engineers in service outage situations

  • Experience managing and troubleshooting CI/CD pipelines using Jenkins or Bamboo

  • Understanding of ITIL terminology for incident and problem management

  • Experience building PCI/HIPAA-compliant infrastructure in the cloud

  • Awareness of and insight into industry trends (technology, methods, and tooling)