We are looking for a Site Reliability Engineer to join our partner's team, which owns a payment platform.
What You’ll Do
* Engage in and improve the whole lifecycle of our products, from ideation and design through development, launch, operation, and iteration.
* Ensure sufficient logging, monitoring and alerting strategies around availability, latency and overall system health.
* Scale systems sustainably through automation, and evolve systems by pushing for changes that improve reliability and velocity.
* Partner with product engineering teams throughout the PDLC on design, development, capacity planning, and ramp plans to ensure the client continues to scale and maximize availability.
What We’re Looking For
* BS degree in Computer Science or a related technical field involving systems engineering, or equivalent practical experience.
* Software development background with the ability to analyze and improve an existing codebase.
* Cloud-based architecture experience (ideally AWS).
* Ability to support a 24/7/365, always-available, production-grade service.
* Experience in one or more of the following: Java, Python, Golang, or shell scripting.
* Familiarity with orchestration tools (Ansible, Puppet, Chef, Terraform, etc.).
* Proficiency in managing large-scale, cloud-based infrastructure.
* Expertise in designing and troubleshooting large-scale distributed systems.
* Strong communicator, both written and spoken.
* Kubernetes and container experience.