We are looking for a Site Reliability Engineer to join our partner's team, which owns a payment platform.

What You’ll Do
* Engage in and improve the whole lifecycle of our products, from ideation and design through development, launch, operation, and iteration.

* Ensure sufficient logging, monitoring and alerting strategies around availability, latency and overall system health.

* Scale systems sustainably through automation, and evolve systems by pushing for changes that improve reliability and velocity.

* Partner with product engineering teams throughout the PDLC on design, development, capacity planning, and ramp plans to ensure the client continues to scale and maximize availability.

What We’re Looking For

* BS degree in Computer Science or a related technical field involving systems engineering, or equivalent practical experience.

* Software development background, with the ability to analyze and improve an existing codebase.

* Cloud-based architecture experience (ideally AWS).

* Ability to support a 24/7/365, always-available, production-grade service.

* Experience in one or more of the following: Java, Python, Golang, or shell scripting.

* Familiarity with orchestration tools (Ansible, Puppet, Chef, Terraform, etc.).

Preferred Qualifications

* Proficiency in managing large-scale, cloud-based infrastructure.

* Expertise in designing and troubleshooting large-scale distributed systems.

* Strong communication skills, both written and spoken.

* Kubernetes and container experience.
