We are looking for a seasoned Site Reliability Engineer to join our partner's team, which owns one of the biggest payment platforms.

    What You’ll Do

    • Learn new systems and technologies to fully understand the context of the work you do
    • Work through maintenance requests for the infrastructure supporting Venmo's SDLC and production operations
    o Create & support test environments
    o Resolve Sev3 production issues
    o Deploy routing changes
    • Use infrastructure as code to manage infrastructure and application configuration
    o Update Puppet modules and roll out changes
    o Update CloudFormation stacks and roll out changes
    • Reduce toil by scripting and automating repetitive tasks
    o Pair with service owners to build automated remediation for their production services
    • Work with service owners to improve service observability and error reporting
    o Add metrics and logs as requested by service owners
    o Pair with service owners to create dashboards and alerts to support their production services
    • Manage resolution of support requests end-to-end
    o Take ownership of resolving our service owners’ requests for support
    o Work with SRE leads to identify potential solutions
    o Plan delivery and clearly communicate timelines
    • Work with infrastructure and SRE leadership to understand overarching business objectives, and help SRE team leads translate those objectives into actionable plans.

    What We’re Looking For

    Must haves:

    • Hands-on experience with infrastructure as code and related tooling (AWS CLI, AWS CloudFormation, Terraform, Ansible and/or Puppet)
    • Hands-on experience with IaaS and PaaS solutions from AWS (or Azure, GCP):
    o Networking and routing (VPC, Direct Connect, VPN)
    o Infrastructure as code (AWS CLI, AWS CloudFormation, Terraform, Ansible, Puppet)
    • Hands-on experience with a programming or scripting language (Python, Java, Bash) and with serverless services (AWS Lambda, API Gateway)
    • Hands-on experience with metrics and log collection systems and visualization tools (Datadog, Grafana, StatsD)
    • Strong communication skills with the ability to understand and explain technical issues to a nontechnical audience

    Nice to haves:

    • AWS Solutions Architect certification
    • Hands-on experience with distributed application infrastructure:
    o Web servers and proxies: Envoy, NGINX
    o Containers & container orchestration: Docker, Kubernetes, Contour
    o Databases: MySQL, AWS Aurora, Redis