We are looking for a seasoned Site Reliability Engineer to join our partner's team, which owns one of the biggest payment platforms.
What You’ll Do
• Learn new systems and technologies to fully understand the context of the work you do
• Work through maintenance requests for the infrastructure supporting Venmo SDLC and production operations
o Create & support test environments
o Resolve Sev3 production issues
o Deploy routing changes
• Use infrastructure as code to manage infrastructure and application configuration
o Update Puppet modules and roll out changes
o Update CloudFormation stacks and roll out changes
• Reduce toil by scripting and automating repetitive tasks
o Pair with service owners to create automations that remediate issues in their production services
• Work with service owners to improve service observability and error reporting
o Add metrics and logs as requested by service owners
o Pair with service owners to create dashboards and alerts to support their production services
• Manage resolution of support requests end-to-end
o Take ownership of resolving our service owners’ requests for support
o Work with SRE leads to identify potential solutions
o Plan delivery and clearly communicate timelines
• Work with infrastructure and SRE leadership to understand overarching business objectives and help SRE team leads translate those into actionable plans
What We’re Looking For
Must haves:
• Hands-on experience with infrastructure as code and related tooling (AWS CLI, AWS CloudFormation, Terraform, Ansible and/or Puppet)
• Hands-on experience with IaaS and PaaS solutions from AWS (or Azure, GCP):
o Networking and routing (VPC, Direct Connect, VPN)
o Infrastructure as code (AWS CLI, AWS CloudFormation, Terraform, Ansible, Puppet)
• Hands-on experience with a programming or scripting language (Python, Java, Bash) and with serverless services (AWS Lambda, API Gateway)
• Hands-on experience with metrics and log collection systems and visualization tools (Datadog, Grafana, StatsD)
• Strong communication skills with the ability to understand and explain technical issues to a nontechnical audience
Nice to haves:
• AWS Solutions Architect certification
• Hands-on experience with distributed application infrastructure:
o Web servers and proxies: Envoy, NGINX
o Containers & container orchestration: Docker, Kubernetes, Contour
o Databases: MySQL, Amazon Aurora, Redis