Senior Site Reliability Engineer (North America)
Dapper Labs
This job is no longer accepting applications
Software Engineering
Canada · United States · Remote
Posted on Aug 19, 2023
We’re looking for a Senior Site Reliability Engineer who wants to be at the technical core of an organization that’s completely reshaping how distributed applications on blockchains can reach massive audiences.
You will join a Site Reliability Engineering team that has the ability to architect, build, and iterate on resilient, scalable systems. SRE also guides the organization in areas of Observability, Reliability, and Incident Response. The support we provide to the other engineering teams enables them to deliver features that wow and delight our customers at a fast pace.
In this role, you can expect to help us launch reliable products and services with your experience and skills. You’ll join an established team with a focus on providing highly technical support to the rest of the Engineering organization. You will be leveraging infrastructure-as-code, submitting code changes via Pull Requests, and finding creative solutions for the unique and varying needs of each Engineering team. You’ll contribute to the improvement of our in-house systems by researching and applying the latest and greatest technology to our stack. You’ll be empowered to fully apply your experience, lessons learned, and technical abilities in an environment with little tech debt, no on-prem servers, and a strong foundation based on cloud-native technologies such as Kubernetes and industry-leading cloud platforms. Every day, you’ll collaborate with a world-class team both in our Vancouver office and distributed worldwide.
What we'll accomplish together
- Develop effective infrastructure (cloud platform services, networking, Kubernetes, etc.) for our projects to deploy onto, ensuring projects are scalable, resilient, and reliable in support of growing products.
- Build shared observability services, including metrics, logs, tracing, and dashboards, and act as a center of excellence that partners with other teams to define SLOs and actionable error budgets for everyone’s services.
- Respond to infrastructure incidents and support the larger Engineering team with their product incident response strategy.
- Perform post-mortems and in-depth root cause analysis to ensure we are always improving.
- Enhance tools and automation to fill the gaps in our current systems as well as build entirely new ones as we face bigger and more complex challenges.
- On-call rotation: 1 week every 5 weeks.
A little about you:
- You execute on defined projects to achieve team-level goals and independently define the right solutions or use existing approaches to solve defined problems.
- You understand operating systems, networking, Kubernetes, and other cloud-native services, and can debug system issues and identify system bottlenecks.
- You have experience working with Infrastructure as Code systems like Terraform, Pulumi, or CloudFormation.
- You have experience collecting and processing metrics from tools such as Prometheus, Datadog, or New Relic, and are familiar with the concepts of SLIs and SLO targets.
- You are comfortable responding to production incidents and can fight fires with a calm and level head, leveraging post-mortems to apply lessons learned.
- You have experience coding and developing applications. Bonus points for Go experience.
- You are comfortable diving into an unfamiliar system and finding your way around.
- While you believe in processes and the power of planning, you understand that you will often have to roll with the punches and prioritize the most impactful tasks on the fly.
- You have a strong ability to collaborate with cross-functional teams and build solid working relationships with everyone in the organization, from individual contributors to the CEO.
- You have experience building and working on deployment systems.
- You have self-awareness about your strengths and areas for development.
- At Dapper Labs, we're looking for people who are passionate about what they do.
- You're encouraged to apply even if your experience doesn't precisely match the job description!
More about Dapper Labs:
Dapper Labs uses blockchain technology to make web3 experiences easy, safe and fun.
Since it was founded in 2018, Dapper Labs has given enthusiasts a real stake in the game by bringing them closer to the brands they love, building engaged and exciting communities for them to contribute to, and producing new pathways for them to become creators themselves.
Dapper Labs is the maker of the Dapper Platform - the trusted gateway to digital worlds - and of officially licensed digital video collectibles including NBA Top Shot, NFL All Day, UFC Strike and LaLiga Golazos.
Notable investors in Dapper Labs include Andreessen Horowitz, Coatue, Union Square Ventures, Venrock, Google Ventures (GV), Samsung, and the founders of DreamWorks, Reddit, Coinbase, Zynga, and AngelList, among others. Dapper Labs’ studio partners include the NBA and NBPA, the NFL and NFLPA, Ubisoft, Warner Music, Turner, Dr. Seuss, Genies, and 100+ others.
Visit our website to learn even more about Dapper Labs, including information about benefits and perks.