Posted 2 years ago
Qualifications:
- 10+ years of experience with integration engineering.
- 10+ years of experience utilizing two or more tools, including Datadog, AppDynamics, Dynatrace, Splunk, Kibana, etc.
- Experience with API performance monitoring, including alerts, dashboards, or data trend analysis in a monitoring tool.
- Experience with recommending baseline monitoring thresholds and performance monitoring KPIs and SLAs.
- Experience developing high-throughput, fault-tolerant, multi-threaded applications
- Experience building and deploying applications on a third-party cloud provider (e.g. AWS, Azure, etc.)
- Experience with a queueing service such as Amazon SQS and/or Apache Kafka
- Experience with RDBMSs such as PostgreSQL or MySQL
- Experience with document/NoSQL databases/services such as MongoDB or Elasticsearch
- Experience with gathering and organizing large amounts of data for instrumentation in an enterprise monitoring solution.
- Experience with creating technical documentation.
- Ability to provide monitoring tool infrastructure recommendations.
- Experience as a System Reliability Engineer.
- Hands-on and technically savvy, you have experience helping teams launch applications into a complex production environment.
- Demonstrated ability to work collaboratively across the organization; strong technical and leadership skills, experience building and fostering strong working relationships
- Solid communication skills, attention to detail, strong presentation skills
- Experience with analytics (Google, Adobe, Mixpanel) design and interpretation is preferred.
- Understanding of user-facing applications (mobile and web) is preferred.
- Experience in the automotive and telematics industries is preferred.
Roles & Responsibilities:
- The APM Engineer will ensure the application monitoring team is prepared, scheduled, equipped, and coordinated to manage highly available systems.
- Partner with Product, Operations, and Infrastructure teams to understand how applications are deployed, so that we can effectively scale, evolve, and support Datadog across a broad range of use cases.
- Create high-scale, highly-performant interactive visualizations (graphs, maps, charts) that help Operations better understand the story and health of our infrastructure.
- Employ expertise in monitoring tool alerts, dashboards, and data trend analysis to provide AlwaysOn alerting, metrics visualization, logs, and application tracing.
- Provide technical solutions to a wide range of difficult problems. Provide on-the-job training to application POCs and event management.
- Build rock-solid libraries to trace requests as they flow across servers, databases, caches, and microservices.
- Responsible for ensuring that our high-volume, low-latency environments continue to perform around the clock.
- Work with cloud infrastructure (AWS, Azure).
- Solve a scaling bottleneck in a critical service.
- Deploy a new feature to production, progressively rolling it out with feature flags.
- Investigate and fix a production issue in a service that the Platform Engineering Services team owns.
- Identify opportunities and implement improvements in monitoring processes.
- Lead talented engineers in solving problems and develop future leaders.