Posted 2 years ago

Qualifications:

  • 10+ years of experience with integration engineering.
  • 10+ years of experience using two or more tools such as Datadog, AppDynamics, Dynatrace, Splunk, or Kibana.
  • Experience with API performance monitoring, including alerts, dashboards, or data trend analysis in a monitoring tool.
  • Experience with recommending baseline monitoring thresholds and performance monitoring KPIs and SLAs.
  • Experience developing high-throughput, fault-tolerant, multi-threaded applications
  • Experience building and deploying applications on a third-party cloud provider (e.g., AWS or Azure)
  • Experience with a queueing service such as Amazon SQS and/or Apache Kafka
  • Experience with an RDBMS such as PostgreSQL or MySQL
  • Experience with document/NoSQL databases or services such as MongoDB or Elasticsearch
  • Experience gathering and organizing large amounts of data to instrument an enterprise monitoring solution.
  • Experience with creating technical documentation.
  • Ability to provide monitoring tool infrastructure recommendations.
  • Experience as a System Reliability Engineer.
  • Hands-on and technically savvy, you have experience helping teams launch applications into a complex production environment.
  • Demonstrated ability to work collaboratively across the organization; strong technical and leadership skills, experience building and fostering strong working relationships
  • Solid communication skills, attention to detail, strong presentation skills
  • Experience with analytics (Google, Adobe, Mixpanel) design and interpretation is preferred.
  • Understanding of user-facing applications (mobile and web) is preferred.
  • Experience in the automotive and telematics industries is preferred.

Roles & Responsibilities:

  • The APM Engineer will ensure the application monitoring team is prepared, scheduled, equipped, and coordinated to manage highly available systems.
  • Partner with Product, Operations, and Infrastructure teams to understand how applications are deployed, so that Datadog can be deployed, scaled, and evolved effectively to support a broad range of use cases.
  • Create high-scale, highly performant interactive visualizations (graphs, maps, charts) that help Operations better understand the story and health of our infrastructure.
  • Employ expertise in performance monitoring alerts, dashboards, and data trend analysis to provide AlwaysOn alerting, metrics visualization, logs, and application tracing.
  • Provide technical solutions to a wide range of difficult problems. Provide on-the-job training to application POCs and event management. 
  • Build rock-solid libraries to trace requests as they flow across servers, databases, caches, and microservices.
  • Responsible for ensuring that our high-volume, low-latency environments continue to perform around the clock.
  • Work with cloud infrastructure (AWS, Azure).
  • Solve a scaling bottleneck in a critical service
  • Deploy a new feature to production, progressively rolling it out with feature flags.
  • Investigate and fix a production issue in a service that the Platform Engineering Services team owns.
  • Identify opportunities and implement improvements in monitoring processes.
  • Lead talented engineers in solving problems and groom future leaders.
