
Lead Site Reliability Engineer job opening in Dublin at Salesforce, Inc.

Lead Site Reliability Engineer

Job Expired.


Job description

Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems.

SRE ensures that Salesforce services have the reliability, capacity, performance and availability to meet our customers’ needs, and the rate of improvement that our customers expect.

Our software development focuses on enabling service owners to operate their services safely at scale, whether through paved-path integrations with observability frameworks, optimizing existing systems, designing infrastructure, or eliminating manual work through AI/ML.

On the SRE team, you’ll have the opportunity to manage the complex challenges of scale which are unique to Salesforce, while using your expertise in coding, algorithms, complexity analysis and large-scale system design.

SRE’s culture of diversity, intellectual curiosity, problem solving and openness is key to its success.

Our organization brings together people with a wide variety of backgrounds, experiences and perspectives.

We encourage them to collaborate, think big and take risks in a blame-free environment.

We promote self-direction to work on meaningful projects, while we also strive to create an environment that provides the support and mentorship needed to learn and grow.

Required Skills

  • 7+ years of experience in Python, Go, or Java for automation, tooling, and integration.

  • Hands-on experience designing, building and operating large-scale distributed systems, and identifying their shortcomings and optimization opportunities.

  • Demonstrated experience in developing and deploying production-grade software applications or services.

  • Proven ability to contribute directly to application codebase improvements for reliability and scalability.

  • Strong understanding of software engineering best practices, including design patterns, testing methodologies, and code reviews, applied in a production environment.

  • Excellent knowledge of Internet technologies and protocols (TCP/IP, DNS, HTTP, SSL, etc.).

  • Ability to locate and address sources of instability in high-traffic, large-scale distributed systems.

  • Strong experience with API fundamentals (SOAP, REST).

  • Experience in Public Cloud environments, Kubernetes and modern container orchestration.

  • Knowledge of microservices, service mesh, and zero-trust infrastructure.

  • Solid knowledge of large-scale complex systems from a reliability and availability perspective.

  • Hands-on experience with large-scale SDLC pipelines.

  • Strong Linux systems knowledge and troubleshooting skills.

  • Experience in fault modeling and tolerance, chaos engineering, performance and load testing.

Responsibilities

  • Support and scale multi-cloud, multi-region services.

  • Build automation and self-healing capabilities to reduce manual operations.

  • Operate and scale monitoring, alerting, and tracing systems for proactive detection.

  • Improve CI/CD practices to accelerate safe, frequent deployments.

  • Define and implement SLIs/SLOs with engineering teams, driving reliability into system architecture.

  • Collaborate on integrating AI-driven automation and observability to enhance reliability.

  • Work within Agile teams, participating in Scrum ceremonies and iterative delivery.

  • Lead post-incident analysis, conduct postmortems, and ensure effective root cause resolution.

  • Use data to uncover trends, inform prioritization, and drive platform improvements.

Desired Skills

  • Experience operating in global, multi-tenant, or compliance-sensitive environments.

  • Understanding of SRE principles: SLIs/SLOs, availability, resiliency, and incident metrics (TTD, TTR).

  • Data-driven mindset for identifying systemic issues and improving service reliability.

  • Experience in the design and implementation of observability solutions.

  • Strong written and verbal communication, with emphasis on documentation and knowledge sharing.

  • Experience building and integrating AI-driven automation and observability to enhance reliability.



Required Skill Profession

Database, Analytics & BI

