Job Description
**Introduction**
A career in IBM Software means you’ll be part of a team that transforms our customers’ challenges into solutions.
Seeking new possibilities and always staying curious, we are a team dedicated to creating the world’s leading AI-powered, cloud-native software solutions for our customers.
Our renowned legacy creates endless global opportunities for our IBMers, so the door is always open for those who want to grow their career.
IBM’s product and technology landscape includes Research, Software, and Infrastructure.
Entering this domain positions you at the heart of IBM, where growth and innovation thrive.
**Your role and responsibilities**
We are seeking a highly experienced Senior Python Developer to join the ContextForge MCP Gateway team as a technical leader.
ContextForge is an open-source, production-grade gateway, proxy, and registry for Model Context Protocol (MCP) servers and A2A Agents.
It federates MCP and REST services, providing unified discovery, auth, rate-limiting, observability, virtual servers, multi-transport protocols, plugins and an Admin UI: https://github.com/IBM/mcp-context-forge
As a Senior Python Developer, you will drive architectural decisions, own critical system components, and mentor the engineering team.
You will work closely with the Engineering Manager and CTO to shape the technical roadmap, establish best practices, and ensure engineering excellence across the codebase.
This role requires deep expertise in distributed systems, async Python, production architecture, and technical leadership.
Architectural Leadership
* Drive architectural decisions for critical system components and platform features
* Design scalable, resilient systems: federation, multi-region deployments, edge caching
* Establish technical standards and patterns across the codebase
* Lead technical design reviews and architecture discussions
* Collaborate with the CTO on the technology roadmap and strategic technical direction
* Evaluate and adopt new technologies for performance optimization and security enhancement
Technical Ownership
* Own critical system components: federation layer, authentication system, protocol implementation
* Design and implement complex distributed systems: Redis pub/sub, mDNS discovery, cross-cluster coordination
* Build production-grade features: circuit breakers, rate limiting, backpressure, graceful degradation
* Optimize system performance: profiling, caching strategies, database tuning, async patterns
* Lead incident response: debugging production issues, root cause analysis, post-mortems
* Plan and execute technical migrations: database changes, API versioning, infrastructure upgrades
Mentorship & Leadership
* Mentor developers through code reviews, pair programming, and design discussions
* Establish coding standards, best practices, and engineering processes
* Conduct knowledge sharing sessions and technical training
* Lead by example: code quality, testing, documentation, security practices
* Help engineers grow their technical skills and advance their careers
* Foster an engineering culture of excellence, innovation, and continuous improvement
System Design & Architecture
* Design horizontally scalable systems with load balancing and auto-scaling
* Implement observability: comprehensive metrics, distributed tracing, structured logging
* Build security-first systems: threat modeling, zero-trust architecture, defense in depth
* Design for reliability: 99.9% uptime SLA, disaster recovery, data durability
* Plan capacity: performance testing, load testing, scalability analysis
* Create architecture documentation: ADRs, system diagrams, runbooks
Code Quality & Excellence
* Maintain and enforce 90%+ test coverage standards
* Implement advanced testing strategies: mutation testing, property-based testing, fuzzing
* Establish CI/CD best practices: automated quality gates, deployment strategies, rollback procedures
* Lead security initiatives: vulnerability management, security audits, compliance requirements
* Optimize developer experience: tooling, automation, feedback loops
**Required technical and professional expertise**
Python Development (8+ years)
* 8+ years of software development experience with deep Python expertise
* Master-level knowledge of Python internals: GIL, memory management, performance optimization
* Expert-level async Python: complex asyncio patterns, concurrency, coroutines, event loops
* Extensive experience with FastAPI or similar async frameworks in production at scale
* Advanced database expertise: SQLAlchemy 2.0, query optimization, schema design, sharding
Distributed Systems Architecture (5+ years)
* 5+ years designing and building distributed systems in production environments
* Deep understanding of distributed system patterns: consensus, replication, partitioning, federation
* Expert-level experience with Redis: advanced patterns, cluster mode, high availability
* Knowledge of CAP theorem, consistency models, distributed transactions
* Experience with multi-region deployments, data replication, conflict resolution
* Understanding of service mesh architectures and API gateway patterns
Protocol & API Design
* Expert-level REST API design with versioning and backward compatibility strategies
* Deep knowledge of JSON-RPC, gRPC, or similar RPC protocols
* Experience designing and implementing custom protocols
* Understanding of transport protocols: HTTP/2, SSE, WebSocket, gRPC, QUIC
* Experience with protocol parsing, serialization, and performance optimization
Security & Compliance
* Expert-level OAuth2/OIDC implementation and integration
* Deep understanding of JWT, token security, cryptographic signing
* Experience designing RBAC, ABAC, or policy-based authorization systems
* Knowledge of security architecture: zero-trust, defense in depth, threat modeling
* Experience with compliance: SOC2, GDPR, audit logging, data residency
* Understanding of cryptography: TLS, encryption at rest/in transit, key management
Production Operations & Observability
* Expert-level observability: OpenTelemetry, Prometheus, Grafana, Jaeger/Zipkin
* Deep understanding of SRE practices: SLOs, SLIs, error budgets, incident management
* Experience with production scaling: horizontal scaling, auto-scaling, load balancing
* Knowledge of performance optimization: profiling, bottleneck analysis, caching strategies
* Experience with chaos engineering and resilience testing
* Proficiency with containerization and orchestration: Docker, Kubernetes, Helm
Technical Leadership
* Proven track record of technical leadership on complex projects
* Experience mentoring and growing engineering teams
* Strong architectural thinking and system design capabilities
* Excellent written and verbal communication for technical audiences
* Ability to influence technical direction and drive consensus
* Experience working with a Development Director, Product Owner, CTO, or VP of Engineering on strategic technical decisions
**Preferred technical and professional experience**
Advanced Architecture & Performance
* Experience with high-throughput, low-latency systems (sub-millisecond p99 latency)
* Knowledge of Rust or C++ for performance-critical components
* Understanding of Python C extensions and FFI (Foreign Function Interface)
* Experience with Python performance tools: cProfile, py-spy, memory profilers
* Knowledge of distributed caching strategies and CDN integration
AI/ML Infrastructure Expertise
* Extensive experience with AI infrastructure, LLM orchestration, or agentic systems
* Deep knowledge of OpenAI API, Anthropic API, LangChain, LlamaIndex, or similar frameworks
* Understanding of RAG (Retrieval-Augmented Generation) architectures
* Experience with prompt engineering, agent orchestration, or multi-agent systems
* Knowledge of AI observability: token tracking, cost optimization, model monitoring
Advanced Distributed Systems
* Experience with Kubernetes operators, custom controllers, or service meshes (Istio, Linkerd)
* Knowledge of GraphQL federation
* Understanding of edge computing, serverless architectures, or FaaS
* Experience with distributed consensus algorithms: Raft, Paxos
* Knowledge of stream processing: Kafka, Redis Streams, event sourcing
Security Leadership
* Experience leading security initiatives or serving as a security champion
* Knowledge of penetration testing, security audits, or red team exercises
* Understanding of supply chain security and SBOM (Software Bill of Materials)
* Experience with secrets management: Vault, AWS Secrets Manager, encryption key rotation
* Knowledge of secure development lifecycle (SDL) implementation
Open Source & Thought Leadership
* Significant open-source contributions (maintainer or core contributor)
* Technical writing: blog posts, documentation, whitepapers
* Conference presentations or technical talks at major events
* Community leadership or developer advocacy experience
* Published articles or recognition in the developer community
IBM is committed to creating a diverse environment and is proud to be an equal-opportunity employer.
All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, gender, gender identity or expression, sexual orientation, national origin, caste, genetics, pregnancy, disability, neurodivergence, age, veteran status, or other characteristics.
IBM is also committed to compliance with all fair employment practices regarding citizenship and immigration status.