Overview
Ask Sage (a BigBear.ai company) is the leading Generative AI platform that accelerates government and commercial teams across dozens of use cases, from coding and cybersecurity to acquisition, data analysis, and much more. Our cutting-edge technology, accredited at FedRAMP High and DoD IL5, enables teams to focus on strategic initiatives while we take care of the heavy lifting.

We are seeking a highly skilled and experienced Principal Kubernetes Platform Engineer. This critical role involves privileged access to our cloud instances, Kubernetes clusters, and supporting platform services, including environments operating under FedRAMP High and Department of Defense requirements. The Principal Kubernetes Platform Engineer (Multi-Cloud) will be accountable for the reliability, security, scalability, and operational excellence of our Kubernetes estate across Azure Government (AKS), AWS (EKS), and Google Cloud (GKE) as needed.

You will serve as the organization's technical authority for Kubernetes administration and platform engineering, setting standards for cluster architecture, network policy, identity and access, workload isolation, secrets management, observability, release engineering, and incident response. The ideal candidate combines deep hands-on Kubernetes expertise with disciplined operational execution, strong security instincts, and the ability to automate everything (GitOps/Infrastructure-as-Code), while partnering effectively with security, engineering, and leadership. As a key member of our team, you will improve platform uptime, reduce operational toil, accelerate delivery, and ensure our container platform remains compliant and defensible under audit.
What you will do
Key Responsibilities:
- Own day-to-day and strategic administration of Kubernetes clusters across multiple cloud environments (AKS/EKS/GKE), including Azure Government enclaves where applicable.
- Design, build, secure, and operate highly available Kubernetes platform architectures (multi-zone, upgrade-safe, disaster recovery-ready).
- Establish and enforce cluster standards: namespaces/tenancy, RBAC, Pod Security standards, admission control, network segmentation, and workload isolation.
- Implement and maintain end-to-end platform security controls: image provenance, vulnerability management, runtime protection, secrets management, and certificate lifecycle.
- Build and mature GitOps/CI/CD patterns for Kubernetes (e.g., Flux/Argo), ensuring reliable, repeatable deployments with strong auditability.
- Manage Kubernetes lifecycle operations: version upgrades, node pool strategy, capacity planning, add-on management, and cluster hardening.
- Define and operate observability for clusters and workloads: logging, metrics, traces, alerting, SLOs/SLIs, and actionable runbooks.
- Proactively ensure the highest levels of platform availability and performance; lead root-cause analysis and drive permanent corrective actions.
- Maintain security, backup, and redundancy strategies for etcd (where applicable), persistent storage, cluster state, and critical platform services.
- Secure and maintain the stack by remediating cybersecurity vulnerabilities, CVEs, misconfigurations, and supply-chain risks; coordinate remediation timelines with stakeholders.
- Provide second- and third-level support for Kubernetes and containerized workloads, including incident response participation and on-call support as required.
- Partner with application teams to set best practices for containerization, resource requests/limits, health probes, service discovery, ingress, and release safety.
- Develop and maintain automation to reduce manual intervention (IaC, policy-as-code, auto-remediation, self-service workflows, and automated compliance evidence).
- Liaise with cloud vendors and internal stakeholders for platform problem resolution and architectural guidance.
- Maintain the environment in compliance with FedRAMP High requirements, and support regular reporting and audit evidence collection.
- Uphold and enforce Ask Sage's compliance, privacy, and security policies, ensuring adherence to all relevant regulations and standards.
- Conduct regular audits of Kubernetes configurations and platform controls; recommend and implement enhancements aligned to benchmarks and risk posture.
What you need to have
- Minimum of 7 years of experience in infrastructure/platform engineering, including at least 4 years of deep, hands-on Kubernetes administration in production.
- Clearance: TS/SCI required
- Demonstrated expertise operating Kubernetes across multiple cloud providers (AKS + EKS and/or GKE).
- Strong knowledge of Kubernetes internals and critical subsystems: scheduling, networking (CNI), DNS, ingress, storage (CSI), RBAC, admission control, and upgrades.
- Strong security background in container and Kubernetes hardening (e.g., policy controls, least privilege, network policies, secrets handling, supply chain security).
- Proficiency with Infrastructure-as-Code and automation (e.g., Terraform, Ansible) and scripting (e.g., Bash, Python, Go).
- Experience with observability tooling and operational maturity (monitoring/alerting, incident response, SLOs).
- Familiarity with compliance-driven environments and producing audit-ready evidence (FedRAMP/DoD environments a plus).
- Relevant certifications preferred (one or more): CKA/CKS, Azure Solutions Architect, AWS Solutions Architect, Security+, CISSP.
What we'd like you to have
- Azure Government experience (strongly preferred).
About BigBear.ai
BigBear.ai is a leading provider of AI-powered decision intelligence solutions for national security, supply chain management, and digital identity. Customers and partners rely on BigBear.ai's predictive analytics capabilities in highly complex, distributed, mission-based operating environments. Headquartered in McLean, Virginia, BigBear.ai is a public company traded on the NYSE under the symbol BBAI. For more information, visit https://bigbear.ai/ and follow BigBear.ai on LinkedIn: @BigBear.ai and X: @BigBearai. BigBear.ai is an Equal Opportunity Employer of all protected groups, including protected veterans and individuals with disabilities.