
Principal Applied Scientist

Microsoft
United States, Texas, Irving
7000 State Highway 161
Aug 20, 2025
Overview

Be at the forefront of AI evaluation by joining the Copilot Offline Evaluation Platform team and helping us deliver the platform that makes Copilot innovation fast, reliable, and regression-free.

As a Principal Applied Scientist, you will help transform how Copilot features are evaluated and improved. Your team will deliver end-to-end experimentation, evaluation, and insights to Copilot engineers, PMs, and fellow scientists. You'll lead the development of a robust data generation platform that simulates realistic user behaviors, curate representative datasets, and develop comprehensive query sets and evaluation tooling. You'll work on scalable pipelines that support offline evaluations for Copilot Search, BizChat, Connectors, and Agents (DAs). This opportunity will allow you to dive deep into Copilot technologies, shape how AI quality is measured at scale, build critical skills for the AI era, and rapidly grow your career.

Microsoft's mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.
Responsibilities

Drive the technical strategy for offline evaluation, ensuring alignment with product goals and scientific rigor in methodology, metrics, and automation.

Lead the design and implementation of robust evaluation pipelines that simulate real-world usage and reflect diverse user intents and preferences.

Develop and own critical quality metrics that capture user satisfaction, model fidelity, and regression risks, serving as trusted signals for product decisions.

Guide the creation of high-fidelity synthetic data using LLMs to simulate complex user interactions, enabling scenario coverage at scale.

Perform deep analysis of evaluation results to surface actionable insights, diagnose weaknesses, and influence prioritization across Copilot canvases.

Mentor scientists and partner with engineering and PM leads to integrate insights into experimentation, product loops, and model iteration cycles.

Establish best practices and influence standards through documentation, internal forums, and contributions to the broader applied science community (e.g., talks, publications, cross-org collaboration).
