Product Manager, Agent Harness

Product Management · Full-time · San Francisco; New York

Our mission is to automate coding. The first step in our journey is to build the best tool for professional programmers, using a combination of inventive research, design, and engineering. Our organization is very flat, and our team is small and talent-dense. We particularly like people who are truth-seeking, passionate, and creative. We enjoy spirited debate, crazy ideas, and shipping code.

About the Role

The Agent Harness is what makes Cursor's agents actually work. It determines how agents decompose tasks into subtasks, how they interact with the file system and terminal, how they handle failures and retries, and how developers observe and steer what's happening. When an agent gets stuck, loops, or hallucinates, the harness is why—and the harness is how you fix it.

As a Product Manager for the Agent Harness, you will own this framework. Agent quality is improving rapidly—we shipped Composer 2, our own frontier coding model, and are training agents through real-time RL on user data. Your job is to turn those research advances into product that developers can feel.

This is not a role where you write specs and hand them off. You'll be reading agent traces, analyzing failure modes, designing evaluation frameworks, and making judgment calls about what an agent should and shouldn't attempt. You'll work at the boundary between research and product, where the roadmap is shaped by empirical results as much as customer feedback.

Example projects include...

  • Owning the agent planning and execution framework: how agents decompose tasks, decide what tools to use, and recover when a step fails. Balancing autonomy with predictability.

  • Designing how developers observe and steer agents: real-time progress, guardrails, the ability to redirect mid-task. The experience should build trust without requiring micromanagement.

  • Building evaluation and benchmarking systems: defining what "good" means for agent quality—task completion rate, error recovery, hallucination frequency—and building the harnesses to measure it. These measurements drive engineering and research priorities.

  • Analyzing agent traces at scale: identifying where agents get stuck, loop, hallucinate, or take unproductive paths, and turning those patterns into concrete improvements.

  • Defining the primitives for agent extensibility: how agents use tools, access codebase context, call external services via MCPs and plugins on the Cursor Marketplace, and how developers customize agent behavior through rules and constraints.

  • Improving the default Cursor agent experience (the “Auto” model setting): making smart model choices based on user needs, model capabilities, and cost appetite.

  • Shaping multi-agent coordination: how subagents share context and avoid conflicts when executing in parallel across files and systems. This matters more as developers spin up fleets of agents simultaneously.

You may be a fit if:

  • You have built or evaluated AI agents, LLM applications, or ML-powered developer tools.

  • You're deeply technical. You're comfortable reading code, analyzing traces, and reasoning about system behavior at a low level.

  • You have strong intuition for evaluation and measurement. You know how to define metrics that capture quality, not just activity.

  • You can move between the big picture and the details—from "what should agents be capable of in six months?" to "why did this agent fail on this specific task?"

  • You're comfortable in a research-adjacent environment where the roadmap is shaped by empirical results, not just customer requests.

  • You have experience with reinforcement learning, agent frameworks, or AI evaluation—either as a practitioner or working closely with researchers.

  • You thrive in ambiguous, fast-moving environments and enjoy making hard tradeoffs with incomplete information.


Anysphere, Inc. provides equal employment opportunities to applicants and employees without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, protected veteran status, or disability.
