
How Wayfair cut ML model costs by 90% (twice!) with Cursor

With Cursor handling experiment execution, Wayfair compressed months of ML research into days. Five researchers tested 110 model variants and cut tag-validation model costs by 94%.


Wayfair's Applied Research team uses Cursor to compress months of machine learning and applied AI research into days. By late 2025, researchers were running 20 or more agents in parallel. This enabled a team of five to test 110 model variants in a four-day experimentation sprint and reduce inference costs for a core e-commerce catalog enrichment workflow by 94%. In March 2026, the team repeated the same playbook with the latest models in Cursor, cutting costs by another 90%.

Cursor has changed how ML research operates at Wayfair. Wayfair's researchers drive the model improvements: crafting hypotheses, interpreting results, and refining the strongest ideas. Cursor handles the implementation: building experiments, wiring them into the testing framework, and measuring results.

Validating product attribute data against the world's largest home goods catalog

Every product in Wayfair's catalog carries structured "tags" that describe its materials, dimensions, color, and other attributes. Over 47,000 distinct attribute tags power search, filtering, recommendations, product placement, and advertising for tens of millions of products.

Wayfair's Applied AI team built a validation model that audits each tag against images, descriptions, and customer reviews on the product page. The model was accurate, but too expensive to run across large swaths of Wayfair's massive product catalog.

Our goal was to make the model cost-effective enough to run across one of the world's largest home goods catalogs.

Guillermo Mosse
Senior Machine Learning Scientist, Wayfair

To realize this goal, the team needed to explore a large design space including different LLMs, input pre-processing strategies, prompts, output structures, and evaluation methods. Manually implementing hundreds of combinations would have taken months.
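To give a sense of how quickly such a design space grows, here is a minimal sketch that enumerates every combination of a few axes. The axis names follow the paragraph above, but the option values (`model-a`, `prompt-v1`, and so on) are hypothetical placeholders; the article does not list the actual options Wayfair tested.

```python
from itertools import product

# Hypothetical option values; only the axis names come from the article.
design_space = {
    "model": ["model-a", "model-b", "model-c"],
    "preprocessing": ["raw", "summarized"],
    "prompt": ["prompt-v1", "prompt-v2", "prompt-v3"],
    "output_structure": ["boolean", "label-with-rationale"],
}


def enumerate_variants(space):
    """Return one config dict per combination of axis values."""
    axes = list(space)
    combos = product(*(space[axis] for axis in axes))
    return [dict(zip(axes, combo)) for combo in combos]


variants = enumerate_variants(design_space)
print(len(variants))  # 3 * 2 * 3 * 2 = 36 combinations
```

Even with only two or three options per axis, four axes already produce 36 candidate variants, and each added axis multiplies the count again.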

Instead, Wayfair used Cursor to automate and parallelize the experimentation loop. In December 2025, the team ran a four-day experimentation sprint to progress towards their cost reduction goals. With Cursor handling the implementation layer, five researchers built and tested 110 substantively distinct model variations. The winning architecture cut inference costs by 94% while improving model precision, and went into production as Wayfair's tag-validation baseline.

Wayfair researchers validating product attribute data with Cursor

The slow part of research is building and scoring each experiment by hand. We automated that loop and let Cursor implement and execute each experiment, so what would have been months of work fit into four days.

Guillermo Mosse
Senior Machine Learning Scientist, Wayfair

Delegating experiment execution to Cursor

Before building model variants, the team standardized how Cursor would execute and measure experiments: every variant ran on the same test dataset and was scored against the same evaluation benchmark. With the testing and evaluation framework locked as an automated workflow in Cursor, researchers could focus entirely on experiment design: changing models, rewriting prompts, restructuring outputs, or rethinking how images were selected.
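The idea of a locked evaluation framework can be illustrated with a small sketch. This is not Wayfair's framework: the `Example` and `Result` types, the toy test set, and the two toy variants are all assumptions made for illustration. The point is only that every variant is scored by the same function on the same frozen data, so the results are directly comparable.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Example:
    product_text: str
    tag: str            # "attribute:value", a hypothetical encoding
    tag_is_valid: bool  # ground-truth label


@dataclass(frozen=True)
class Result:
    name: str
    precision: float
    cost_per_example: float


# Frozen toy test set shared by every variant (hypothetical data).
TEST_SET = (
    Example("solid oak dining table", "material:oak", True),
    Example("solid oak dining table", "material:glass", False),
    Example("blue velvet sofa", "color:blue", True),
    Example("blue velvet sofa", "color:red", False),
)


def evaluate(
    name: str,
    predict: Callable[[Example], bool],
    cost_per_call: float,
    test_set: Sequence[Example] = TEST_SET,
) -> Result:
    """Score one variant: precision of its 'tag is valid' predictions."""
    predictions = [predict(ex) for ex in test_set]
    true_pos = sum(p and ex.tag_is_valid for p, ex in zip(predictions, test_set))
    predicted_pos = sum(predictions)
    precision = true_pos / predicted_pos if predicted_pos else 0.0
    return Result(name, precision, cost_per_call)


def keyword_variant(ex: Example) -> bool:
    """Toy variant: accept the tag if its value appears in the product text."""
    return ex.tag.split(":", 1)[1] in ex.product_text


def always_valid_variant(ex: Example) -> bool:
    """Toy variant: accept every tag."""
    return True


results = [
    evaluate("keyword", keyword_variant, cost_per_call=0.001),
    evaluate("always-valid", always_valid_variant, cost_per_call=0.0001),
]
for r in sorted(results, key=lambda r: r.precision, reverse=True):
    print(r)
```

Because only the `predict` callable changes between runs, a new idea needs a new variant and nothing else; sampling, scoring, and reporting stay fixed.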

"There were many degrees of freedom: models, prompts, output structure, image selection. With the Cursor automations in place, I focused on exploring the design space," said Guillermo Mosse, a senior machine learning scientist. "I'd describe an idea, sometimes using voice mode to talk for 5 minutes straight, and Cursor would spin up the variant, run the eval, and publish results. The framework handled the data sampling, evaluation, and metric reporting that made comparisons trustworthy."

Cursor changed the bottleneck from 'How long will this take to build?' to 'What is the next idea worth testing?' That is a much better place for a scientist to spend their attention.

Omer Lang
Senior Machine Learning Scientist, Wayfair

With this workflow, researchers could go from an idea to a live experiment in less than 30 minutes.

Wayfair researchers delegating experiment execution to Cursor

Researchers spent most of their time brainstorming what to try next, reviewing results, and deciding which ideas were worth another turn. Cursor wrote and ran each variant, surfacing the strongest ones for us to review.

Guillermo Mosse
Senior Machine Learning Scientist, Wayfair

In March 2026, Wayfair ran another experimentation sprint, this time benchmarking against the productionized December model as the new baseline. With the framework now mature, junior engineers with no prior exposure to tag validation were shipping novel model variants on day one. Researchers ran 140+ new experiments and layered genetic-algorithm searches on top of the strongest candidates for final optimization. The result: another 90% cost reduction.
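The article does not describe how the genetic-algorithm searches were set up. As a generic illustration of the technique, the sketch below evolves a population of variant configurations by keeping the best candidates, recombining them, and randomly mutating the offspring. The axes, the `fitness` function, and all numeric values are hypothetical stand-ins; in a real setup, fitness would come from the evaluation benchmark.

```python
import random

# Hypothetical search axes, not Wayfair's actual configuration space.
AXES = {
    "model": ["small", "medium", "large"],
    "prompt": ["terse", "detailed"],
    "images": [1, 2, 4],
}


def fitness(cfg):
    """Toy score trading off quality against cost; higher is better."""
    quality = {"small": 0.80, "medium": 0.90, "large": 0.95}[cfg["model"]]
    quality += 0.03 if cfg["prompt"] == "detailed" else 0.0
    cost = {"small": 1, "medium": 3, "large": 10}[cfg["model"]] * cfg["images"]
    return quality - 0.01 * cost


def crossover(a, b, rng):
    """Pick each setting from one of the two parents at random."""
    return {key: rng.choice([a[key], b[key]]) for key in AXES}


def mutate(cfg, rng, rate=0.2):
    """Resample each setting with probability `rate`."""
    return {
        key: rng.choice(AXES[key]) if rng.random() < rate else value
        for key, value in cfg.items()
    }


def genetic_search(seeds, generations=10, population=12, keep=4, seed=0):
    """Evolve from the seed configs; the top `keep` always survive (elitism)."""
    rng = random.Random(seed)
    pop = list(seeds)
    for _ in range(generations):
        parents = sorted(pop, key=fitness, reverse=True)[:keep]
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(population - len(parents))
        ]
        pop = parents + children
    return max(pop, key=fitness)


# Start from the "strongest candidates" found by earlier experiments.
seeds = [
    {"model": "large", "prompt": "terse", "images": 4},
    {"model": "small", "prompt": "terse", "images": 2},
]
best = genetic_search(seeds)
print(best, round(fitness(best), 3))
```

Because the top candidates are carried over unchanged each generation, the best score found can never drop below that of the starting candidates.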

Wayfair's March experimentation sprint results
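For a sense of how the two reported reductions combine, here is a short worked calculation. It assumes that the March 90% reduction is measured relative to the productionized December model, which itself cost 94% less than the original; the article reports the two figures separately and does not state a cumulative number, so treat the combined value as an inference.

```python
def remaining_cost_fraction(*reductions):
    """Multiply the cost that remains after each successive reduction."""
    remaining = 1.0
    for reduction in reductions:
        remaining *= 1.0 - reduction
    return remaining


# 94% (December sprint) followed by 90% (March sprint), assumed to compound.
remaining = remaining_cost_fraction(0.94, 0.90)
cumulative_reduction = 1.0 - remaining

print(f"{remaining:.3%} of the original cost remains")   # 0.600%
print(f"{cumulative_reduction:.1%} cumulative reduction")  # 99.4%
```

Under that assumption, 6% of the original cost remained after December and one tenth of that remained after March, or about 0.6% of the original cost.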

Cursor as a foundation for agent-first ML research

A few capabilities mattered most for how Wayfair ran experiments:

  • Scaled agent parallelization: Researchers often ran 20+ Cursor agents in parallel during the experimentation sprint. "Running many variants at once in Cursor was straightforward and easy. This made our four-day sprint realistic," said Mosse.
  • Cross-platform surfaces: Some researchers worked primarily in the Cursor desktop app while others worked in the Cursor CLI. When the desktop group needed direct low-level control, they could open a terminal or files directly in Cursor.
  • Cloud agents: Researchers wanted experiments to keep running when they stepped away from their laptops. "Normally, shutting your laptop interrupts the experiment. Cursor allows me to commute, jump into meetings, or whiteboard ideas while their cloud agents keep running, allowing us to run experiments 24/7," said Mosse.
  • Access to every model: Researchers reached for different models for different tasks. Having access to all the best models in one tool made it easy for Wayfair to iterate.

Nick Coleman, a senior machine learning science manager, started using Cursor after trying several other agents. "Cursor was the easiest to get going with, and you have access to all the best models," he said. "The things I want to control manually, like managing git branches or jumping into files, are easy to access directly in Cursor without having to jump between tools."

Scaling Cursor across Wayfair

Cursor is now prevalent across the Applied Research organization, well beyond the ML team driving catalog enrichment. Researchers are building and exchanging internal repos of skills for ML experimentation, further accelerating the pace of development. "I've been managing several open-ended research projects in Cursor. I define the spec, set the cost guardrails, and feed in the ideas worth trying. The agents run for days while I steer as needed," said Mosse.

This new way of doing research, compressing months of exploration into days, is what we want to keep pushing.

Guillermo Mosse
Senior Machine Learning Scientist, Wayfair

Wayfair researchers are also encouraging other stakeholders across the company to use agents, including partners with no coding experience. "My advice is to push it beyond the limits of what you think is possible," said Coleman. "Start by telling it what you want to accomplish, and then just keep pushing the boundary." You can read more about Wayfair's work on their research blog.


If you're using Cursor to accelerate ML research or scale experimentation across your team, please reach out to our team to start a Cursor trial.