Google can train its search-specific AI products, such as AI Overviews, on content from across the web even when publishers have opted out of having their content used to train Google’s AI products, a vice-president of product at the company testified in court on Friday.
That’s because Google’s controls that let publishers opt out of AI training cover only work by Google DeepMind, the company’s AI lab, said Eli Collins, a DeepMind vice president. Other organisations at the company can further train the models for their own products.
“Once you take the Gemini” AI model “and put it inside the search org, the search org has the ability to train on the data that publishers had opted out of training, correct?” asked Diana Aguilar, a Department of Justice lawyer.
“Correct — for use in search,” Collins responded.
Google uses AI to summarise answers to search queries at the top of its results, which may lead users not to click through to independent websites, a trend that publishers say is hurting their revenue. Google is using data from those same sites to generate the information powering its AI answers.
Publishers can keep their data out of Search’s AI features only by opting out of being indexed for search altogether, Google clarified. “Google has a separate way for publishers to manage their content in Search via the well-established robots.txt web standard,” a Google spokesperson said in a statement. Robots.txt is a file placed at the root of a website that tells crawlers, including those run by search engines and AI companies, which parts of the site they may crawl.
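Robots.txt rules are grouped by user agent, so one site can give different answers to different Google products. The minimal sketch below uses Python’s standard `urllib.robotparser` to evaluate a hypothetical robots.txt. It assumes the AI-training opt-out is expressed through Google’s `Google-Extended` robots.txt token (a name not mentioned in the testimony), while Search crawling is governed by `Googlebot`; the site URL is a placeholder.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks the Google-Extended token (assumed here to be
# the AI-training opt-out) but leaves Googlebot (Search crawling) unrestricted.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: Googlebot
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse in-memory lines instead of fetching

url = "https://example.com/articles/story.html"  # placeholder URL
for agent in ("Googlebot", "Google-Extended"):
    # can_fetch applies the first group whose user agent matches
    print(agent, parser.can_fetch(agent, url))
```

Running it prints `Googlebot True` and `Google-Extended False`: the page stays crawlable for Search while signalling the AI-training opt-out, which, per Collins’s testimony, would not keep it out of the search organisation’s own AI training.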
Google called Collins to the witness stand as part of a three-week trial in federal court in Washington, held to determine how Google should restore competition to online search. Last year, US District Judge Amit Mehta ruled that the tech giant had illegally monopolised the search market, and he is now weighing a set of changes proposed by antitrust enforcers to address its control.
The Justice Department is urging the court to force Google to sell its widely used Chrome browser and to share key data it uses to generate search results. The agency is also asking Judge Mehta to bar Google from paying to be the default search engine on other apps and devices, a restriction that would extend to its AI offerings, including Gemini, which the government argues have benefited from the company’s unlawful dominance in search.
Aguilar, the DOJ lawyer, asked Collins whether he knew how much additional data Google’s search organisation had access to beyond the content that Google DeepMind had trained its AI models on. When Collins answered that he did not, Aguilar produced a document dated August 26, 2024, titled “Search GenAI<> Gemini v3.”
According to that document, filtering out material that publishers had opted out of letting Google use for AI training removed 80 billion of 160 billion “tokens”, or snippets of content. The document also listed search “sessions data” (data collected during a period in which a user interacted with Google Search), as well as YouTube videos, as data that could augment Google’s AI models.
After viewing the document, Mehta asked Collins for clarification. “The 80 billion out of 160 billion tokens, 50% is removed by publishers opting out?”
“That is correct,” Collins responded.
Later, Google’s lawyer sought to show that the company’s dominance of search did not prevent other AI companies from competing fiercely to provide accurate, real-time results in their chatbots. Collins testified, for example, that if a user asked an AI chatbot for sports scores, the chatbot would likely return the correct answer because its maker had a commercial arrangement with a sports-score provider, so it would not need to rely on a web index.
But testimony also showed that Google has explored how much its AI models could be improved by the data it has gathered through years of operating the world’s most popular search engine. At another point in the cross-examination, Aguilar showed Collins a briefing document prepared for Demis Hassabis, chief executive officer of Google DeepMind.
In a comment, Hassabis had mused about training an unidentified Google AI model on a wealth of search data, including search rankings, to measure how much the data improved the model compared with one not trained on it.
“Did Google end up building a model using search data?” Aguilar asked Collins.
“Not that I’m aware,” he responded.
“But at least Mr. Hassabis has thought it would be interesting to look at?” she pressed.
“Yes,” Collins said.