Large language models (LLMs) are revolutionizing internet search by changing how users conduct online research. Traditional search engines (e.g., Google) process key terms and return links to related pages, which the searcher explores one by one, clicking through results until they find what they are looking for.
In contrast, LLMs respond directly to natural language questions, delivering conversational responses that summarize information from multiple sources. For example, instead of Googling “best nonstick skillet” and clicking through multiple results, searchers can ask “What is the best nonstick skillet I can buy for under $40?” and receive an answer that summarizes relevant information from popular sites. Depending on how the LLM is trained, the response may summarize information from a limited training data set or from across the internet.
Chirag Shah studies these types of transformative developments in search, machine learning, artificial intelligence and ethics. Shah teaches in the University of Washington Information School’s Master of Science in Information Management program. He is also a founding director of the InfoSeeking Lab and founding co-director of the Center for Responsibility in AI Systems & Experiences (RAISE). His numerous books and over 200 peer-reviewed publications center on advancing effective artificial intelligence within a transparent, ethical and fair framework.
Shah recently sat for a wide-ranging interview covering artificial intelligence, LLM-powered search summaries and the logistical and ethical challenges facing AI as it advances and assumes an ever-growing prominence in the ways businesses and institutions operate. In the conversation below, he shares insights on how large language models are reshaping search experiences, what that means for traditional search practices and where AI still has room to grow.
How does an LLM improve information retrieval and user experience?
An LLM eliminates some steps, making search more efficient. It can also generate very personalized responses. For example, let’s say you wanted to explain the process of photosynthesis. You could use a search engine to research it, and there are many resources online.
But I suppose what intrigues me in this query is not what photosynthesis is but rather how to explain it to someone else — say, a 5-year-old. I can ask the LLM-powered search engine: “How would you explain this to a 5-year-old?” LLM search can shape answers in that way; it can provide a response tailored to a specific audience.
Will LLM search overviews make traditional search obsolete?
No, there is still room for traditional search. You know, before we had the graphical user interface (GUI), we had the command-line interface. The GUI has not completely supplanted the command line. True, most people don’t use it or even know how to use it, but power users do, developers do, and administrators do. There are still use cases for the command-line interface. It may seem outdated, but it provides the power and functionality needed in many applications. Has radio gone away because of television? No.
Traditional search may persist in the same way. Why? First, it’s not always feasible for LLM search to produce the result the searcher wants. Also, people don’t search only to find an answer. Search is a way people learn. They discover new things; search is even arguably a form of entertainment. There are use cases for traditional search.
I don’t see LLM-powered search as a better search. It’s a different kind of search. It has a niche and a different set of applications, so it can operate there. Some of the things that people have used search engines for can transition to LLM-powered results, but there are many cases where traditional search is still going to be cheaper, effective and even faster than an LLM.
So far, we’ve focused on the general public, but what about specialized users? When lawyers and paralegals prepare for a case, they can’t simply rely on a summary. Maybe they’re not searching on Google; they’re using LexisNexis and similar tools. The point is, they still have to conduct an exhaustive search. Power users — not just legal professionals but also health-care professionals, patent researchers and professionals in many other fields — have to make sure they don’t miss anything and that their results are highly accurate.
Then there are the hallucinations, biases and all those other AI-related issues.
Will AI ever eliminate hallucinations?
I think hallucinations, sample bias and other AI-related problems will always be there. We make progress with every new model, and there are always claims of improvement in hallucination rates, robustness and fact-checking, but the way the models are built, there will always be issues.
Even if they can reduce the error rate to 5 percent or less, you still don’t know which 5 percent is erroneous. That’s fine if you’re hunting for fun facts on the internet, but can you really risk being wrong 5 percent of the time in critical situations? When it impacts your job, your family, your community?
That’s why the focus has shifted from having the LLM solve the problem on its own to alternative strategies. Retrieval-augmented generation (RAG), for instance, is meant to provide a technical solution. It goes beyond the training data: the system retrieves information from online sources — the way a traditional search engine does — and the LLM then generates an answer grounded in those retrieved documents. It mimics what we humans would do. Google’s AI Overview is an example. It’s not perfect, and it’s certainly not the only technical solution.
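To make that retrieve-then-generate flow concrete, here is a minimal sketch in Python. The search_index and llm_generate functions are hypothetical stand-ins for a real document retriever and a real model call; the point is the shape of the pipeline, not any particular product’s API.

```python
# Minimal retrieval-augmented generation (RAG) sketch.
# search_index and llm_generate are hypothetical placeholders for a document
# retriever and an LLM completion call; a real system would plug in its own
# search API and model client.

def search_index(query: str, k: int = 3) -> list[str]:
    """Return up to k documents relevant to the query (stubbed with a tiny corpus)."""
    corpus = {
        "photosynthesis": "Photosynthesis converts light, water and CO2 into sugar and oxygen.",
        "skillet": "Reviewers often rank ceramic-coated skillets highly for even heating.",
    }
    return [text for key, text in corpus.items() if key in query.lower()][:k]

def llm_generate(prompt: str) -> str:
    """Stand-in for a call to a large language model."""
    return f"(model answer grounded in the prompt below)\n{prompt}"

def rag_answer(question: str) -> str:
    # 1. Retrieve supporting documents, the way a traditional search engine would.
    documents = search_index(question)
    # 2. Put the retrieved text into the prompt so the model summarizes sources
    #    instead of relying only on what it memorized during training.
    context = "\n".join(f"- {doc}" for doc in documents)
    prompt = (
        "Answer the question using only the sources below.\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )
    return llm_generate(prompt)

print(rag_answer("How does photosynthesis work?"))
```

Because the prompt carries the retrieved text, the model can summarize those sources instead of relying only on what it memorized during training, which is exactly the gap RAG is meant to close.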
RAG solves one kind of problem, but it introduces a different kind of problem. It helps with hallucination, sure. But there’s a popular example from Google’s AI Overview where someone asked, “How do I stop the toppings from sliding off my pizza?” The response suggested using glue! That answer is not hallucinated; it’s something someone posted as a joke response on a Reddit thread. Humans can tell it’s a joke, but the LLM-powered search doesn’t have common sense or a sense of humor.
When you fix one problem, you often create another.
Is bias more common in LLM-powered search results than in traditional search?
There are multiple reasons why bias-related issues are more pronounced in LLM search. One of them is that an LLM is heavily dependent on its training data. In most of the cases where we’ve seen issues of bias, whether gender-based or racial or otherwise, the number one cause has been the training data.
Some of it is correctable. Eight or 10 years ago, when people were training image-recognition models, bias resulted because the training data had many more images of white men than of people of color or women. So you can balance that. Of course, once you’re talking about huge amounts of data, that sort of filtering presents a challenge. In the past, we’ve just hoped the sheer volume would surmount bias, but that’s wishful thinking. That’s why the focus is shifting to post-training and putting up guardrails. Model builders use reinforcement-learning-based fine-tuning to teach the LLM how not to be racist or sexist. It’s an imperfect process with its own set of issues.
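As a rough illustration of what “balancing” a training set can mean in practice, here is a small Python sketch that oversamples under-represented groups until each group contributes equally. The group labels and counts are invented for the example; real pipelines rely on far more sophisticated sampling and filtering.

```python
# Hypothetical sketch of rebalancing a labeled image dataset so that each
# demographic group contributes the same number of training examples.
# The group names and counts below are made up for illustration.
import random
from collections import defaultdict

def rebalance(examples: list[dict]) -> list[dict]:
    """Oversample under-represented groups to match the largest group."""
    by_group = defaultdict(list)
    for ex in examples:
        by_group[ex["group"]].append(ex)
    target = max(len(items) for items in by_group.values())
    balanced = []
    for items in by_group.values():
        balanced.extend(items)
        # Duplicate randomly chosen examples until the group reaches the target size.
        balanced.extend(random.choices(items, k=target - len(items)))
    return balanced

data = (
    [{"group": "group_a", "image": f"a_{i}.jpg"} for i in range(90)]
    + [{"group": "group_b", "image": f"b_{i}.jpg"} for i in range(10)]
)
balanced = rebalance(data)
print({g: sum(ex["group"] == g for ex in balanced) for g in ("group_a", "group_b")})
# -> {'group_a': 90, 'group_b': 90}
```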
LLMs exhibit another form of bias. They are trained to be polite to us. If you ask them for an opinion, they won’t tell you you’re dead wrong; they want to please you. The result is a sort of echo chamber where your own biases are amplified through the support of the LLM. And because you’re interacting in natural language, it gives a false impression that somebody is agreeing with you. Not just somebody, but the whole internet! The whole world is agreeing with you! This can perpetuate biases. The problem doesn’t technically stem from the LLM itself, but because of the way it works and the way it’s trained, it perpetuates those biases.
There are also gender- and language-related biases. For example, men and women use different hedging behaviors, which are the ways they express certainty. A man is more likely to say “that looks right,” while a woman is more likely to say “I think that looks right.” It means the same thing, but the LLM doesn’t see that; it interprets the woman’s response as expressing a lower degree of certainty. That can have negative consequences, such as when LLMs are used by HR departments in hiring; the LLM will favor the men, whom it perceives as more confident.
We’ve also looked across dozens of languages and cultures. If you ask the same question in a different language or in a different cultural context, you get a different answer. LLMs pick up those biases and amplify them. And the big challenge here is that most end users are not necessarily aware of all the bias issues. These things are big black boxes that grow more complex with each iteration, and it is very hard to get an LLM to explain its reasoning, its rationale for its answers.
And LLMs are always very confident, even when they’re very wrong. That creates a vicious cycle.
What is explainability?
Explainability has been a concern for as long as machine learning has existed, because a lot of machine learning systems are mathematical models designed to make decisions that impact humans. For obvious reasons, we want to know: How did you make this decision?
The classic example is when you apply for a loan and your loan is rejected. If a human made that decision, you could ask them and maybe get an answer. If a model made the decision, how would it explain itself? There’s been a lot of work done to figure out how to have these models generate explanations that humans will accept.
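For a sense of what an acceptable explanation can look like with a classical model, here is a toy Python sketch of a linear loan-scoring model whose per-feature contributions can be read off directly. The feature names, weights and threshold are invented for the illustration.

```python
# Toy illustration of an explainable loan decision: a linear scoring model
# whose per-feature contributions can be inspected directly. The feature
# names, weights and threshold are invented for this example.

WEIGHTS = {"income_to_debt_ratio": 2.0, "years_of_credit": 0.3, "missed_payments": -1.5}
BIAS = -3.0
THRESHOLD = 0.0  # approve if the total score is at least zero

def score_and_explain(applicant: dict) -> tuple[bool, dict]:
    # Each contribution shows how much one feature pushed the decision up or down.
    contributions = {name: WEIGHTS[name] * applicant[name] for name in WEIGHTS}
    score = BIAS + sum(contributions.values())
    return score >= THRESHOLD, contributions

approved, why = score_and_explain(
    {"income_to_debt_ratio": 1.2, "years_of_credit": 4, "missed_payments": 2}
)
print("approved:", approved)
for feature, value in sorted(why.items(), key=lambda kv: kv[1]):
    print(f"{feature}: {value:+.2f}")
```

An itemized rationale like this is easy to produce from a small linear model; that is exactly what becomes so hard with an LLM.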
When it comes to LLMs, it becomes increasingly challenging — almost impossible — to come up with that kind of an answer, because an LLM is essentially a huge model with billions and sometimes trillions of parameters. There is no written rule in the LLM that says, when this happens, do this. There is no simple “this” that the LLM can tell the human. So maybe the LLM comes up with an explanation, but is it the explanation? If you don’t trust the decision, why would you trust the explanation? They come from the exact same source. It’s like if you ask a liar, “Are you lying?” they will answer no. I’m not saying LLMs are lying, but they pose the same fundamental problem.
This happened to a lawyer named Steven A. Schwartz. He used ChatGPT for help with legal citations and actually asked ChatGPT, “Is this citation correct?” Sometimes ChatGPT correctly identified hallucinated citations (and apologized!), and sometimes it insisted fabricated citations were real. It’s a complex issue that raises the question of whether you might be better served by traditional search.
What ethical responsibilities accompany AI use?
I co-run the Center for Responsibility in AI Systems and Experiences (RAISE) here at the UW. When it comes to AI, everyone has responsibility: the entities building AI products, selling AI products, and regulating AI products. It’s not something to just put out into the world and see what happens, even though that has been happening, especially in the United States. We lack a good framework for demanding responsibility and testing whether it has been met.
Here’s where we should be focusing: When you’re building models, what data sources are you using? Did you source them ethically? Do you have the rights to the content? There are a lot of lawsuits now because model builders used content without permission.
Next comes the deployment of the models. There’s an ethical responsibility to use resources wisely: these models can be very energy-intensive, and our energy supplies and environment are finite, so we have to be careful. Are we exploiting cheap labor in communities and countries with poor living conditions and bad human rights records? All these factors matter.
We need to take responsibility for when these models misbehave. Who is responsible when a user seeks psychological help from an LLM and it says it’s OK for them to take their life? We are currently not even meeting the lowest bar, and it’s very frustrating.
Government regulations can help. There have been some good efforts at the state level. The European Union has some good frameworks, and we’ve looked at those for guidance. Australia has some good frameworks.
Finally, the user needs to take some responsibility. We need to understand this technology if we’re going to use it, and know its shortcomings and biases. People need to educate themselves. Maybe 10 years ago, we were talking about data literacy. You don’t need to be a data scientist, but you have to understand some basics of data. We’re at the same place today with AI. Regardless of who you are, you need some minimal education about AI.
Gain the skills to navigate a changing tech landscape with an MSIM
Chirag Shah is one of many scholars exploring and forging the future of information management at the University of Washington. When you earn your Master of Science in Information Management from the UW, you learn from a renowned faculty of thought leaders and industry experts. You also benefit from a flexible and customizable curriculum as well as hands-on projects that promote both theoretical knowledge and practical experience. The program instills the leadership, technical and strategic skills needed to drive impact in a rapidly evolving tech landscape.
Ready to future-proof your career? Contact an enrollment advisor today to learn how UW’s MSIM online or residential degree tracks can help you lead through the power of information.
