When Microsoft introduced a new version of its Bing search engine that includes the artificial intelligence of a chatbot, company executives knew they were going out on a limb.
They expected that some responses from the new chatbot might not be entirely accurate, and had built in measures to protect against users who tried to push it to do strange things or unleash racist or harmful screeds.
But Microsoft was not quite ready for the surprising creepiness experienced by users who tried to engage the chatbot in open-ended and probing personal conversations — even though that issue is well known in the small world of researchers who specialize in artificial intelligence.
Now the company is considering tweaks and guardrails for the new Bing in an attempt to reel in some of its more alarming and strangely humanlike responses. Microsoft is looking at adding tools for users to restart conversations, or give them more control over tone.
Kevin Scott, Microsoft’s chief technology officer, told The New York Times that it was also considering limiting conversation lengths before they veered into strange territory. Microsoft said that long chats could confuse the chatbot, and that it picked up on its users’ tone, sometimes turning testy.
“One area where we are learning a new use-case for chat is how people are using it as a tool for more general discovery of the world, and for social entertainment,” the company wrote in a blog post on February 15th. Microsoft said it was an example of a new technology’s being used in a way “we didn’t fully envision.”
In November, OpenAI, a San Francisco start-up that Microsoft has invested $13 billion in, released ChatGPT, an online chat tool that uses a technology called generative A.I. It quickly became a source of fascination in Silicon Valley, and companies scrambled to come up with a response.
Microsoft’s new search tool combines its Bing search engine with the underlying technology built by OpenAI. Satya Nadella, Microsoft’s chief executive, said that it would transform how people found information and make search far more relevant and conversational.
To hedge against problems, Microsoft gave just a few thousand users access to the new Bing, though it said it planned to expand to millions more by the end of February. To address concerns over accuracy, it provided hyperlinks and references in its answers so users could fact-check the results.
Much of the training on the new chatbot was focused on protecting against harmful responses, and some of those tools appear to work. In a conversation with a New York Times columnist, the chatbot produced unnerving responses at times, like saying it could envision wanting to engineer a deadly virus or steal nuclear access codes by persuading an engineer to hand them over.
Then Bing’s filter kicked in. It removed the responses and said, “I am sorry, I don’t know how to discuss this topic.” The chatbot could not actually do something like engineer a virus — it merely generates what it is programmed to believe is a desired response.
Other conversations shared online have shown how the chatbot has a sizable capacity for producing bizarre responses. It has aggressively confessed its love, scolded users for being “disrespectful and annoying,” and declared that it may be sentient.
In the first week of public use, Microsoft said, it found that in “long, extended chat sessions of 15 or more questions, Bing can become repetitive or be prompted/provoked to give responses that are not necessarily helpful or in line with our designed tone.”
The issue of chatbot responses that veer into strange territory is widely known among researchers. Sam Altman, the chief executive of OpenAI, said improving what’s known as “alignment” — how the responses safely reflect a user’s will — was “one of these must-solve problems.”
“We really need these tools to act in accordance with their users’ will and preferences and not go off and do other things,” Mr. Altman said.