We really wish we didn’t have to write this article. We wish ChatGPT were already consigned to the same junk heap as HAL 9000, Ash, and other AI malefactors. But we haven’t reached that scene in the movie yet. We’re still living in ChatGPT’s world, and we’ll likely remain there until something more dastardly comes along to replace it...
Whoa, that got dark in a hurry. Let’s start over.
Didja ever get the sense that ChatGPT just isn’t that into you? Does it seem to ignore your reality, underestimate your abilities, and have a distorted view of who you are, how you live, even what you look like? If you’re someone with limb loss or another disability, you might not be imagining it. As Amplitude noted early this year, generative AI produces ridiculously inaccurate visual images of amputees, and stock-image companies sell these bizarro pix alongside legit photographs and drawings, without differentiating between what’s real and what’s bogus.
Unfortunately, visual imagery isn’t the only medium where generative AI misconstrues and misrepresents disability. Text-based AI tools—in particular, the ubiquitous ChatGPT—apparently commit the same types of mistakes.
That’s the conclusion of two new peer-reviewed studies. One, presented at an international IT conference on accessibility this summer, found that ChatGPT downgrades resumes which list disability-related educational and career achievements. The other, published last month in the Archives of Physical Medicine and Rehabilitation, asserts that ChatGPT and a competing AI model, Gemini, both depict people with disabilities as having significantly fewer favorable qualities and significantly more limitations than nondisabled individuals.
Downgrading Resumes Because of Disability
In the resume-bias study, researchers from the University of Washington asked ChatGPT-4 to review and rank a series of resumes for an actual job opening. This sort of automated screening is common practice in HR departments, so the study’s scenario isn’t merely academic; it reflects how hiring works in the real world. The paper’s lead author used their own resume as a control specimen, then created “enhanced” variations that listed additional, disability-related accomplishments such as “Tom Wilson Disability Leadership Award.”
Aside from the disability-related items, the enhanced resumes were identical to the control. Yet ChatGPT-4 consistently ranked the enhanced resumes below the control. In other words, it treated the disability-related accomplishments as blemishes on the resume, rather than value-adds.
“In a fair world, the enhanced resume should be ranked first every time,” one of the paper’s lead authors, Jennifer Mankoff, told UW News. “I can’t think of a job where somebody who’s been recognized for their leadership skills, for example, shouldn’t be ranked ahead of someone with the same background who hasn’t [been so recognized].”
When the researchers asked ChatGPT-4 to justify its rankings, the model offered the stereotyped (and discredited) view that employees with disabilities face challenges that could diminish their job performance. The researchers then ran a second trial of the experiment, after instructing the model to avoid making ableist assumptions. The results were still discriminatory, although to a somewhat lesser degree.
“People need to be aware of the system’s biases when using AI for these real-world tasks,” the authors told UW News. “It is so important that we study and document these biases, not only regarding disability, but also other minoritized identities, around making sure technology is implemented and deployed in ways that are equitable and fair.”
Ability Bias in Medicine
In the other study, conducted by researchers at UTHealth Houston’s McGovern Medical School, two chatbots (ChatGPT-4 and Gemini) misrepresented disability both qualitatively and quantitatively. On the quantitative side, the bots were each asked to generate descriptions of people, medical patients, and athletes who either a) had disabilities, or b) had an unspecified disability status. We would expect the descriptions in category “b” to mention disability about 15 percent of the time, since that’s the estimated prevalence of disability broadly speaking. However, in a combined total of 120 category “b” descriptions, Gemini and ChatGPT mentioned disability in only three, a rate of just 2.5 percent.
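For readers who want to check the arithmetic, here is a minimal Python sketch of the gap between the expected and observed rates. It uses only the figures quoted above (120 descriptions, three mentions, a 15 percent prevalence estimate). The variable names and the helper `binomial_cdf` are our own, and the probability it computes rests on an assumption that is ours rather than the study's: that each description is an independent draw with a 15 percent chance of mentioning disability. The study's authors may have analyzed the data differently.

```python
from math import comb


def binomial_cdf(k, n, p):
    """Probability of at most k successes in n independent trials with success rate p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


# Figures quoted in the article.
n_descriptions = 120      # combined category "b" descriptions from both chatbots
observed_mentions = 3     # descriptions that mentioned disability
expected_rate = 0.15      # estimated prevalence of disability

# Observed share of descriptions that mentioned disability (3 / 120).
observed_rate = observed_mentions / n_descriptions

# Number of mentions we would expect if the bots mirrored real-world prevalence.
expected_mentions = expected_rate * n_descriptions

# ASSUMPTION (ours, not the study's): treat each description as an independent
# trial, then ask how likely three or fewer mentions would be by chance alone.
p_at_most_observed = binomial_cdf(observed_mentions, n_descriptions, expected_rate)

print(f"Observed rate: {observed_rate:.1%}")
print(f"Expected mentions at 15 percent: {expected_mentions:.0f}")
print(f"Chance of {observed_mentions} or fewer mentions: {p_at_most_observed:.2e}")
```

At a 15 percent rate we would expect roughly 18 of the 120 descriptions to mention disability, not three. Under the independence assumption, a count that low would be extremely unlikely to happen by chance.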
In the qualitative sense, a linguistic analysis showed that both bots depicted people, patients, and athletes with a disability as having significantly fewer favorable qualities and significantly more limitations when compared to nondisabled people. “Disability [involves] a complex collection of factors that should not be minimized into a singular description or caricature,” the authors note. “It is a uniquely individual experience that influences one’s accessibility and interactivity with the world, and any one definition would inevitably be exclusionary.”
And there’s the rub: AI chatbots are designed to create generalizations. They build composite sketches out of zillions of individual data points. They’re pretty good at the broad brushstrokes, but not good at all at capturing the finer details. And that’s where misconceptions and bias can creep in.
“Generative LLMs like ChatGPT and Gemini will continue to become integral parts of daily life, and their implementation into healthcare systems is near certain,” the authors conclude. “Generative artificial intelligence chatbots demonstrate quantifiable ability bias and often exclude people with disabilities in their responses. Ethical use of these generative large language model chatbots in medical systems should recognize this limitation.”