
How Do You Spell AI? A Somatically Captioned Photo Essay

Harshadha Balasubramanian


Preface

It was a fresh and surprisingly sunny afternoon on a buzzing Bristol harbourside. I stood facing the water, which thrummed with what sounded like a gathering of ducks, oars, and chains. Eager to know more, I opened an image-recognition app on my phone, levelled the phone’s camera in front of my face with both hands, and took a picture.

‘Man with a snake’, announced the computerised voice.

Bemused, I raised my arms higher and took another picture. Now the description said, ‘Man with a fishing pole’.

Judging by the fact that no one around me was gasping or running, I assumed we were not witnessing a snake on show. Indeed, neither description might have been true. Rather than seeking to clarify the validity of the descriptions, I was struck by how adjusting my arms slightly had changed what AI recognised in the image.

I recount this story three years after it happened to introduce the following photo essay: a curated series of apparent misrecognitions committed by AI. However, I propose to reclaim these so-called AI glitches as moments of trickery on my part. I direct attention to the role of my situated photographer’s body in enabling recognition, and, at the same time, I ask how this situated photographer’s body can be harnessed to disrupt recognition. As computer vision and automated image recognition are widely applied—from describing the seen world for blind/visually impaired people like me to surveillance and self-driving cars—it is vital to interrogate how AI knows, what it should and should not know, and what our role is in shaping this knowledge.

My invitation to foreground the photographer’s body critiques both traditional approaches to image description and their automated counterparts. Images are often described in terms of what a viewer looking at them might see. These descriptions may be presented as text or be spoken out loud. Both practices have been varyingly used to provide visually impaired people access to visual content in workplaces, education, and entertainment. Crucially, through concentrating on what is seen, image descriptions have resolutely continued to centre the visual as an object of knowledge and a mode of knowledge production. Such descriptions authoritatively foreclose any attention given to the multisensory processes that produce images.

The essay traces automated recognition of a long white cane. Albeit not used by all visually impaired people, the cane is one of the most popular symbols associated with this community. Each attempted recognition here is explored through the set of bodily movements that it took to produce this outcome. Permitting AI to make correct recognitions requires those taking pictures to cultivate specific somatic practices, such as learning to hold a phone camera steady at a particular position in relation to an object. My production and curation of images was also helped by being indoors and hence being able to better control my surroundings, as well as having a stable internet connection that allowed AI to retrieve knowledge stored in servers. Such contextually shaped movements offer multisensory ways of experiencing what is seen in the photo.

My photographer’s body is also a site for disrupting AI’s attempted recognitions, and hence critically revealing how AI knows. For example, I deliberately encouraged confusion by invoking other contexts where I had experienced objects which resembled the object that I was asking AI to identify, such as leaning a cane up against the wall as if it were a mop. I also chose which misrecognitions should feature in this photo essay based on which ones appeared to contrast most significantly with my physical experience of where I was and what was around me. For once, here, the cane is not demonstrating how disabled folk can fit into normative wayfaring but is helping us to refuse normative approaches to recognition.

You are invited to embody, rather than just look at, the ensuing photos. You may do this by performing the somatic captions that accompany each photo; these are loose instructions for enacting the bodily movements made by the photographer when taking these pictures. The loosened instructions are meant to dissuade you from striving to accurately replicate the movements, openly defying the precision pursued by AI.

Remember, when you embody a blind photographer, this is not about becoming them. If you identify as non-blind, this is not about empathising with blind/visually impaired people. If you do identify as blind/visually impaired, and perhaps even own a cane, it is still going to be challenging to embody my movements. What you all feel and what your app describes could be different; it is vital to attend to how your bodymind in your specific context mediates these instructions and contributes to recognition. Please dwell in the moments of friction, as these are the differences that define us; these are the differences that seeing a photo will often help to skim over.

Despite what I have just said, you are welcome to see the photos, read the automatically generated Alt Text describing what they show, or search for human-made image descriptions. This journal’s editors have generously provided their own image descriptions in a separate note. I have requested that the editors do not include these descriptions as part of my contribution, so that I may raise some questions.

What does our insistence on the seen world suggest? Why do we rush to correct AI’s misrecognitions with humans’ descriptions? What happens when the responsibility for access is not individualised to the author but distributed?

I acknowledge that I, as a self-identifying blind person, have a distinct platform to voice these questions. This is why I am asking these questions, and by moving with me, you may ask them too. So, go on, open or download any image recognition app—the name does not matter—and get ready to move with me.

 


 


In a well-lit, carpeted, white-walled corridor, you stand at one end with a long white cane in one hand, and in the other you hold your smartphone featuring an automated image-recognition app. I tell you this is a blind woman’s cane, but what else could it be? Without code, can you and your environment trick AI into recognising objects that aren’t even there? Here are some spells to get you started: each photograph below is titled with the description generated by AI and captioned with the photographer’s situated movements that produced the image and the possible realities described.

 

‘Mop in a long hallway’:


Rest the cane against a white wall, one of the long edges in this rectangular corridor, at a slight slant. Stand back against the opposite wall with the cane ahead of you. Is this too conventional a photographer’s pose for the intended trickery? Hold your phone in the portrait position at about head height, letting your thumb graze the side of your cheek. Take picture.

 

‘Close-up of a pen’:


Fold back the handle of the six-part collapsible cane and place it upright. Kneel beside the cane, steadying it with one hand close to the ground. Will the phone’s camera see your hand and help AI recognise the cane? Raise your phone above your head, holding it in the portrait position between your head and the cane. Take picture.

 

‘White and black handle on a carpet’:


Fold the cane to form a triangle with a vertical tangent extending out of one corner. Hold this construction above the ground in front of you with one hand, twisting your feet to one side. Raise your phone in the landscape position directly above this construction. Will the phone’s camera see your feet and help AI recognise the cane? Take picture.


‘Metal rod with a screw on it’:


Hold the cane so the handle sticks into your chest and the rest extends straight out. Stand with one short edge of the rectangular corridor behind you. Will stretching your phone-arm over the extended cane mean the phone’s camera will see the room beyond and help AI recognise the cane? Raise your phone above the cane and ground beneath, keeping the phone in the landscape position and perpendicular to the cane. Take picture.


‘Metal pipes on a carpet’:


Fold up the cane and stand this construction upright on the ground. Bend over slightly. Will this remain upright for the time it takes you to photograph it? Hold your phone in the portrait position above the cane. Take picture.

 

‘Long white pipe on a white wall’:


Place the cane horizontally against a white wall. Stand beside the cane, steadying it with one hand gripping the handle. Is it straight and does that matter? Hold your phone in the portrait position in parallel with the cane. Take picture.

 

‘Pole with a black handle’:


Place the cane upright. Kneel beside the cane, steadying it with one hand close to the ground. Will the phone’s camera see your hand this time, and is AI going to recognise another pen? Raise your phone above your head, holding it in the portrait position between your head and the cane. Take picture.

 

‘Group of sticks on the floor’:


Fold up the cane into its six sections. Lie this construction on the ground next to a white wall, one of the long edges of the corridor. Stand back and angle your phone-arm diagonally towards the cane. Will the phone’s camera see the wall instead? Hold your phone in the portrait position. Take picture.

 

‘Black cord on a white surface’:


Lean the cane against the same white wall so that it creates a steep diagonal with the bottom tip extending halfway across the width of the corridor. Stand with one shoulder against the wall where the cane is leaning and face the diagonal. Are you standing too close? Hold your phone in the portrait position perpendicular to the cane above where the diagonal feels at its steepest. Take picture.


‘A cane in a hallway’:


Do you feel like you know how to trick AI? Be honest, did you have to see the photographs or read the descriptions to know, or was it enough to embody me, the photographer, through performing my spells? Now lift the cane a few inches off the ground in front of one shoulder, keeping it upright. Stand with the cane-arm stretched ahead. Will the suspension trick AI? Hold your phone near the middle of the cane in the portrait position in line with its vertical placement. Take picture.



Further Encounters

Kasnitz, D. ‘The Politics of Disability Performativity: An Autoethnography’. Current Anthropology, vol. 61(S21), 2020, pp. 16-25.

MacDougall, D. The Corporeal Image: Film, Ethnography, and the Senses. Princeton UP, 2005.

Playable City. ‘Playable City Sandbox: How (Not) to Get Hit by a Self-driving Car’. n.d., www.playablecity.com/projects/playable-city-sandbox-how-not-to/.

Rizzo Naudi, Joe. Black Cane Diary. n.d., blackcanediary.substack.com/.

Schalk, S. D. Bodyminds Reimagined: (Dis)Ability, Race, and Gender in Black Women’s Speculative Fiction. Duke UP, 2018.

Platform: Journal of Theatre and Performing Arts, Vol.17, No. 2, 2025. ISSN: 1751-0171
