Image from apple.com

The infrastructures behind Siri

How the experience with Siri would differ if you had less privilege or different physical infrastructure.

Jiyoung Ohn
5 min read · Nov 10, 2020

--

“Hey Siri, can you please stop the alarm?”

My morning starts with this sentence. The voice assistant has become a vital part of my daily life by helping me easily complete mundane tasks such as setting a timer or checking the weather. Furthermore, Siri is expanding its presence from the cellphone to the entire home, creating an ecosystem across devices (HomePod, iPad, MacBook, iPhone). Siri has become a mundane technology for most people; however, we should acknowledge that not everyone has the same abilities, environment, and infrastructure to use it. In this article, I would like to examine what infrastructures are embodied in Siri and how the absence of those infrastructures can adversely impact the user experience.

Apple Siri devices [Image source — Voicebot.ai]

I would like to start by examining the physical infrastructure behind Siri. To understand its inherent culture and values, it is necessary to look at how Siri was created in the first place. Its speech recognition technology was originally developed by Nuance Communications, an American multinational company headquartered in the United States. Siri itself was initially created by SRI International, an American non-profit research institute, with funding from the US Defense Advanced Research Projects Agency (DARPA). Since the research and early development mostly took place in the United States, the technology naturally embodied Western values and culture in its underlying infrastructure.

The initial version of Siri used voice-recognition technology developed by Nuance Communications [Image source- The Motley Fool]

One of the important digital infrastructures shaped by this physical origin is the language and accent system. Siri was originally trained on American English and works well with a standard English accent. Because the initial training data consisted mostly of breezy West Coast accents, Siri often fails to understand less common accents, regional dialects, and lilts. For example, as a person with a Korean-English accent, I frequently fail to communicate with Siri. It is all the more frustrating because I have already spent three years ‘training’ Siri with my voice and accent. This accumulated experience of failure made me less confident about my accent and pushed me to mimic standard English for better results. I believe this struggle is not mine alone: there are countless accents among English speakers around the world. An infrastructure that privileges certain accents can make other users less confident in the feature, eventually leaving them behind the technology.

Siri’s accent options

The infrastructure that embodies Western culture also shaped Siri’s identity and personality. Siri’s default voice is female, and she was given a submissive personality that deferentially answers questions and commands. This reflects a common stereotype in Western culture, which does not necessarily fit all cultures and values across the world. For example, Laos is traditionally a women-centered society where it is common for women to work while men stay home and support the family. If Siri had initially been developed in Laos, the default voice might have been male, with a more active and independent tone. Siri’s gender and personality can push other cultures to adopt the same gender stereotype and reinforce biases that already exist in society.

Furthermore, Siri’s cultural background also shaped the physical infrastructure it assumes. Siri keeps expanding its boundaries to create a connected ‘home experience’. The concept of ‘home’ here is based on a typical Western house: a small family living together in a quiet environment, which is the perfect condition for communicating with Siri. However, not all people have the same privileged home environment. Consider a Siri that lives in a HomePod. In the layout of a Western house, it is natural to call out from the kitchen to the Siri in the living room and ask it to set a timer. But in a house where multiple households share the space, it may not be easy to command Siri so publicly. Likewise, in a house where the electricity supply is limited, it may be impossible to keep the HomePod powered on at all times, making it harder to activate Siri whenever needed. In these cases, Siri can create an exclusionary experience for users with different physical infrastructures.

Apple’s HomeKit and Siri’s smart home imagine a typical Western household.

Lastly, the social infrastructure behind Siri is tuned to serve a primary user group that has no trouble seeing the interface or speaking fluently. To be more specific, Siri uses many visual cues to show its states, such as listening, talking, and error. For someone with low vision, the interaction becomes harder because they cannot rely on the visual feedback Siri provides. Siri also works best with fluent speakers. This is clear from the limited amount of time Siri waits for the user to respond: because it does not wait long for users to finish speaking, people who are deaf or who have speech or language impairments can make only limited use of the technology.

To sum up, I believe the embodied infrastructure of Siri can have unintended consequences when it interacts with diverse cultures and user groups. The accumulated negative experiences of certain user groups can intensify tech-based polarization in the world. Therefore, I believe it is crucial to consider various cultural contexts and environments in the development process, and to build in more flexibility and openness so the technology can fit multiple infrastructures.
