Researchers at EPFL have discovered that large language models, which are trained predominantly on English text, appear to use English internally even when prompted in a different language. This finding could have significant implications for linguistic and cultural bias as AI continues to permeate our daily lives.
Large language models (LLMs) such as OpenAI’s ChatGPT and Google’s Gemini have captivated the world with their ability to understand and respond in seemingly natural language.
These LLMs can interact in any language, yet their hundreds of billions of parameters are trained primarily on English text. It has long been speculated that such models do most of their internal processing in English and translate into the target language only at the very last moment. Until now, this hypothesis lacked concrete evidence.
Probing Llama
Researchers from the Data Science Laboratory (DLAB) at EPFL’s School of Computer and Communication Sciences investigated the open-source LLM, Llama-2, to ascertain which languages were employed at various stages of the computational process.
“Large language models are designed to predict the subsequent word. They achieve this by associating each word with a numerical vector, essentially a multi-dimensional data point. For instance, the word ‘the’ will always correspond to the same fixed numerical coordinates,” elaborated Professor Robert West, the head of DLAB.
“These models link together around 80 layers of identical computational blocks, each transforming one word-representing vector into another. At the end of these 80 transformations, the output is a vector representing the next word. The number of computations is determined by the number of computational block layers—the more computations performed, the more potent the model and the higher the likelihood of the next word being accurate.”
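To make this concrete, here is a minimal sketch in Python using the Hugging Face transformers library; the checkpoint name is an assumption, and the 7B variant of Llama-2 has 32 layers rather than the roughly 80 of the largest model West describes. It simply shows the two facts above: a token always maps to the same embedding vector, and the model is a stack of identical decoder blocks.

```python
# A minimal sketch (not the authors' code) of the mechanics described above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)

# The word "the" always maps to the same row of the embedding matrix,
# i.e. the same fixed multi-dimensional data point.
token_id = tokenizer("the", add_special_tokens=False)["input_ids"][0]
fixed_vector = model.get_input_embeddings().weight[token_id]
print(fixed_vector.shape)              # e.g. a 4096-dimensional vector

# The stack of identical computational blocks West describes.
print(model.config.num_hidden_layers)  # 32 for the 7B model, 80 for the 70B model
```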
In their paper, “Do Llamas Work in English? On the Latent Language of Multilingual Transformers,” available on the pre-print server arXiv, West and his team forced the model to output a prediction for the next word after every layer, rather than letting it complete all 80 layers of computation. This approach let them see which word the model would predict at each stage. They set up various tasks, such as asking the model to translate a series of French words into Chinese.
“We provided a French word, followed by its Chinese translation, another French word and its Chinese translation, and so on, so that the model knew it was supposed to translate the French word into Chinese. Ideally, the model should assign a 100% probability to the Chinese word, but when we forced it to make predictions before the final layer, we found that it most often predicted the English translation of the French word, even though English was not involved in this task. It was only in the last four to five layers that Chinese became more likely than English,” West stated.
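The per-layer readout described here is often called a “logit lens.” The rough sketch below decodes a next-token guess from every intermediate layer instead of only the final one; the prompt wording and the checkpoint are illustrative assumptions, not the authors’ exact setup.

```python
# A rough sketch of reading out a prediction after every layer ("logit lens").
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
model.eval()

# Few-shot French -> Chinese translation prompt, in the spirit of the experiment.
prompt = (
    'Français: "fleur" - 中文: "花"\n'
    'Français: "montagne" - 中文: "山"\n'
    'Français: "eau" - 中文: "'
)
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# Apply the final norm and the unembedding matrix to each layer's hidden state
# at the last position, and print the token that layer would predict next.
for layer_idx, hidden in enumerate(out.hidden_states):
    h = model.model.norm(hidden[:, -1, :])   # final RMSNorm
    logits = model.lm_head(h)                # unembedding into vocabulary space
    top_id = logits.argmax(dim=-1).item()
    print(layer_idx, repr(tokenizer.decode([top_id])))
```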
From Words to Ideas
A straightforward hypothesis might suggest that the model translates the entire input into English and only translates into the target language at the end. However, the researchers’ data analysis led to a more intriguing theory.
In the first phase of the computation, neither the Chinese word nor its English translation receives any probability, leading the researchers to believe the model is busy resolving problems with the input. In the second phase, where English prevails, they propose that the model operates in an abstract semantic space: it is not reasoning about individual words but about other types of representations that are more concept-oriented, universal across languages, and more representative of the world. This is crucial because accurately predicting the next word requires extensive world knowledge, which this concept representation can provide.
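One way to visualize these phases, continuing the sketch above (and reusing its model, tokenizer, and out variables), is to track the probability each layer assigns to the correct Chinese token versus its English translation. The word pair and the first-sub-token shortcut below are illustrative assumptions; the paper’s actual measurement is more involved.

```python
# Continuation of the previous sketch: per-layer probabilities of two candidates.
import torch.nn.functional as F

zh_id = tokenizer("水", add_special_tokens=False)["input_ids"][0]      # "water" in Chinese
en_id = tokenizer("water", add_special_tokens=False)["input_ids"][0]   # the English detour

for layer_idx, hidden in enumerate(out.hidden_states):
    h = model.model.norm(hidden[:, -1, :])
    probs = F.softmax(model.lm_head(h).float(), dim=-1)
    print(f"layer {layer_idx:2d}  p(Chinese)={probs[0, zh_id].item():.3f}"
          f"  p(English)={probs[0, en_id].item():.3f}")
```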
“We hypothesize that this world representation in terms of concepts is biased towards English, which is logical considering these models were exposed to approximately 90% English training data. They map input words from a superficial word space into a deeper concept space where there are representations of how these concepts relate to each other in the world—and the concepts are represented similarly to English words, rather than the corresponding words in the actual input language,” West explained.
Monoculture and Bias
An essential question that emerges from this English dominance is: does it matter? The researchers believe it does. Extensive research indicates that the structures present in language shape our perception of reality and that the words we use are deeply intertwined with our worldview. West suggests that we need to begin researching the psychology of language models, treating them as we would human subjects and putting them through behavioral tests and bias assessments in different languages.
“This research has struck a chord as people are becoming increasingly concerned about potential monoculture issues. Given that the models perform better in English, many researchers are now exploring the idea of inputting English content and translating it back to the desired language. While this might work from an engineering perspective, I would argue that we lose a lot of subtlety because what cannot be expressed in English will not be expressed,” West concluded.