Apple Publishes Paper on its Advances in GenAI
Apple recently published a paper on the work it is doing to improve how we interact with large language models (LLMs).
In particular, “A Multimodal Approach to Device-Directed Speech Detection with Large Language Models” investigates areas that could improve voice-activated applications, making them more intuitive and natural to use.
In current systems, voice-activated assistants must first be triggered by a standard phrase before the user can give a command. The paper describes Apple’s research into making virtual assistants easier to use through “an acoustic false-trigger-mitigation (FTM) approach for on-device device-directed speech detection”; in other words, removing the need for the initial trigger phrase.
The researchers focused on three main areas. The first relates to analyzing acoustic information to better “distinguish audio that is directed towards the device from background speech.”
The second area of focus was feeding outputs from automatic speech recognition (ASR) systems into the LLM.
Finally, they adapted LLMs to accept combined speech and text prompts. The researchers found that this multimodal input improves the model’s overall accuracy in understanding the input prompt, reporting improvements “over text-only and audio-only models of up to 39% and 61%,” respectively.
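To make the idea of a combined speech and text prompt concrete, here is a minimal Python sketch. It is not Apple’s implementation: the function names (embed_token, audio_to_prefix, build_prompt), the toy embedding arithmetic, and the prefix length and vector size are all hypothetical stand-ins chosen for illustration. It only shows the general shape of the input, in which vectors derived from the audio are placed ahead of the embedded ASR text so a model would see both at once.

```python
from typing import List

Vector = List[float]


def embed_token(token: str, dim: int) -> Vector:
    """Toy, deterministic stand-in for an LLM's token embedding table (hypothetical)."""
    seed = sum(ord(ch) for ch in token)
    return [((seed * (i + 1)) % 97) / 97.0 for i in range(dim)]


def audio_to_prefix(audio_features: Vector, prefix_len: int, dim: int) -> List[Vector]:
    """Toy stand-in for a learned mapping from audio features to prefix vectors (hypothetical)."""
    mean = sum(audio_features) / len(audio_features)
    return [[mean * (p + 1)] * dim for p in range(prefix_len)]


def build_prompt(audio_features: Vector, asr_text: str,
                 prefix_len: int = 2, dim: int = 4) -> List[Vector]:
    """Concatenate audio-derived prefix vectors with embedded ASR text tokens."""
    prefix = audio_to_prefix(audio_features, prefix_len, dim)
    tokens = [embed_token(t, dim) for t in asr_text.lower().split()]
    return prefix + tokens


# Made-up audio features plus a made-up ASR transcript.
prompt = build_prompt([0.2, 0.4, 0.6], "set a timer")
print(len(prompt), len(prompt[0]))
```

In this sketch the prompt holds two audio-derived vectors followed by one vector per transcribed word, all of the same size. A real system would learn both mappings during training instead of computing them with fixed arithmetic.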
Apple regularly publishes machine learning research, and it hosts and participates in industry events to advance public understanding of AI and promote additional research in the space.
While research of this type does not directly advertise how Apple would use the technology to drive new features in its products, it does demonstrate Apple’s commitment to the development of AI.
Apple in general has been less vocal about its investments related to GenAI when compared to other big tech companies, and has rolled out fewer GenAI features than some of its main competitors.
Apple CEO Tim Cook, however, has promised investors that the company will introduce new GenAI capabilities in 2024, and reports have circulated recently that Apple and Google are in discussions to incorporate Google’s Gemini into Apple products.