
Behind the Scenes of Ray-Ban Meta: An Exclusive Interview with Kenan Jia, Head of Meta Smart Glasses Product
Kenan Jia | Head of Product, Meta AI/AR Glasses Team
Luo Yihang | Founder and CEO, the Silos & GenAI Assembling
On November 26, at "AI Robotics and the Future of Wearables," an offline event held by GenAI Assembling together with the Silos in Menlo Park, Silicon Valley, Luo Yihang, founder and CEO of the Silos, held a conversation with Kenan Jia, head of product of Meta's AI/AR glasses team.
Kenan is the product lead for the popular Ray-Ban Meta smart glasses and is now leading the development of Meta's next generation of smart glasses. In this conversation, titled "Creating a New AI Hardware: The Art of Trade-off," we talk about the story behind the birth of Ray-Ban Meta, the trade-offs involved in building an AI hardware product, and thoughts about the future.
The following is a transcript of the conversation:
Ray-Ban Meta: From smart glasses to everyday companion
Luo Yihang: I want to ask a question from a user's perspective: how long do you wear the Ray-Ban Meta smart glasses, and what is the most common use case for you?
Kenan: Hi, I'm Kenan, product manager on Meta's smart glasses team, currently responsible for developing the next generation of smart glasses. I'm very happy to share more about Ray-Ban Meta today.
For me, the most important use case is taking photos and videos from a first-person perspective, especially when traveling or at events. It frees my hands and lets me capture great moments at any time without taking out my phone. For example, when you go to Disneyland in Los Angeles, you can't take your phone on the roller coaster because it's dangerous. But with Ray-Ban Meta sunglasses, I was able to record the whole ride, which was great. I also went to two concerts this year; I no longer need to hold up my phone and watch the performance through a small screen, and can enjoy the show while recording the highlights. Similarly, when traveling with friends, such as mountain biking in Hawaii, I use the glasses to listen to music and shoot video, which lets me stay immersed in the moment while still recording it. This is one of the main goals of the Ray-Ban Meta product.
In addition, I am using its AI capabilities more and more. When traveling, for example, I don't have to read long placards in a museum; I can just ask Meta AI the specific questions that interest me. The glasses can also translate in real time. On a business trip in Europe, they helped me quickly translate menus and road signs, which was very convenient. So my reliance on the AI capabilities keeps increasing.
Luo Yihang: I can give an example of my own use of the Ray-Ban Meta AI feature. I was standing in front of a MiG-3 at the aviation museum in Seattle in the middle of last month when I said, "Hey Meta, take a picture." Then I asked, "Tell me the backstory of this plane." A few seconds later, a voice from the frame of the glasses told me it was a MiG-3, built in 1940 and used by the Soviet Union in the war against Germany, and added some of its development history. I think this is a good example of the multimodal Llama model in action.
Luo Yihang: I'm very interested in the integration of AI with wearables. Can you explain in detail how this technology works on the glasses? We know that some AI functions rely on the cloud, while others use edge AI. Can you be more specific about how they work?
Kenan: Sure. The architecture of the system is actually more interesting than many people think. For example, when you said "Hey Meta" and wanted to learn about the plane, the process involved three parts:
• The glasses themselves
• The Meta View companion app on the phone
• Cloud servers
When you say "Hey Meta," the wakeword model runs locally on the device, activating AI capabilities and understanding your query through voice recognition. If it's just a simple command like "take a picture," the processing can be done directly on the glasses.
For more complex multimodal tasks, the processing is done in the cloud. The glasses connect to the companion app on the phone over Bluetooth, and the phone sends the query to the cloud servers over Wi-Fi or a cellular network.
The Llama 70B model is invoked to perform tasks including knowledge base retrieval, answer generation, and privacy and security filtering. Once the processing is complete, the response is returned to the glasses via the phone and played back as audio.
There is a lot of discussion in the field about whether processing should take place on the device or in the cloud. For a device in the form of glasses, there are many trade-offs. Because the device is small and light, and is limited by power and heat, putting complex tasks on the server side is the best solution. The advantage of this architecture is that the task is passed to the phone via Bluetooth and the cloud completes the main computation and response, which greatly reduces the device's power consumption while ensuring high-quality answers.
The challenge is optimizing latency, because no one wants to wait too long. With server-side processing, users get the best quality of answers from large models. The hybrid architecture of the device and server ensures that simple instructions are completed quickly and reliably.
For example, real-time speech translation runs locally on the device. When you speak to me in French or Spanish, the translator does real-time voice translation directly on the glasses. For more complex multimodal queries, the best performance comes from cloud processing. This architecture requires the team to put a lot of work into hardware design, system optimization, power and thermal management, and AI capability development to ensure a high-quality, low-latency, and reliable response experience whenever users ask a question.
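To make this hybrid routing concrete, here is a minimal Python sketch of the on-device versus cloud split described above. The command list, function names, and payload shape are illustrative assumptions made for this sketch, not Meta's actual implementation.

# Minimal sketch of hybrid on-device / cloud routing (illustrative only).
from dataclasses import dataclass

# Assumed examples of simple commands handled directly on the glasses.
ON_DEVICE_COMMANDS = {"take a picture", "take a video", "set a timer"}

@dataclass
class Query:
    text: str
    needs_image: bool = False  # multimodal queries attach a photo from the glasses

def handle_on_glasses(query: Query) -> str:
    # Simple, latency-critical commands run locally to save power and round trips.
    return f"[glasses] executed: {query.text}"

def handle_in_cloud(query: Query) -> str:
    # Glasses -> phone over Bluetooth, phone -> cloud over Wi-Fi/cellular.
    # The cloud runs the large model (the talk mentions Llama 70B) plus
    # retrieval and safety filtering, then the audio answer is relayed back.
    payload = {"text": query.text, "image": "jpeg-bytes" if query.needs_image else None}
    return f"[cloud] answered: {payload['text']}"

def route(query: Query) -> str:
    # Wake-word detection has already happened locally before this point.
    if not query.needs_image and query.text in ON_DEVICE_COMMANDS:
        return handle_on_glasses(query)
    return handle_in_cloud(query)

print(route(Query("take a picture")))
print(route(Query("tell me the backstory of this plane", needs_image=True)))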
Luo Yihang: OK, let's continue to delve into the combination of AI and wearable devices. The use cases you mentioned, travel photos and concert videos, as well as translation, don't seem to require very complex AI support; they need certain AI features but don't rely on the reasoning power of large language models. I think Ray-Ban Meta was released last October, right?
Kenan: Yes, the second generation Ray-Ban Meta was released last October. The first generation of Ray-Ban Stories was launched in 2021.
Luo Yihang: And the AI features were only added in April this year. From your personal experience or from user surveys, how have the usage scenarios or applications changed over the past year or so?
Kenan: That's a good question. After the launch of the AI features, we did see strong growth and positive adoption by users. It's worth mentioning that when the first generation, Ray-Ban Stories, was released in 2021, our positioning was primarily camera-plus-audio glasses, with features focused on first-person shooting, listening to music, and making and receiving calls. These features are very popular. What makes Ray-Ban Meta unique is that it is not just a pair of smart glasses with a large language model and AI functions; its design is also very stylish, many users like its appearance, and it has very practical shooting and music playback functions.
The audio feature is one of our most used and best-retained features. Many people find that once they wear these glasses, they no longer need to frequently put on and take off AirPods; the glasses can replace Bluetooth headphones, which greatly improves convenience.
After the introduction of AI features in April this year, these capabilities keep improving as new models are optimized. As a complete standalone hardware product, Ray-Ban Meta is different from other AI software products on the market. If people need to carry it every day in addition to their phone and keys, that adds a psychological burden. So users choose this device either because they really like it or because it brings them real value.
AI capabilities are showing significant value. Some features are more traditional, such as weather checking, setting timers or hands-free photography and recording. Others show more interesting new trends, such as real-time translation or multimodal functionalities - like identifying plants or quickly querying information. These are some of the early emerging application scenarios.
With continuous model improvements and optimization for glasses, Ray-Ban Meta shows great potential. Because it is always worn on the face, it can see what the user sees, hear what the user hears, and deliver audio directly to the ear, without the user needing to take out a phone. For example, in a museum or at a tourist attraction, you can simply ask the AI a question directly and get an immediate answer, which is very practical.
At the Meta Connect conference in September, we announced that we will launch some real-time question-answering AI features based on the Llama model in the future, and I think this will be one of the most promising categories.
What's more, these AI features can be combined with the shooting and audio features that users already enjoy. Compared to some pure AI devices, Ray-Ban Meta is a successful multi-function integrated product. Devices that focus only on specific AI capabilities may feel novel to try, but they end up easily forgotten in a drawer. Ray-Ban Meta succeeds because it combines stylish design and versatility so that users genuinely want to use it frequently.
The tech trade-offs behind smart glasses
Luo Yihang: I was curious to hear that integrating the Llama model into smart glasses was challenging during development. Can you share a specific difficulty the team encountered? Bringing a large language model onto a device and implementing multimodal capabilities is a really complex process. How did you work with the research and development teams to overcome these challenges?
Kenan: We have a very strong AI team, both in model research and in smart glasses integration. Looking at the architecture again, spanning the glasses, the phone, and the server side, and how to deliver voice responses directly, one challenge is making sure the response is optimized for glasses and audio feedback. If you type a question into ChatGPT, you usually get a long response. But a lengthy LLM reply does not fit the glasses form factor: if the speech synthesis (TTS) reply is very long, you would need to listen for two or three minutes, which is not ideal for users. So we put a lot of work into making sure we summarize responses and deliver the most relevant information immediately.
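As a rough illustration of that "summarize for audio" idea, here is a small Python sketch that asks the model for a short, speakable answer and falls back to a hard word cap before text-to-speech. The prompt wording, word limit, and generate() stub are assumptions made for this sketch, not Meta's production pipeline.

# Sketch: keep spoken answers short enough for TTS playback (illustrative only).
SPOKEN_STYLE_INSTRUCTION = (
    "Answer in at most two short sentences suitable for being read aloud. "
    "Lead with the single most relevant fact."
)
MAX_SPOKEN_WORDS = 60  # rough cap so playback stays well under half a minute

def generate(prompt: str) -> str:
    # Stand-in for the real LLM call (e.g. a hosted Llama endpoint).
    return ("The MiG-3 is a Soviet fighter that first flew in 1940 and served "
            "as a high-altitude interceptor early in World War II.")

def answer_for_audio(user_query: str) -> str:
    raw = generate(f"{SPOKEN_STYLE_INSTRUCTION}\n\nUser: {user_query}")
    words = raw.split()
    # Hard fallback in case the prompt alone does not keep the reply short.
    return raw if len(words) <= MAX_SPOKEN_WORDS else " ".join(words[:MAX_SPOKEN_WORDS]) + "..."

print(answer_for_audio("Tell me about this plane."))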
Another key challenge is latency. The system involves many links: processing on the glasses, data transmission, and computation on the cloud servers. Our team broke each step down one by one to improve overall latency and reliability. If the Bluetooth connection suddenly drops, the user may not know what happened and will just assume Meta AI isn't working.
Even after launch, we kept thinking about how to take the AI experience from "good" to "great." For example, you mentioned that you used to have to say "Hey Meta, look and tell me more about this plant," which isn't natural enough; you might want to just say "Hey Meta, what is this plant?" So we're investing a lot in natural interaction, making sure people can talk to the AI directly. Now you don't need to say "look and"; you can ask the question directly, and the system will either ask if you want it to take a picture or, if the intent is clear enough, just do it.
In addition, we have developed multi-turn continuous conversation. In a museum, users may not want to repeat "Hey Meta" before every question. Now, when you wear the glasses, the LED indicator shows that the system keeps listening for a few seconds, so you can ask follow-up questions.
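A minimal sketch of how such a follow-up window could work is below: after each answer the assistant keeps accepting speech for a few seconds without requiring the wake word. The timing, class, and method names are assumptions for illustration, not the actual on-device implementation.

# Sketch: follow-up questions without repeating the wake word (illustrative only).
import time

FOLLOW_UP_WINDOW_SEC = 3.0  # assumed duration; an LED would show the glasses are still listening

class Assistant:
    def __init__(self) -> None:
        self.listen_until = 0.0  # while now < listen_until, no wake word is needed

    def answer(self, query: str) -> str:
        response = f"(answer to: {query})"
        # Re-open the follow-up window after every answer.
        self.listen_until = time.time() + FOLLOW_UP_WINDOW_SEC
        return response

    def handle_speech(self, utterance: str):
        if time.time() < self.listen_until:
            return self.answer(utterance)  # treated as a follow-up question
        if utterance.lower().startswith("hey meta"):
            return self.answer(utterance[len("hey meta"):].strip(" ,"))
        return None  # ignore speech outside the window without the wake word

a = Assistant()
print(a.handle_speech("Hey Meta, what is this plane?"))
print(a.handle_speech("When was it built?"))  # inside the window, so it is answered directly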
Overall, our focus is on making sure AI runs reliably, quickly, and with excellent quality on the glasses. We keep investing in making use cases more valuable, making interactions more natural, and supporting more languages; internationalization is itself challenging.
Luo Yihang: Now that you mention prioritization, latency, reliability, and quality of response, we can explore further how important these trade-offs and balances are. I think they are critical to the future of AI devices, especially the next generation of AI glasses.
My view is that glasses may be one of the most promising AI devices of the future. The reason is that the human head contains the most important sensory organs - the eyes, ears and mouth. In essence, the head itself is like a carbon-based multimodal system, and the glasses can be a bridge for its natural interaction with silicon-based multimodal AI. Glasses have a history of more than 700 years, and users have a high acceptance of wearing glasses and will not feel uncomfortable. Moreover, it can well integrate human perception and AI intelligence to achieve collaborative work.
As a result, these trade-offs and balances become particularly critical. Because when we humans are watching, listening, and expressing, the scene can get complicated. So how are these features prioritized when developing a product like AI glasses? Which are the most important? What's next? What's the third? If you had to set a baseline, what would it be?
Kenan: I wish there was a simple answer to explain these trade-offs, but the reality is very complicated. If you ask users what they want from Ray-Ban Meta or AR glasses, they'll usually say they want a lighter, smaller device, but at the same time better performance, longer battery life, and higher image quality. However, these requirements are often contradictory, and it is not possible to meet all needs at the same time.
In developing such consumer-grade electronic devices, especially those worn on the face, we need to be very careful about where we draw the line. For example, this trade-off is evident in the camera design of the second-generation Ray-Ban Meta.
The image quality and field of view (FOV) of the second generation are significantly improved compared to the first, going from 5 megapixels to 12 megapixels, with many post-processing and image optimization features added. But unlike the first generation's dual-camera design, the second generation has only a single 12-megapixel camera, located on the left side.
Initially we had two cameras in order to capture depth information and perhaps take some creative photos, but actual usage was likely to be low. In contrast, users care more about photo quality and FOV. So we decided to focus on a single-camera design. Dual cameras can indeed offer different lenses, zoom, and a wider field of view, but on such a small device the battery sits on the right side, and adding another camera there would take up mechanical space, squeeze battery capacity, and also affect power consumption.
Power consumption and memory are also key issues for such a small device. By removing the camera on the right side while boosting the remaining camera's resolution and FOV, we freed up about 10% of the mechanical space, which is critical for improving battery life, reducing power consumption, and optimizing heat dissipation. At the same time, the design still meets users' needs for sharing on social platforms such as Instagram or Facebook Stories, especially for casual shooting.
This case illustrates the need for a comprehensive trade-off between performance, size, weight, ergonomics, power consumption and heat dissipation to achieve the best user experience. These factors are the most critical considerations in our design.
Luo Yihang: Isn't AI the most critical?
Kenan: AI is of course very important, but we were talking more about hardware design decisions for cameras. Going back to the device-side versus server-side example, for AI, we also have to find a balance between latency, response speed, and power consumption. Different use cases will have different priorities, and the specific trade-offs will also change.
Luo Yihang: OK, can you share some specific cases or stories?
Kenan: I can use the example of camera architecture to illustrate how teams deal with these complex tradeoffs. As a product team, our role is not to make direct decisions, but to clearly define the problem. For example, we want to improve camera and image quality, but there are limitations in terms of mechanical structure, battery space, power consumption, latency, heat dissipation, and cost. Our task is to list all possible solutions, analyze their performance from different dimensions, and then work with the teams to evaluate them.
You'll end up with a heat-map-like table showing the pros and cons of each option. No solution is usually optimal in all aspects. For example, while the single-camera design is not the most cutting-edge in terms of improving image quality, it performs better in other key aspects, such as power consumption and battery space utilization.
In this case, we need the team to decide together what the priorities are. Here, we believed that improving image quality was important, but power consumption and battery life had to be tightly controlled. Some teams may prefer other options, but they need to understand the logic and accept the final decision. There is usually a lot of debate, but in the end we follow the principle of "disagree and commit," because the development and manufacturing schedules do not allow us to postpone the decision indefinitely. Once the architecture is settled, the software team needs to make further optimizations on top of that hardware. So the process is highly collaborative and requires a lot of discussion and trade-offs.
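To show in miniature what such a "heat-map" comparison might look like, here is a small Python sketch that scores two camera options across several dimensions with team-agreed weights. All options, dimensions, scores, and weights are made-up placeholders for illustration, not Meta's real evaluation data.

# Sketch: weighted trade-off table for architecture options (hypothetical numbers).
OPTIONS = {
    "dual camera":   {"image quality": 4, "power": 2, "battery space": 2, "cost": 2},
    "single camera": {"image quality": 3, "power": 4, "battery space": 4, "cost": 4},
}

# Agreed priorities; power and battery space weigh heavily for a glasses form factor.
WEIGHTS = {"image quality": 0.3, "power": 0.3, "battery space": 0.3, "cost": 0.1}

def weighted_score(scores):
    return sum(WEIGHTS[dim] * value for dim, value in scores.items())

for name, scores in OPTIONS.items():
    print(f"{name}: {weighted_score(scores):.2f}")
# No option wins on every dimension; making the weights explicit is what lets the
# team debate, then "disagree and commit" once the architecture is settled.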
Luo Yihang: What challenges have you faced when working with supply chains or manufacturers?
Kenan: This brand new device can't just be put into production by defining use cases and specifications. Challenges in manufacturing include how to ensure reliability and quality in actual production, as well as achieving performance requirements at target yields. We need to work closely with contract manufacturers around the world to deeply understand the manufacturing process and find room for improvement.
From AI glasses to AR glasses, it is often necessary to develop completely new modules and bring them to large-scale production. This is a very exciting stage, but also full of challenges. For smart glasses in particular, not only are manufacturing and back-end operations complex, but go-to-market is also a huge challenge.
Traditional glasses are usually sold through optometrists or eyewear stores, while consumer electronics rely on Best Buy, Amazon and other channels. How to educate users about the optician process? How to display products in retail stores? This requires close collaboration with our channel partners. We have partnered with EssilorLuxottica, which owns the Ray-Ban brand, and they have extensive experience in the eyewear industry and a large sales channel. It was a huge learning process for us because it was a combination of fashion and technology - traditional glasses plus AI and smart features. Through this collaboration, we are not only learning how to make consumers understand and accept this new form of device, but also exploring how to better promote such a fusion of technology and fashion products in the market.
Game-breaking thinking of AI hardware: Reference, competition, and future scenarios
Luo Yihang: Let's talk about previous experience. Before the first generation of Ray-Ban Meta, Meta invested a lot in VR devices, such as VR headsets. What lessons from developing VR devices can be applied to developing AR glasses and AI glasses?
Kenan: I think there are some common experiences, but the device constraints are very different, which leads to different decisions. What they all have in common is that they are complex integrated systems with different modules, such as display, audio, and mechanical structures. But VR is a larger headset that is usually used at home, while glasses are lighter and can be worn outside.
We share the same goal of miniaturization. On the Quest side, we try to make it smaller; the same is true for AI glasses and future AR glasses, which we want to make smaller, cheaper, and better. But because the constraints differ, such as tolerances for size, heat dissipation, and power consumption, the decisions end up very different.
In addition, the display modules of AR glasses and VR headsets are also different. They have some things in common, need to consider 2D/3D content and system interaction, because you are not using a mouse or touch screen, but using gesture tracking, eye tracking, or EMG (electromyography) technology that we released this year, but because of the different user experience, the requirements of the display module and the optimization direction of the hardware and software are also different.
Of course, we have many shared teams internally, such as the display and optics teams that share experiences with each other on AR and VR projects, and there are great synergies in system design, interaction modeling and manufacturing, and marketing. But when it comes to products, use cases, and trade-offs, the differences are huge. I don't see a complete convergence of product directions in these areas for the foreseeable future. So we can learn from each other, but the actual development considerations are very different.
Luo Yihang: Is there any experience you can learn from smartphone manufacturers? Or is that experience not worth looking at?
Kenan: There are really interesting lessons to be learned from mobile phones or other traditional consumer electronics. I have previously been involved in the product development of traditional smart speakers/smart screens, such as Meta's Portal smart screen. What we have learned from other forms of products, such as smartwatches or voice assistants or smart screens, is that in order for people to want to carry an additional standalone device, it has to provide enough value. Because people can do so many things with their phones now, with AI capabilities, cameras, etc., you need to be really different. You either have to target a niche market, such as a dedicated camera for creators, or you have to think about a combination of AI and other use cases for the general market, or people won't remember to use it again.
Luo Yihang: You mentioned that single-function AI devices are often hard for users to keep using. Can we talk about other AI wearables or portable devices? For example, there are AI badges that can record video or meeting audio, and AI hardware like the Rabbit R1 attracted attention for a while. More recently, there are meeting-recording devices that work with a phone to summarize a meeting or discussion for the user. These devices focus on solving a single problem; how do you see them? By comparison, Ray-Ban Meta is neither a single-function device nor purely an AI device.
Kenan: I don't think the problem is just one function. The challenge for many AI devices is to find a core use case that is valuable enough that the user remembers to use the device. Now we see a lot of concept-first devices with strong marketing claims, but the key question is: Why do users choose this device instead of continuing to use their phone?
The phone can already run most of the AI models you need, and it's multimodal, with a lot of optimizations for those features. For Ray-Ban Meta, our positioning is not limited to an AI device, but rather a multifunctional device that provides an integrated experience with shooting, audio and AI capabilities.
You mentioned the Rabbit R1 and others like it, and even if you do one thing well, the challenge now is: How do you make sure you're not just a feature? If Apple or other phone manufacturers later integrate this feature, will users still have a reason to choose these devices?
I've recently purchased a few non-AI consumer electronics devices that I'm impressed with and use regularly, even though they're single-purpose. To name two examples:
• reMarkable: An e-ink notepad from Norway. Although expensive, it has sold more than a million units. It focuses on providing a minimalist digital note-taking experience for people who don't like taking notes on an iPad. It's a great example of digital minimalism, and people love it because it takes "simplicity" to the extreme.
• Freewrite: An electronic ink typewriter. It helps me to focus on writing in the moment and away from distractions, which fits my need for focus.
The success of these devices lies in identifying a target market and deeply optimizing for the specific needs of specific users. They are not simple AI concepts or ideas, but products genuinely tailored to specific scenarios and users. Even if a device does only one thing, as long as it does it extremely well, people will be willing to pay for it.
This is also the dilemma faced by many AI-powered devices - they do not do the best in a single function, nor do they provide an attractive enough multi-function value package. This leaves them in an awkward middle ground where users ask, "Why should I bring an extra device?"
But this is a very interesting phase, and people will experiment with different forms and improve different use cases. We'll see which ones succeed. I think glasses are one of the most interesting forms, but people will keep experimenting and we'll see what works and what doesn't.
Luo Yihang: Let's talk about the future of AI or AR glasses. I'm not sure if you know how many brands are making these glasses in China right now. Can you guess a number?
Kenan: I don't know the exact number, but I've seen a lot of brands.
Luo Yihang: Is it five, 10 or 20? Which is closer?
Kenan: It must be more than 20.
Luo Yihang: What do you think of this hot AI/AR glasses market? What is the core competitiveness of companies making such products? Just like Tesla has disrupted the market with autonomous driving and electric vehicles, China also has at least 5-6 very competitive brands that are doing well. What do you think of this competition?
Kenan: I think we are in a very interesting stage. This category is exploding rapidly and gaining great interest from consumers, brands and manufacturers. This is a good thing, because as I said, people will try different forms, different eyewear designs and different usage scenarios. But I think one of the challenges is that no matter where the company is or how big or small it is, this is an integrated device that needs to be optimized overall from hardware to software to marketing.
If you only make software, it will be difficult to optimize for this new form. For example, the architecture of glasses is not stable, and the layout of the audio interface, the position and performance of the camera, etc. are all changing. Similarly, if you only focus on the hardware without controlling the model side, it will be difficult to solve the challenges of system and hardware connection, such as the matching of model quality and hardware performance. This is why we see many large companies entering this field - they have the ability to optimize from hardware to software to channel integration, and enhance user experience through strong brands and sales networks.
While many brands are launching products similar to the Ray-Ban Meta architecture, to make the user experience truly great, you need to address the details in all of these different categories. This is good for the industry and we will continue to move forward.
Luo Yihang: Indeed, this requires the perfect combination of hardware, software, and marketing. Given Meta's Llama model, what are Meta's core advantages in these three areas? Competition on the hardware side is very fierce, but there seems to be little difference in everyone's software and Llama-model capabilities. What about the other aspects? How would you rate Meta's hardware capabilities? After all, Meta isn't exactly known for its hardware, right?
Kenan: That’s a great question. I can talk about how the partnership with EssilorLuxottica helps us, especially in this area. As you said, we're a software company and we're great at models. But at the same time, we have been doing hardware for a long time, Reality Labs has been around for 10 years, and I have worked there for more than 6 years. So we have a lot of experience in manufacturing consumer electronics.
It's great to work with EssilorLuxottica and Ray-Ban, especially on the industrial design side, how to make fashionable products that people really love and create iconic designs. There is a lot we don't know about eyewear manufacturing, and they have a wealth of experience and are the largest traditional eyewear manufacturer in the world. Together we look at how this impacts color, materials, finishes, and hardware mechanical design. We also learned a lot from them.
In terms of channels, we have accumulated experience in consumer electronics channels through Quest VR. But the eyewear field is very different; glasses are a medical device worn on the face. We learned a lot from EssilorLuxottica about lens design, gradient treatments, coating optimization, channel sales, and brand and design strategy. I think we're constantly making progress, reinventing ourselves especially in glasses while also doing a good job in VR. It's going to be an interesting reinvention.
Luo Yihang: It sounds like it would make sense for Meta to acquire a lens brand to achieve these goals, right?
Kenan: In fact, we have established a long-term strategic partnership with EssilorLuxottica and signed a long-term agreement. EssilorLuxottica owns many brands, not only Ray-Ban but also many other well-known ones. This partnership brings us huge advantages.
Building an iconic brand doesn’t happen overnight. For example, some users love the classic Ray-Ban design and now want to continue enjoying the charm of the brand by upgrading to smart glasses. This partnership helps us achieve this very well and lays a solid foundation for our smart glasses development.
Luo Yihang: This is probably my last question. Looking to the future, what ideas in AI hardware products excite you, beyond Ray-Ban Meta?
Kenan: I am very much looking forward to the future of different product forms and scenarios. For example, in the field of education, or in an environment like a museum, I'm particularly interested in how to create more interactive scenarios where people can learn and create, whether it's through robots or educational toys. These concepts are fascinating, such as smart toys that tell stories, or educational devices that enhance the learning experience.
The development of general-purpose equipment is indeed full of challenges, and we need to find the most suitable product form. But I believe there will be many attempts in the future, both highly personalized products and perhaps ambient devices. For example, in a public space, such as a museum, how do you design a tour guide device that can interact with you while still maintaining a private experience? This type of scenario has great potential.
I am very much looking forward to seeing how these different device forms take shape in specific scenarios such as education, entertainment, and travel. I believe these innovations will completely change the way people live and work, and I am full of anticipation for the future.