Whisper - Personalized Visual and Auditory Environments

Project Members:

George Wang, Jingying Deng

🎨 Project Brief:

Research indicates that auditory and visual stimuli can significantly influence learning efficiency, emotional states, and cognitive processes (Akhshabi & Dortaj, 2023; Angwin et al., 2017; Gallego-Gómez et al., 2020; Jakubowski & Ghosh, 2019; Söderlund et al., 2021). Auditory elements such as background music and white noise have been found to reduce stress, enhance memory retention, and improve focus. Similarly, visual content tailored to individual preferences can boost engagement and motivation in learning settings (Harris & McDonald, 2016). However, few existing tools integrate both modalities into a single personalized learning experience.

To address this gap, our project develops an AI-powered tool that lets users generate music and images tailored to their learning preferences. The system sets AI-generated images as desktop wallpapers and pairs each image with matching background music, creating an immersive study environment. The goal is to enhance focus, regulate emotions, and improve cognitive efficiency during study or work sessions.

⚠️Rationale for AI Assistance

AI assistance is critical in this context for three reasons. First, AI enables personalization: both visual and auditory elements can be tailored to each user's preferences, producing an environment that resonates with the individual learner and is therefore more engaging. Second, AI improves efficiency by generating personalized visuals and audio quickly, saving users the time they would otherwise spend searching for suitable content and adjusting it by hand. Third, AI supports adaptive learning: the system can evolve with user interactions and feedback, progressively refining the auditory-visual environment and the learning experience.

👉Need Finding

Through secondary research and user studies, we identified the following core needs:

Improved Focus and Concentration: Many learners struggle with distractions, making it difficult to maintain focus for extended periods. Personalized auditory-visual environments can help mask distractions and promote deeper concentration.
Emotional Regulation: Anxiety, stress, and fatigue negatively affect cognitive performance. Music and visual stimuli can help regulate emotions by providing a calming yet engaging atmosphere.
Personalized Learning Environments: Individuals have different preferences for background sound and visual aesthetics. AI-driven customization lets each user shape a learning environment that suits them.
Cognitive Load Management: Balancing visual and auditory elements can reduce extraneous cognitive load, helping learners absorb and retain information more effectively.

Target Audience:

Higher education students who want a more focused environment for studying and academic tasks.
Professionals seeking a virtual space to increase productivity and maintain focus.

Modalities

✨Visual

AI-generated images based on user preferences are set as the desktop wallpaper. Users choose the image style, such as realistic or abstract art. These images are intended to enhance emotional engagement and focus.

✨Auditory

AI-generated background music (BGM) and white noise are tailored to the generated image. Users can choose among several sound types, including white noise, ambient music, or a combination of both, to create a personalized auditory environment.
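To make the combined-sound option concrete, the sketch below blends a music track with white noise. It is a minimal illustration under stated assumptions, not Whisper's implementation: the noise is plain Gaussian white noise generated with NumPy rather than AI-generated audio, a sine tone stands in for a generated track, and the `noise_level` parameter is a hypothetical stand-in for the user-facing intensity control.

```python
import numpy as np

SAMPLE_RATE = 44_100  # samples per second (CD-quality audio)


def white_noise(duration_s: float, rng: np.random.Generator) -> np.ndarray:
    """Gaussian white noise: independent samples with a flat power spectrum."""
    n = int(duration_s * SAMPLE_RATE)
    return rng.standard_normal(n).astype(np.float32)


def normalize(signal: np.ndarray, peak: float = 1.0) -> np.ndarray:
    """Scale a signal so its largest absolute sample equals `peak`."""
    max_abs = np.max(np.abs(signal))
    return signal if max_abs == 0 else signal * (peak / max_abs)


def mix(music: np.ndarray, noise: np.ndarray, noise_level: float) -> np.ndarray:
    """Blend music with white noise; noise_level in [0, 1] is the noise share.

    noise_level is a hypothetical knob standing in for the "intensity" setting.
    """
    if not 0.0 <= noise_level <= 1.0:
        raise ValueError("noise_level must be between 0 and 1")
    n = min(len(music), len(noise))  # trim both signals to the shorter length
    blended = (1 - noise_level) * normalize(music[:n]) + noise_level * normalize(noise[:n])
    return normalize(blended, peak=0.9)  # leave headroom to avoid clipping


# Usage: two seconds of a 220 Hz tone (placeholder for generated music) with 30% noise.
rng = np.random.default_rng(seed=0)
t = np.arange(SAMPLE_RATE * 2) / SAMPLE_RATE
music = np.sin(2 * np.pi * 220 * t)
out = mix(music, white_noise(2.0, rng), noise_level=0.3)
```

Normalizing each source before blending means `noise_level` controls the relative loudness of noise and music regardless of how loud each input was originally.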

✨Interactive

Users enter text to generate visual content and can control music preferences (type, duration, intensity) so that the audio aligns with the generated visual.
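One way to keep the visual and auditory outputs aligned is to derive both generation prompts from the same user input. The sketch below assumes a hypothetical `SessionRequest` record; the field names, intensity thresholds, and prompt wording are illustrative choices rather than Whisper's actual interface, and the resulting strings would still need to be passed to whichever image and music generators the prototype uses.

```python
from dataclasses import dataclass
from enum import Enum


class ArtStyle(Enum):
    REALISTIC = "realistic"
    ABSTRACT = "abstract art"


class SoundType(Enum):
    WHITE_NOISE = "white noise"
    AMBIENT = "ambient music"
    COMBINED = "white noise + ambient music"


@dataclass(frozen=True)
class SessionRequest:
    """Hypothetical record of one user request for a wallpaper plus matching audio."""

    description: str   # free-text input from the user
    style: ArtStyle
    sound: SoundType
    duration_min: int  # length of the audio, in minutes
    intensity: float   # 0.0 (very soft) to 1.0 (strong)

    def __post_init__(self) -> None:
        # Reject inputs the generators could not sensibly handle.
        if not self.description.strip():
            raise ValueError("description must not be empty")
        if self.duration_min <= 0:
            raise ValueError("duration_min must be positive")
        if not 0.0 <= self.intensity <= 1.0:
            raise ValueError("intensity must be between 0 and 1")

    def image_prompt(self) -> str:
        return f"{self.description.strip()}, {self.style.value} style, desktop wallpaper"

    def music_prompt(self) -> str:
        # Reusing the same description ties the audio to the wallpaper's mood.
        if self.intensity < 0.4:
            level = "soft"
        elif self.intensity < 0.7:
            level = "moderate"
        else:
            level = "energetic"
        return (f"{level} {self.sound.value} inspired by: "
                f"{self.description.strip()}, {self.duration_min} minutes")


req = SessionRequest("a quiet forest at dawn", ArtStyle.ABSTRACT, SoundType.AMBIENT, 30, 0.2)
```

Validating the request once, before either generator runs, avoids producing a wallpaper whose companion audio request would later fail.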

Secondary Research

This project builds upon prior research in cognitive science, artificial intelligence, and multimedia learning. Key studies that inform our approach include:

Akhshabi & Dortaj (2023) - Examines the effectiveness of music therapy on auditory sensitivity, memory, and auditory sequencing in 7- to 9-year-old girls with reading disorder.
Angwin et al. (2017) - Explores how white noise enhances new-word learning.
Gallego-Gómez et al. (2020) - Demonstrates that music therapy and progressive muscle relaxation reduce pre-exam stress and improve academic performance in nursing students.
Harris & McDonald (2016) - Highlights the role of personalized visual content in improving engagement and retention.
Jakubowski & Ghosh (2019) - Examines autobiographical memory retrieval triggered by music.
Söderlund et al. (2021) - Shows how sensory white noise improves reading and memory recall in children with reading disabilities.

🎥Prototype


📝 Research and Methodology

Key terms identification

Positive emotion: relaxed, focused, and satisfied

Measure: facial expression, body language, physiological survey, and interview

Learning outcome: focus and memory retention

Measure: facial expression, body language, and quiz survey

Procedure

Research Room Layout

Research Questions

RQ1: How does art style (realistic vs. abstract) in AI-generated wallpapers influence learners’ emotional states (relaxation, focus, satisfaction) and learning efficiency (focus, memory retention)?
RQ2: How does wallpaper type (static images vs. animations) affect learners’ emotional regulation (relaxation, satisfaction) and their focus during learning tasks?
RQ3: How does the freshness of AI-generated music (novel vs. familiar tracks) affect learners’ cognitive load, focus, and emotional states during study sessions?
RQ4: How do different types of audio (white noise, music, white noise + music) affect learners’ focus, relaxation, and learning efficiency in AI-generated study environments?
RQ5: How does combining personalized visuals (wallpapers) with AI-generated audio affect learners’ overall learning experience, emotional state, and cognitive performance compared to using either element alone?

Methodology

Variable	Research Question	Data Collection	Data Analysis	Coding Scheme
Art Style (Realistic/Abstract)	RQ1	FER, Eye-tracking, Survey, Interview	t-tests, ANOVA, Eye-tracking metrics	Emotional states, Engagement levels
Wallpaper Type (Static/Animation)	RQ2	FER, Eye-tracking, Survey, Interview	t-tests, Eye-tracking metrics	Distraction, Emotional response
Freshness of Music (Novel/Familiar)	RQ3	FER, Eye-tracking, Cognitive Load Survey, Quiz, Interview	ANOVA, Cognitive Load Analysis	Cognitive load, Focus, Perception of novelty
Music Type (White noise/Music/Both)	RQ4	FER, Eye-tracking, Survey, Quiz, Interview	One-way ANOVA with post-hoc pairwise t-tests, Quiz Score Comparison	Audio preference, Emotional state, Focus
Overall Experiences (Visual/Audio/Both)	RQ5	FER, Eye-tracking, Survey, Quiz, Interview	Mixed ANOVA, FER & Eye-tracking Correlation	Learning experience, Emotional regulation

FER = facial expression recognition; RQ = research question.
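The two core tests in the table can be sketched with the Python standard library. This example assumes independent (between-subjects) groups and uses made-up quiz scores; if each participant experiences every condition, a paired t-test or repeated-measures ANOVA would be the appropriate choice instead. Welch's t-test compares two conditions (e.g., realistic vs. abstract wallpapers in RQ1), while a one-way ANOVA compares three or more (e.g., white noise, music, and both in RQ4). Only the test statistics are computed here; p-values require the t and F distributions, which `scipy.stats.ttest_ind(a, b, equal_var=False)` and `scipy.stats.f_oneway` provide.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and degrees of freedom for two independent groups."""
    va, vb = variance(a) / len(a), variance(b) / len(b)  # squared standard errors
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def one_way_anova(*groups):
    """F statistic and (df_between, df_within) for a one-way between-groups ANOVA."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, (df_between, df_within)


# Hypothetical quiz scores (out of 10), for illustration only.
noise_only = [6, 7, 5, 8, 6]
music_only = [7, 8, 7, 9, 8]
noise_and_music = [8, 9, 7, 9, 9]

f_stat, dfs = one_way_anova(noise_only, music_only, noise_and_music)
t_stat, t_df = welch_t(noise_only, music_only)
```

Running the omnibus ANOVA first and following up with pairwise comparisons only when it is significant limits the inflation of false positives that comes from running several unplanned t-tests.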

🛡️ Ethical and Societal Considerations

Ethical Implications

Whisper is built with a strong focus on ethical responsibility in AI-supported learning. A major concern is user privacy, particularly the collection of personal information such as emotional states, preferences, or behavior. To protect users, Whisper takes a transparent, consent-based approach. Before using the system, users are introduced to our “Generative AI Terms of Use”, which explain how the AI works and how data is handled. No data is collected without permission, and users can choose to store their inputs locally to reduce privacy risks. Whisper’s AI features are always user-initiated: the system does not make automatic decisions or change the environment without the user’s choice. This design keeps learners in control of their environment.
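The consent rule described above (nothing stored without permission, inputs kept only on the user's own machine) can be expressed as a small gate in front of local storage. This is a hypothetical sketch: the `LocalPreferenceStore` class, the JSON file format, and the choice to delete stored data when consent is withdrawn are assumptions for illustration, not details of Whisper's implementation.

```python
import json
from pathlib import Path
from typing import Optional


class LocalPreferenceStore:
    """Hypothetical store that keeps user inputs on local disk, only after opt-in."""

    def __init__(self, path: Path, consent_given: bool = False) -> None:
        self.path = path
        self.consent_given = consent_given  # defaults to no consent

    def save(self, prefs: dict) -> bool:
        """Write preferences to disk; returns False (and writes nothing) without consent."""
        if not self.consent_given:
            return False
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.path.write_text(json.dumps(prefs), encoding="utf-8")
        return True

    def load(self) -> Optional[dict]:
        """Return stored preferences, or None if there is no consent or no saved file."""
        if not self.consent_given or not self.path.exists():
            return None
        return json.loads(self.path.read_text(encoding="utf-8"))

    def revoke(self) -> None:
        """Withdraw consent and delete any data already stored (assumed policy)."""
        self.consent_given = False
        self.path.unlink(missing_ok=True)
```

Keeping the consent check inside the store, rather than at each call site, means no code path can write preferences before the user opts in.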

Whisper avoids emotional manipulation by not using hidden mood detection or behavior tracking. Instead, it offers tools that support emotional regulation. By giving users control over when and how they use AI features, Whisper respects human agency. The goal is not to replace a learner’s decisions, but to help them make better ones with less stress. This approach reduces potential harms from AI and builds trust between the user and the system.

Societal Implications

Whisper supports learners who struggle with focus and motivation, especially those who study alone or in noisy, uncomfortable spaces. Many students try to improve their study environment by searching online for the right ASMR videos, white noise playlists, or relaxing images. Whisper reduces this burden by using AI to generate personalized music and images based on each user’s preferences, helping learners reach a calm, focused state without spending time searching. For learners who are neurodivergent, easily distracted, or without access to ideal study spaces, this support may be especially valuable.

Whisper also aims to reduce digital inequality. It is designed to be lightweight and accessible, even for those with limited resources. By providing personalized study support at no cost, Whisper contributes to a more inclusive and supportive learning environment. It illustrates how AI, when used responsibly, can make education fairer and more effective for a wider range of learners.

References

Akhshabi, M., & Dortaj, F. (2023). Effectiveness of music therapy on sensitivity, memory, and auditory sequence of 7 to 9-year-old girls with reading disorder. Journal of Iranian Medical Council, 6(3).
https://doi.org/10.18502/jimc.v6i3.12849

Angwin, A. J., Wilson, W. J., Arnott, W. L., Signorini, A., Barry, R. J., & Copland, D. A. (2017). White noise enhances new-word learning in healthy adults. Scientific Reports, 7(1).
https://doi.org/10.1038/s41598-017-13383-3

Gallego-Gómez, J. I., Balanza, S., Leal-Llopis, J., García-Méndez, J. A., Oliva-Pérez, J., Doménech-Tortosa, J., Gómez-Gallego, M., Simonelli-Muñoz, A. J., & Rivera-Caravaca, J. M. (2020). Effectiveness of music therapy and progressive muscle relaxation in reducing stress before exams and improving academic performance in nursing students: A randomized trial. Nurse Education Today, 84, 104217.
https://doi.org/10.1016/j.nedt.2019.104217

Harris, J., & McDonald, J. (2016). The role of visual imagery in education and learning. Journal of Cognitive Education and Psychology, 15(2), 172–187.

Jakubowski, K., & Ghosh, A. (2019). Music-evoked autobiographical memories in everyday life. Psychology of Music, 49(3), 649–666.
https://doi.org/10.1177/0305735619888803

Kunda, M. (2018). Visual mental imagery: A view from Artificial Intelligence. Cortex, 105, 155–172.
https://doi.org/10.1016/j.cortex.2018.01.022

Söderlund, G. B., Åsberg Johnels, J., Rothén, B., Torstensson‐Hultberg, E., Magnusson, A., & Fälth, L. (2021). Sensory white noise improves reading skills and memory recall in children with reading disability. Brain and Behavior, 11(7).
https://doi.org/10.1002/brb3.2114