
AnimeMoji: An Anime Animoji Generator

Animated anime emojis that generate realistic head animation from human facial expressions

Emojis, especially anime-inspired and animated ones, have become essential in digital communication due to their expressive nature and cross-cultural applicability. However, designing and creating personalized anime characters remains a challenge for people without prior experience in art or animation. Commissioning artists to design custom anime characters can be costly and is not feasible for those with limited budgets. In addition, existing methods for creating animated anime characters often involve time-consuming processes and steep learning curves.


Recent advances in generative AI have made the creation of high-quality digital content more efficient and accessible. Advanced face mapping techniques enable the automatic mapping of human facial expressions from video inputs onto an anime head, making the animation process efficient and intuitive for users. The objective of this project is to generate animated anime emojis from text prompts that specify preferred head features, transferring facial expressions from human face videos onto the generated anime head.

TEAM

Calvin Qiu, Claire Zhou and Sissle Sun

ADVISOR

LINK

DURATION

2023.01 - 2023.05

PROJECT TYPE

CS6682: Computation for Content Creation, Cornell University

MY ROLE

Python Development, System Integration

System Design

AnimeMoji comprises three interconnected networks designed to animate an anime head using text prompts and video inputs. The first network, Stable Diffusion, generates idealized single images of anime heads based on user-provided text prompts. The second network separates the anime head from its background and adds an alpha channel to the image. The third network captures facial expressions from the input video and maps them onto the anime head, ultimately generating personalized animated emojis that represent users’ emotions and expressions.


Fig 1. AnimeMoji system design
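
For illustration, a minimal sketch of how these three stages could be wired together is shown below; the diffusers checkpoint name, the rembg background-removal call, and the animate_head stub are assumptions made for the sketch, not the exact implementation.

```python
# Sketch of the three-stage AnimeMoji pipeline (illustrative only).
# Assumes the Hugging Face `diffusers` and `rembg` packages; the anime
# checkpoint name and `animate_head` are placeholders, not project code.
import torch
from diffusers import StableDiffusionPipeline
from rembg import remove  # background removal; returns an RGBA image

# Stage 1: generate a single anime head image from a text prompt.
pipe = StableDiffusionPipeline.from_pretrained(
    "hakurei/waifu-diffusion", torch_dtype=torch.float16
).to("cuda")
head = pipe("portrait of an anime girl, silver hair, green eyes").images[0]

# Stage 2: strip the background and keep an RGBA character image.
character = remove(head)
character.save("character.png")

# Stage 3 (placeholder): drive the character with facial expressions
# extracted from a human face video via a talking-head model.
def animate_head(character_path: str, video_path: str) -> None:
    ...  # landmark extraction + pose transfer, sketched in the sections below

animate_head("character.png", "input_video.mp4")
```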

Face Mapping

The face mapping process starts by detecting facial landmarks within each frame of the input video. This is achieved by first converting the video into a series of RGB images and then extracting the landmarks from each frame. Once the facial landmarks have been identified, they are converted into a coordinate array that represents the pose and facial expressions in each frame. To calculate facial feature change ratios, such as the eye and mouth aspect ratios, as well as head rotations across the x, y, and z axes, we used the neutral facial position from the first frame of the input video as a reference. This information was then fed into a Talking Anime Head network engineered to translate these coordinates into head poses and expressions through customizable parameters for the eyes, mouth, and head.
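
A rough sketch of this landmark-extraction step, assuming MediaPipe Face Mesh and OpenCV (the write-up does not name the exact libraries used):

```python
# Sketch of per-frame landmark extraction, assuming MediaPipe Face Mesh
# and OpenCV; these libraries are assumptions for illustration.
import cv2
import mediapipe as mp
import numpy as np

def extract_landmarks(video_path: str) -> list[np.ndarray]:
    """Return one (468, 3) array of normalized (x, y, z) coords per frame."""
    frames_landmarks = []
    cap = cv2.VideoCapture(video_path)
    with mp.solutions.face_mesh.FaceMesh(static_image_mode=False) as mesh:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)  # MediaPipe expects RGB
            result = mesh.process(rgb)
            if result.multi_face_landmarks:
                lm = result.multi_face_landmarks[0].landmark
                frames_landmarks.append(np.array([(p.x, p.y, p.z) for p in lm]))
    cap.release()
    return frames_landmarks

# The first frame's landmarks serve as the neutral reference pose.
landmarks = extract_landmarks("input_video.mp4")
neutral = landmarks[0]
```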

As an illustration, creating a “wink” eye pose required retrieving the index for each eye parameter from the facial landmarks. One challenge was that the face mesh only provides landmarks around the eyes, without indicating whether the eyes are open or closed. As a solution, we instead used the Eye Aspect Ratio, the ratio of the average vertical (y) difference to the horizontal (x) difference of selected eye landmarks from the face mesh, to determine relative eye openness. We then defined upper and lower limits through trial and error with common expressions, mapping the ratio to the eye parameter of the Talking Anime Head project, which ranges from 0 to 1. However, directly replicating human eye movements on anime faces proved inaccurate because the latter’s movements are exaggerated. To smooth out the movement, we experimented with suitable coefficients as scale factors.


Fig 2. Eye aspect ratio formula
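
The sketch below shows one way the Eye Aspect Ratio and its mapping onto the 0-to-1 eye parameter could be computed; the landmark indices and the open/closed limits are illustrative examples, since the actual limits were tuned by trial and error.

```python
# Illustrative Eye Aspect Ratio (EAR) computation and mapping to the
# 0-1 eye parameter. The indices below are example MediaPipe Face Mesh
# indices for the left eye; the limits are example values, not the
# project's tuned coefficients.
import numpy as np

LEFT_EYE_VERTICAL = [(159, 145), (158, 153)]   # (upper, lower) eyelid pairs
LEFT_EYE_HORIZONTAL = (33, 133)                # left/right eye corners

def eye_aspect_ratio(lm: np.ndarray) -> float:
    """Average vertical lid gap divided by horizontal eye width."""
    vertical = np.mean([abs(lm[u][1] - lm[l][1]) for u, l in LEFT_EYE_VERTICAL])
    horizontal = abs(lm[LEFT_EYE_HORIZONTAL[0]][0] - lm[LEFT_EYE_HORIZONTAL[1]][0])
    return vertical / horizontal

def eye_parameter(ear: float, open_limit: float = 0.30, closed_limit: float = 0.10) -> float:
    """Map EAR to the talking-head eye parameter: 0 = fully open, 1 = fully closed."""
    closedness = (open_limit - ear) / (open_limit - closed_limit)
    return float(np.clip(closedness, 0.0, 1.0))

# e.g. wink = eye_parameter(eye_aspect_ratio(frame_landmarks))
# where frame_landmarks is one (468, 3) array from the extraction sketch above.
```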

We applied different ratios to other facial features and iterated through the above-mentioned process to find the best values for controlling movement. For example, head nodding was controlled by tracking the positions of the nose tip and the top and bottom of the head, and computing a relative rotation with respect to the reference pose from the video’s first frame. By iterating through each frame of the video, the anime head adapts and responds in sync with the video input, resulting in a seamless and dynamic face mapping experience.
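
An illustrative sketch of the head-nod mapping described above; the landmark indices and the scale coefficient are assumptions standing in for values the project found through experimentation.

```python
# Illustrative head-nod (pitch) estimation relative to the first frame.
# MediaPipe indices 1 (nose tip), 10 (top of head) and 152 (chin) are used
# as examples; `scale` stands in for the experimentally tuned coefficient.
import numpy as np

NOSE_TIP, HEAD_TOP, HEAD_BOTTOM = 1, 10, 152

def nod_parameter(lm: np.ndarray, neutral: np.ndarray, scale: float = 2.0) -> float:
    """Vertical offset of the nose tip within the face, relative to the neutral frame."""
    def relative_nose_y(frame_lm: np.ndarray) -> float:
        face_height = frame_lm[HEAD_BOTTOM][1] - frame_lm[HEAD_TOP][1]
        return (frame_lm[NOSE_TIP][1] - frame_lm[HEAD_TOP][1]) / face_height

    delta = relative_nose_y(lm) - relative_nose_y(neutral)
    return float(np.clip(delta * scale, -1.0, 1.0))  # head-pitch input for the anime head

# Per-frame loop: compute parameters and render the anime head.
# for lm in landmarks:
#     pitch = nod_parameter(lm, neutral)
#     ...  # pass pitch, eye and mouth parameters to the talking-head model
```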


Fig 3. Example frames mapping human facial expressions to anime emojis

Contribution

AnimeMoji contributes to democratizing the creation of personalized anime characters and animojis. It lowers the barrier for those without drawing or design skills, fostering creativity and artistic expression among users from diverse backgrounds. It can be applied across social media and instant messaging (IM) apps, and influencers can use it to brand themselves with customized animojis on live streams. Furthermore, it has the potential to facilitate communication in metaverse applications such as gaming and virtual reality.

Fig 4. Potential applications

Limitations and Future Work

There are several limitations in this study that require further investigation. First, the current design relies on multiple networks to execute the tasks of text prompt interpretation, background removal, and face mapping. This multi-network dependency may slow down the generation process, limiting real-time interaction with users. Second, the current model may not always accurately interpret users’ intended text prompts, and the translation process needs further refinement to provide more reliable animoji outputs. Third, the project currently restricts the range of expressiveness and animation styles because the system focuses mainly on facial features such as the mouth, head, and eyes. It cannot animate upper-body movements or subtle emotional nuances, which limits the depth of expressions and interactions.

In future research, we plan to extend the scope of the project in several ways. We aim to broaden the range of customizable options available to users, incorporating a variety of art styles and specific character traits by employing custom checkpoint (ckpt) or LoRA models. Additionally, we will enhance the model’s facial expression and animation capabilities, expanding the animation to include hand and body movements as well as text overlays that provide context for the animojis. We also plan to integrate the project with existing communication platforms such as WhatsApp, Discord, and YouTube live streams, allowing users to share and use their personalized animojis across various digital spaces in daily interactions. Through these extensions, we hope not only to improve the user experience but also to broaden the potential use cases of the project.
