Meta's Audiobox is an AI audio generation tool that allows users to create custom voices, sound effects, and audio stories using simple text prompts. It can generate speech in various environments and styles, non-speech sound effects, and soundscapes. Audiobox is the successor to Meta's Voicebox. It advances generative AI for audio by unifying generation and editing capabilities across these audio types, with a variety of input mechanisms to maximize controllability for each use case.

Key features of Audiobox:
- Dual Input: Accepts both voice recordings and natural language text as inputs, granting more granular control over the generated audio.
- Versatile Audio Generation: Generates custom sounds, speech, and soundscapes needed for podcasts, videos, games, and more.
- Natural Language Prompts: Lets users describe the sound or type of speech they want to generate in plain language. For example, to generate a soundscape, a user can give the model a text prompt like "A running river and birds chirping," and to generate a voice, a user might input "A young woman speaks with a high pitch and fast pace."
- Voice Cloning: Audiobox can learn from audio input and clone voices.
- Audio Infilling: With infilling, users can also use the model to polish sound effects, for example by adding different thunder sounds to a rain soundscape.