Music Is Too Loud: How to Fix Your Video Audio Mix

May 21, 2026

“The music is too loud” is one of the most common complaints we hear from clients about their videos. When background music drowns out the narrator, the most common cause is a flat mix: the music and dialogue sit at about the same volume throughout the video, with little to no automation or ducking.

On a pair of studio monitors, the mix may sound fine. But on a phone speaker or laptop audio, where most viewers watch their videos, it falls apart.

The human voice and background music occupy much of the same frequency range, primarily the mid-range (between 500 Hz and 4,000 Hz). When both are pushed to similar volumes, the music fills the same frequency space the voice needs to cut through. The narrator doesn’t sound louder; he sounds muddled.

What the narrator and background music levels should be

Editors typically just eyeball this. As a result, levels are inconsistent across scenes, and they’re usually wrong. These are the targets for a clean, professional mix:

Target levels for narrator: -12 dB to -6 dB peak. This keeps the narrator loud, clear, and prominent without getting close to clipping.

Target loudness (LUFS): around -16 LUFS for online video platforms.

Background music levels when narration is present: -18 dB to -25 dB. That’s far lower than most editors use by default. The music should be felt more than heard when the narrator is speaking.

Background music levels when there isn’t any narration: -12 dB to -14 dB. In intro/outro transitions and B-roll sections (where no one is speaking), let the music “breathe” at a higher volume.

Loudness normalization targets: -14 LUFS for YouTube, -16 LUFS for virtually all other online platforms. When exporting your final mix, the combined loudness of both audio tracks should hit these targets consistently.

What matters most is the voice-to-music ratio. Effective audio balancing treats dialogue as the main signal and background music as support. If you can clearly hear and follow the music while the narrator is talking, it’s way too loud.
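To make these numbers concrete, here is a minimal Python sketch of the arithmetic behind them. It assumes the dB figures above are relative levels read from your editor’s meters and that a gain change of 1 dB shifts measured loudness by 1 LU. The function names are made up for illustration and are not part of any editor or library.

```python
def db_to_amplitude(db: float) -> float:
    """Convert a level difference in dB to a linear amplitude ratio."""
    return 10 ** (db / 20)


def voice_to_music_gap_db(voice_db: float, music_db: float) -> float:
    """How many dB the narrator sits above the music."""
    return voice_db - music_db


def lufs_gain_offset_db(measured_lufs: float, target_lufs: float) -> float:
    """Gain in dB needed to move a mix from its measured loudness to a target."""
    return target_lufs - measured_lufs


if __name__ == "__main__":
    # Example values picked from inside the ranges above (assumed, not measured).
    gap = voice_to_music_gap_db(voice_db=-9.0, music_db=-21.0)
    print(f"Narrator sits {gap:.0f} dB above the music")
    print(f"Music amplitude relative to voice: {db_to_amplitude(-gap):.2f}")

    # A mix measured at -19.5 LUFS needs this much gain to reach -16 LUFS.
    offset = lufs_gain_offset_db(measured_lufs=-19.5, target_lufs=-16.0)
    print(f"Gain to reach target: {offset:+.1f} dB")
```

With the narrator at -9 dB and the music at -21 dB, the music sits at roughly a quarter of the narrator’s amplitude, which is why it reads as support rather than competition.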

Standard operating procedures: balancing narrator & background music

How to effectively balance narrator and background music in an editor:

Work from a copy: before you touch any levels, duplicate the sequence or back up your project file. Never make destructive changes to the original project file.
Listen to the dialogue track alone first: solo the dialogue track and listen to only the narrator. Look for irregular volume, pops, room noise, etc. Correct these on the dialogue track before making adjustments anywhere else.
Adjust dialogue levels: set narrator levels at -12 dB to -6 dB peak. Also, compress lightly (3:1 ratio, fast attack) to even out volume spikes without killing the natural flow of the narrator’s voice.
Automate music volume levels: don’t simply set a single static level for your music track. Instead, use volume envelopes or keyframes to lower the music when narration begins and raise it again when there is no narration. The decrease should occur over approximately 0.5 to 1 second, not instantaneously, so the change sounds natural.
Apply EQ to the music track: cut some of the mid-range frequencies on your music track, especially between 800 Hz and 3,000 Hz. This carves out space for your narrator without removing the music from the overall mix. Typically, a 3 dB to 6 dB reduction in this range is sufficient.
Final music level settings: after applying EQ and ducking, set the background music at -18 dB to -25 dB wherever there is narration. Compare this with your dialogue track. If you can follow the lyrics or melody of the song while the narrator speaks, reduce the level even further.
Test on multiple devices: export a preliminary version of your video and test it on different audio systems, including headphones, phone speakers, and computer speakers, before locking in. A mix can sound great on studio monitors but terrible on consumer-grade equipment. Testing here catches almost all of the problems that sneak past.
Document all settings used: write down the music levels, compression ratios, EQ cuts, etc. This is your template for future videos in this series.
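The music automation step can be sketched in Python as a list of volume keyframes. This is an illustration under stated assumptions, not how any particular editor implements ducking: the narration spans are hypothetical start/end times in seconds, the default levels (-13 dB idle, -20 dB ducked) are picked from inside the target ranges above, and the function names are made up for this example.

```python
from typing import List, Tuple

Span = Tuple[float, float]      # (start_s, end_s) of a narration segment
Keyframe = Tuple[float, float]  # (time_s, music_level_db)


def merge_close_spans(spans: List[Span], min_gap_s: float) -> List[Span]:
    """Merge narration spans whose gap is too short for a full fade up and down."""
    merged: List[Span] = []
    for start, end in sorted(spans):
        if merged and start - merged[-1][1] < min_gap_s:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def ducking_keyframes(
    spans: List[Span],
    idle_db: float = -13.0,    # assumed level when nobody is speaking
    ducked_db: float = -20.0,  # assumed level under narration
    ramp_s: float = 0.5,       # fade time, within the 0.5 to 1 second guideline
) -> List[Keyframe]:
    """Return music volume keyframes that duck under each narration span."""
    keyframes: List[Keyframe] = []
    for start, end in merge_close_spans(spans, min_gap_s=2 * ramp_s):
        keyframes += [
            (max(0.0, start - ramp_s), idle_db),  # begin fading down (clamped at 0)
            (start, ducked_db),                   # fully ducked as narration starts
            (end, ducked_db),                     # hold until narration ends
            (end + ramp_s, idle_db),              # back to the idle level
        ]
    return keyframes


if __name__ == "__main__":
    narration = [(2.0, 5.0), (5.4, 8.0), (12.0, 14.0)]  # hypothetical timings
    for time_s, level_db in ducking_keyframes(narration):
        print(f"{time_s:5.1f} s -> {level_db:.0f} dB")
```

Merging spans that are less than two fade lengths apart keeps the music from pumping up and down during short pauses in the narration.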

Simple quick fixes within your editor

To fix a completed video that has had its audio ruined:

Auto ducking: virtually every major NLE (DaVinci Resolve, Final Cut Pro, Premiere Pro) has auto ducking built in. Run auto ducking on your music track and check how it performs. In most cases, auto ducking fixes the problem in less than 2 minutes.
Lower clip gain: gradually lower (in 1 dB increments) the clip gain of your music tracks until your narrator is clearly understandable. Do not adjust the fader yet. Lowering the clip gain first and fine-tuning with the fader later gives better results.
Fade in/fade out around dialogue segments: fade your music clips down and up around sections of dialogue. Short 0.5-second fade-downs before narration and fade-ups afterwards give natural-sounding volume shifts and avoid harsh jumps.
Normalize the dialogue track: many NLEs include a Normalize function that raises your track to a specific loudness level. Normalize your dialogue track to smooth out volume variations before fine-tuning it against your music tracks.
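As a rough picture of what a normalize step does, here is a small Python sketch of peak normalization. It assumes audio samples are floats between -1.0 and 1.0, where 1.0 is full scale. Your editor’s Normalize function may target loudness rather than peak level, so treat this as an illustration of the idea and not as a description of any specific NLE. The function names are made up for this example.

```python
import math
from typing import List


def peak_dbfs(samples: List[float]) -> float:
    """Peak level in dB relative to full scale; silence returns -inf."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return float("-inf")
    return 20 * math.log10(peak)


def gain_to_peak_db(samples: List[float], target_db: float) -> float:
    """Gain in dB that would move the current peak to the target peak."""
    return target_db - peak_dbfs(samples)


def normalize_peak(samples: List[float], target_db: float = -6.0) -> List[float]:
    """Scale samples so the loudest one lands on target_db."""
    if peak_dbfs(samples) == float("-inf"):
        return list(samples)  # nothing to scale in pure silence
    gain = 10 ** (gain_to_peak_db(samples, target_db) / 20)
    return [s * gain for s in samples]


if __name__ == "__main__":
    quiet_dialogue = [0.10, -0.25, 0.20, -0.05]  # hypothetical samples
    print(f"Peak before: {peak_dbfs(quiet_dialogue):.1f} dB")
    louder = normalize_peak(quiet_dialogue, target_db=-6.0)
    print(f"Peak after:  {peak_dbfs(louder):.1f} dB")
```

The same subtraction applies when lowering clip gain by hand: the change you need is the target level minus the level you are currently reading on the meter.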

When project files have been lost and you have no access to the original tracks

The previous steps assume you still have access to the individual audio tracks in your project file. However, if a client sends you a completed video with the music baked into the narration track, or if your original project files were lost or deleted, you no longer have that access. You have a flat mix and need to repair it without the original stems.

AI-based tools exist specifically to address such scenarios:

Adobe Podcast Enhance Speech

Adobe Podcast Enhance Speech uses artificial intelligence (AI) to isolate the speech from mixed audio files. Upload a video or audio file containing a mixed audio track, and Enhance Speech separates out the spoken words, reduces background noise, and delivers an improved vocal-only track. While it doesn’t do a perfect job of isolating speech from music in a previously mixed-down file, it does dramatically improve speech clarity when excessive background music overwhelms it.

Recommended for: flat mixes in which the speech is overwhelmed by background music. The free tier caps processing at a daily time allowance (up to 60 minutes per day).

LALAL.AI

LALAL.AI is designed exclusively for audio stem separation. Once you’ve uploaded a file containing a mixed-down audio or video track, LALAL.AI uses AI to generate separate tracks for the vocals and the instrumentation. Essentially, LALAL.AI is an alternative way to extract the original audio elements from a previously baked-down file.

As reported by unite.ai, LALAL.AI’s algorithms cancel unwanted background noise while generating isolated tracks that clearly separate the human voice from instrumentals. Premium pricing begins at $15/month. One-time credit purchases are available to customers who don’t need an ongoing subscription.

Auphonic

Auphonic operates differently. Instead of attempting to isolate individual audio tracks, Auphonic tries to automatically balance the relative volumes of speech and music in a mixed-down file. Importantly, Auphonic lets users specify maximum volume levels for each element type (speech vs. non-speech), so you can attenuate or suppress the background music as needed to make the speech easier to hear.

The free tier provides up to 2 hours of processing per month. Paid plans start at approximately $11/month and scale with usage. For organizations that routinely need corrections across many video productions, Auphonic offers automated batch processing and integrations that support long-term workflows beyond one-time fixes.

Descript Studio Sound

Descript Studio Sound uses AI to enhance existing recordings by removing background noise, evening out volume levels across speech, and improving vocal clarity in general. Although Studio Sound by Descript cannot separate individual audio tracks from mixed-down files, it is reasonably effective in cases where the speech is only marginally buried beneath loud background music.

All of the AI-based tools above perform best when the speech is still audible in the mixed-down file, even if it is hard to follow. Where the speech is only very faintly audible beneath heavily dominant background music, none of these tools is likely to produce useful results.

How to remove background noise and fix music that is too loud in videos

This is a problem of production. At each stage (pre-production, filming, post) there are things you can do to avoid creating video exports with poorly mixed background music.

In post-production:

Set levels for the dialogue first.
Automate the music track so it fades down when the narrator speaks and fades up when he/she isn’t talking.
Carve out frequencies with equalization so the music doesn’t overwhelm the narration.
Test everything on real devices (phones, tablets, smart TVs, etc.) before exporting.

After exporting, using technology to recover your audio:

In many cases, once you’ve exported your video and lost access to your project file, the original mix is difficult if not impossible to recover. But not always. Three AI tools available today can help you get some or all of your dialogue cleaned up:

LALAL.AI: this software uses AI stem separation to split the voice from the music in a mixed file.

Adobe Podcast: this tool provides automatic voice cleaning. It can remove hiss, hum, and other unwanted sounds while keeping your voices intact.

Auphonic: this tool cleans up your audio in two steps. First, it removes background noise and hum. Then, it boosts the voices relative to the background. The result is much cleaner-sounding audio than you would be able to achieve manually.

There are also several apps that help in a different way, including Audacity, GarageBand, and Hindenburg Field Recorder. These apps let you record new tracks on top of existing ones to replace background noise and bad lines.

Handing the mix over to someone else:

If you struggle to get consistent audio mixes across different episodes, channels, or clients’ ongoing projects, Vidpros offers a solution. Our fractional video editing services include audio mixing, ducking, and dialogue cleanup as part of our standard edit process. We offer same-day or next-day turnaround. Check out our $100 Trial.

Capping Off

You can almost always fix a video whose background music is too loud, as long as you catch it early enough. In the edit, the process is straightforward.

First, set and document your dialogue levels. Then make sure you’re automating the music.
Next, carve space out of the music using an equalizer (EQ).
Lastly, test your mix on real devices with speakers or headphones before exporting.

If you’ve lost your project file and all you have left is a rendered copy of what you exported from the editor, you still have some possible fixes, such as LALAL.AI, Adobe Podcast, and Auphonic. These apps use artificial intelligence (AI) to help you recover a usable version of your audio without starting over.

Prevention is better than cure. Use a standard operating procedure (SOP) each time you create an edit, and save all of your project files. If you do these things correctly, you’ll end up with a polished final product, and viewers won’t even notice the work that went into creating it.
