Start With the Legal Line
Removing a watermark from a photo you have not licensed is copyright infringement, full stop. Removing one from a photo you have licensed (because the file you got from the marketplace still has the comp watermark on it, which happens) is housekeeping — and most marketplaces will give you the clean file on request.
Before any team member touches a watermark-removal tool, the question to answer is: do we have a license document for this image? If the answer is no, the tool isn't the issue. The license is.
The Three Removal Methods That Actually Work
For images you do own and just need to clean up:
- Generative fill (Photoshop, since v24). The cleanest result for complex backgrounds. Select the watermark, hit generative fill with no prompt, take the version that matches the surrounding texture. ~10 seconds per image.
- Content-aware healing (Photoshop, Affinity Photo, GIMP, Photopea). Older but still excellent for watermarks on flat or smooth backgrounds. Free if you're on GIMP or Photopea.
- Re-licensing the clean asset. Often forgotten. If the marketplace shipped you the comp version by mistake, request the clean file from your account dashboard. Free, instant, and produces a perfect result with no edit history.
What we don't recommend: AI-only one-click "watermark remover" web tools. The results look fine until you put them on a billboard and discover the regenerated patch is slightly off in color temperature.
Captions Are a Channel Decision, Not a Postscript
The biggest mistake we see brand teams make with captions is treating them as something the editor adds at the end. Captions are a channel decision — the format depends on where the video lives.
- Instagram / TikTok / YouTube Shorts: burned-in (open) captions, animated, large type, high contrast. 85% of mobile feed video is watched without sound.
- YouTube long-form: uploaded SRT/VTT, not burned-in. Lets viewers turn them off and lets YouTube index the transcript for search.
- LinkedIn / paid social: burned-in. LinkedIn's native player handles closed captions poorly enough that we don't trust it.
- Brand site / OTT: separate caption track, almost always. Lets the player respect accessibility settings.
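For the channels that take an uploaded caption file, SRT is the simplest format: a numbered card, a start --> end timecode with millisecond precision, then up to two lines of text, with a blank line between cards. A minimal example (the timings and copy here are invented for illustration):

```
1
00:00:00,000 --> 00:00:02,400
Captions are a channel decision,
not a postscript.

2
00:00:02,400 --> 00:00:05,100
Pick the format per platform.
```

VTT is nearly identical, except the file opens with a `WEBVTT` header and timecodes use a period instead of a comma for milliseconds.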
What Good Burned-In Captions Look Like
Five practical rules from cuts we've shipped this year:
- Two lines max, ~32 characters per line. Three lines is a wall of text and viewers skip it.
- One sentence per card, ideally. Splitting a sentence across two cards is worse for comprehension than slightly off-time captions.
- Bottom third, with a 4-8% safe-area margin. Account for platform UI overlays (Instagram's bottom action bar, TikTok's right-hand action stack).
- Sans-serif, semi-bold, 60-72px equivalent at 1080p. Anything thinner reads as decorative and fails on small screens.
- White type, semi-transparent rounded box behind it. Drop shadows alone fail on bright backgrounds; pure white-on-video fails on snow or sky.
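The two-line, ~32-character rule is easy to enforce mechanically before anyone touches a caption template. A rough sketch in Python (the 32-character budget is this article's rule of thumb, not a platform limit, and the example sentence is invented):

```python
import textwrap

MAX_CHARS = 32   # per line, per the rule of thumb above
MAX_LINES = 2    # lines per caption card

def to_cards(sentence: str) -> list[list[str]]:
    """Split a sentence into caption cards of at most
    MAX_LINES lines of MAX_CHARS characters each."""
    lines = textwrap.wrap(sentence, width=MAX_CHARS)
    # Group the wrapped lines into cards of MAX_LINES lines each.
    return [lines[i:i + MAX_LINES] for i in range(0, len(lines), MAX_LINES)]

cards = to_cards(
    "Captions are a channel decision, not a postscript the editor adds at the end."
)
for card in cards:
    print("\n".join(card))
    print("---")
```

A human still makes the final call on where cards break, since the ideal split follows the sentence, not the character count.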
The Tool Stack That Actually Saves Time
Caption workflows that scale across a content team usually look the same:
- Auto-transcribe: Descript, CapCut, or Premiere's built-in — all reach 95%+ accuracy on clean audio.
- Human pass for proper nouns and brand terms. Auto-transcribe gets "Calendly" wrong half the time. Five-minute manual fix per cut.
- Animate in After Effects or CapCut. The kinetic-caption look (one or two words at a time, scaling/pop) does measurably better on TikTok than static captions.
- Export burned-in once, export SRT once. Two delivery files per video, one workflow, covers every channel.
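The "export SRT once" step is easy to script if your transcription tool hands you timed segments; most of the tools named above also export SRT directly, so treat this as a sketch of the format, not a recommended pipeline. The segment data is invented:

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timecode: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def to_srt(segments: list[tuple[float, float, str]]) -> str:
    """Render (start, end, text) segments as an SRT file body."""
    blocks = []
    for i, (start, end, text) in enumerate(segments, start=1):
        blocks.append(f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}")
    return "\n\n".join(blocks) + "\n"

# Invented example segments: start, end (in seconds), caption text.
segments = [
    (0.0, 2.4, "Auto-transcribe first,"),
    (2.4, 5.1, "then a human pass for brand terms."),
]
print(to_srt(segments))
```

The same segment list can drive the burned-in export in the NLE, which is what keeps the two delivery files in sync.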
Accessibility Is the Real Reason
Open captions started as a sound-off-mobile workaround. They're now a baseline accessibility expectation, and in many regulated sectors (healthcare, government, education) a hard requirement under WCAG 2.2 AA.
The framing shift worth making internally: captions aren't an optimization for the no-sound viewer. They're table stakes for the deaf-and-hard-of-hearing viewer. Once a team adopts that framing, the question stops being "should we caption this?" and becomes "what's our caption SLA?" — usually 100% of public-facing video, captioned within 24 hours of publish.
What to Cut From Your Process Today
Three things we consistently see brand teams stop doing once they audit the caption pipeline:
- Hand-typing captions in the NLE. Auto-transcribe + human pass is 4-6x faster.
- Designing each caption card from scratch. One template per channel. Five templates total covers a brand's entire surface area.
- Using YouTube's auto-generated captions as final. They're the floor, not the ceiling. Brand-name accuracy alone justifies a human pass.
Ready to put a camera on it?
Start Motion Media is a commercial production company for emerging brands — crowdfunding films, DTC product videos, and brand campaigns shipped from San Francisco, New York, Austin, Denver, and San Diego.