Web Captions 2 – UX Principles for Captions

Category: Accessibility
Starry Night (ca.1926–1927) print in high resolution by Hiroaki Takahashi. Original from The Los Angeles County Museum of Art. Digitally enhanced by rawpixel.

In the previous part of my series, I explored the types of files that exist for web captions. This time, I’d like to take a look at industry-standard UX principles for captions.


It’s been a bit, but long story short, VTT files are the only file type option for web captions. That’s fine, because a tool like Subtitle Edit can convert any other type of caption file to VTT (and even do things like automated cleanup). In fact, you can see an example on my about page: the video there uses the auto-generated YouTube SRT file, mildly cleaned up and converted to VTT by Subtitle Edit.
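
If you are curious what that conversion involves at its simplest, here is a minimal sketch in Python. It is not how Subtitle Edit does it. It only covers the two differences I am sure of: a WebVTT file starts with a WEBVTT header line, and its timestamps use a period instead of a comma before the milliseconds. The function name srt_to_vtt is made up for this example.

```python
import re

# SRT timestamps look like 00:00:01,000 (comma before the milliseconds).
SRT_TIMESTAMP = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_vtt(srt_text: str) -> str:
    """Convert SRT caption text to WebVTT text (hypothetical helper)."""
    # Normalize line endings and drop a byte-order mark if one is present.
    body = srt_text.replace("\r\n", "\n").lstrip("\ufeff").strip()
    converted = []
    for line in body.split("\n"):
        # Only touch timing lines, so commas in dialogue are left alone.
        if "-->" in line:
            line = SRT_TIMESTAMP.sub(r"\1.\2", line)
        converted.append(line)
    # A WebVTT file starts with the "WEBVTT" header followed by a blank line.
    return "WEBVTT\n\n" + "\n".join(converted) + "\n"


if __name__ == "__main__":
    sample = "1\n00:00:01,000 --> 00:00:03,500\nHello, world!\n"
    print(srt_to_vtt(sample))
```

The numeric counters from the SRT file are kept as they are, since WebVTT allows a cue identifier on the line before the timing line. Styling tags and positioning are not handled here, which is where a real tool earns its keep.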

Industry patterns

First we’ll take a look at some major players in the industry, then we’ll try to suss out which things you should keep in mind when designing with and for captions. Let’s start with the most meager of the batch, the US’s FCC guidelines. Then we’ll get a little deeper with Netflix, followed by the incredibly robust BBC standards. All right, here we go!


The FCC’s guidelines are super general. Use captions for the whole broadcast, don’t cover critical info, and avoid dropping words or sounds while staying in sync with the visual.

That’s it.


Netflix recommends working with an approved partner to create subtitles. However, in addition to the general things the FCC recommends, additional guidance includes (but isn’t limited to)…

  • Keep text bottom center. Font is always white and default sans-serif.
  • No more than 42 characters per line. However, for Korean, 16 characters per line – mixed Latin characters, spaces, and punctuation only count as 1/2 character.
  • Avoid using more than one line of words in the caption at a time (there are some extensive choices if you must go to two). But never more than two lines.
  • The minimum time a caption can be shown on screen is 5/6 of a second (~833ms). The maximum is 7 seconds.
  • Never use pixels to set font size.
  • For speed, condense if necessary – never paraphrase or censor content. Use * after initial letter to show audio-censored content.
  • All key dialog should be subtitled. It takes precedence over background dialog.
  • Capitalize ethnicity.
  • Use the most common spelling of a phrase or word for that locale (e.g. gray versus grey).
  • Don’t use unsupported characters.
  • Use one hyphen without a space at the start of a line to indicate different speakers. Two hyphens at the end of the line indicate interruptions.
  • Ellipsis symbols inline indicate a pause in the sentence. Start a line with one to indicate a conversation is mid-flow. Also use the Unicode symbol, not three periods.
  • Use brackets to indicate additional information, like current speaker, current language, sound effect, on-screen action, etc. (e.g. [in Spanish] How are you? OR [loud explosions]).
  • Italics are reserved for narration, inner monologue, music lyrics, audio from speakers, or off-scene people.
  • Dates have many variations on handling. Just write them as stated by the audio.
  • Include the translator credit (not the translator’s company) as the last event of the subtitle file.
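
A few of these rules are numeric, so they are easy to check automatically. Below is a rough Python sketch covering only the line count, line length, and on-screen duration limits from the list above. The names check_cue and korean_width are invented for this example, times are assumed to be plain seconds, and "Korean character" is assumed to mean anything in the Unicode Hangul Syllables block, which is my simplification rather than Netflix's definition.

```python
from typing import List

MAX_CHARS_PER_LINE = 42   # general per-line limit from the list above
MAX_CHARS_KOREAN = 16     # Korean per-line limit from the list above
MAX_LINES = 2
MIN_DURATION_S = 5 / 6    # minimum on-screen time, in seconds
MAX_DURATION_S = 7.0      # maximum on-screen time, in seconds


def korean_width(line: str) -> float:
    """Count Hangul syllables as 1 and every other character as 1/2."""
    # Assumption: Hangul Syllables block only (U+AC00 to U+D7A3).
    return sum(1.0 if "\uac00" <= ch <= "\ud7a3" else 0.5 for ch in line)


def check_cue(text: str, start_s: float, end_s: float,
              korean: bool = False) -> List[str]:
    """Return a list of human-readable problems; empty means none found."""
    problems = []
    lines = text.split("\n")
    if len(lines) > MAX_LINES:
        problems.append(f"{len(lines)} lines (max {MAX_LINES})")
    limit = MAX_CHARS_KOREAN if korean else MAX_CHARS_PER_LINE
    for line in lines:
        width = korean_width(line) if korean else len(line)
        if width > limit:
            problems.append(f"line too long: {width} (max {limit})")
    duration = end_s - start_s
    # Small tolerance so float rounding does not flag a cue at the limit.
    if duration < MIN_DURATION_S - 1e-9:
        problems.append(f"too short: {duration:.3f}s")
    if duration > MAX_DURATION_S + 1e-9:
        problems.append(f"too long: {duration:.3f}s")
    return problems


if __name__ == "__main__":
    print(check_cue("[loud explosions]", 1.0, 1.5))
```

Everything else in the list (italics, brackets, hyphens, spelling) needs a human or a much smarter tool.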


BBC is the most robust of the batch. And while much of what the FCC and Netflix do is included, BBC does even more! Here are some (not all) of the additional items BBC considers:

  • Subtitles must appear against a black background.
  • If the speaker is in shot, retain the start and end of their speech – these are most obvious to lip-readers who will feel cheated if words are removed.
  • Be faithful to the speaker’s style of speech (e.g. If the speaker has a strong dialect, “I’ve a cat” is better than “I have a cat”). Give indication of “ums” and “ers” if they are important for characterization or plot.
  • Use all-caps for stressing a word.
  • All music that is part of the action, or significant to the plot, must be indicated in some way. Use the real name of the song if it is well-known.
  • If on-screen graphics aren’t easily legible, include text contained within to provide contextual information.
  • Children’s subtitling can be simplified for readability.
  • 37 characters is broadcast standard per line.
  • Left, right and center justification can be useful to identify speaker position, especially in cases where there are more than three speakers on screen.
  • Recommended subtitle speed is 160 words-per-minute (~0.375 second per word). Subtitle Edit has built-in support to check this.
  • Provide extra time when there are lots of on-screen speakers, unfamiliar words, lots of visuals to describe, etc.
  • Do not bring in dramatic, impactful, or comedic subtitles too early.
  • Shot and subtitle changes should occur at the same time.
  • You can color unique speakers. Colors in terms of priority are #FFF, #FF0, #0FF, #0F0. #00F is reserved for minor characters as a floating color.
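
The reading speed rule is plain arithmetic, so here is a small Python sketch of it. It assumes words are whatever is separated by whitespace and that times are in seconds. The function names are made up for this example, and this is not how Subtitle Edit computes its own check.

```python
TARGET_WPM = 160  # recommended subtitle speed from the list above


def words_per_minute(text: str, start_s: float, end_s: float) -> float:
    """Reading speed a viewer needs to finish the caption while it is shown."""
    duration = end_s - start_s
    if duration <= 0:
        raise ValueError("end_s must be later than start_s")
    # Assumption: a "word" is any whitespace-separated token.
    return len(text.split()) * 60.0 / duration


def min_duration_s(text: str, wpm: float = TARGET_WPM) -> float:
    """Shortest on-screen time that keeps the caption at or below wpm."""
    return len(text.split()) * 60.0 / wpm


if __name__ == "__main__":
    caption = "I've a cat"
    print(min_duration_s(caption))
    print(words_per_minute(caption, 10.0, 11.0))
```

At the 160 words-per-minute target, a four-word caption needs at least 1.5 seconds on screen. The "provide extra time" rule above means you should treat that as a floor, not a goal.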


Generally speaking, the core UX principles for captions seem to be:

  • Use white sans-serif on black in the bottom center. You can adjust position, but it would be an extreme exception.
  • Keep lines under 37 characters, one line if possible – two max. Unless it’s a CJK language, then it’s 16 characters per line.
  • Respect and keep pace with the content, context, and reader’s ability.
  • Use common special characters and styling appropriately, because they have meaning. These are things like hyphens, brackets, ellipses, italics, all-caps, etc.
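
To make the line length principle concrete, here is a minimal Python sketch that wraps a sentence into lines of at most 37 characters and groups them into captions of at most two lines. It leans on the standard library's textwrap module, which breaks purely on width. Real caption line breaks should also respect phrasing and timing, so treat this as a starting point. The function name split_into_captions is invented for this example.

```python
import textwrap
from typing import List

MAX_CHARS = 37  # per-line limit from the principles above
MAX_LINES = 2   # never more than two lines on screen at once


def split_into_captions(text: str, width: int = MAX_CHARS,
                        max_lines: int = MAX_LINES) -> List[str]:
    """Wrap text to the line width, then group lines into captions."""
    # textwrap.wrap returns lines no longer than `width` characters.
    lines = textwrap.wrap(text, width=width)
    # Each caption is up to `max_lines` consecutive lines joined by newlines.
    return ["\n".join(lines[i:i + max_lines])
            for i in range(0, len(lines), max_lines)]


if __name__ == "__main__":
    sentence = ("Respect and keep pace with the content, context, "
                "and reader's ability.")
    for caption in split_into_captions(sentence):
        print(caption)
        print("---")
```

Each returned string is one caption, with a newline between its two lines when it has two.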

In the next part of the series, we’ll dive into what’s even possible to style cross-browser via CSS. Stay tuned!

Want to read the rest of the series?

  1. History and Formats
  2. UX Principles for Captions
  3. Caption Styling Challenges
  4. Final Styling Recommendations