Web Captions 3 – Caption Styling Challenges

Category: Accessibility, Design
Sumi-e of Cherry Blossoms: Sakura Collections from the Library of Congress

You’ve learned how the industry uses captions. Now let’s take a look at browser defaults and spec compliance to see what’s even possible to style. Here we go!

What Browsers Do By Default

Web captions render very differently from OS-to-OS and browser-to-browser. I’ve tested what rendering looks like on the latest versions of everything: MacOS (Safari, Edge, Chrome, Firefox), iOS (every browser is really Safari… so Safari), Windows 11 (Edge, Chrome, Firefox), and Android (Firefox, Edge, Samsung, and Chrome).

The results of the default style? Safari looks the nicest, Firefox renders the most consistently, and Chromium browsers all do the same thing – try to match the OS’s guidance. In the end, Chromium has the feel of Safari on MacOS and of Firefox on Windows. One additional piece of good news is that captions respect the user-settable minimum font size.

Figure: MacOS Safari caption style.
Figure: Firefox looks the same everywhere.
Figure: Chromium browser captions on MacOS have a Safari-like feel.
Figure: Chromium browsers on Windows, however, look like Firefox.

Styling Caption Standards

According to the W3C spec, web captions can be styled in many ways. And while it’s not commonly recommended, there are quite a few proposed styling attributes that designers and developers can manipulate: text size, text alignment, font family, position over the video, amount of space the text can cover, colors of the text and background, outline, opacity, text decoration, and so on.

Styling Within the VTT Itself

To simplify how we style, let’s first explore if we can keep all of our styling within the VTT itself, so we can keep things like speaker colors bound to the content. What I found was not encouraging.

While MDN says that styling attributes could be set within the WebVTT file itself, not a single browser tested accepted CSS styling code within the VTT file.

::cue.css {
  color: green;
}

00:00:00.000 --> 00:00:01.000
<c.css>Part of the docs, works nowhere.</c>

Additionally, the default colors supposedly standardized to help with things like BBC’s speaker differentiation also failed everywhere. (Update: Was informed that ‘…this is not only a BBC thing. Colours are widely used in Europe to distinguish different things, not always speaker changes. Support for colours is really important in those places where the audience needs to use them to understand the content.’ Good to know!)

00:00:00.000 --> 00:00:01.000
<c.yellow.bg_blue>Part of the spec, works nowhere.</c>

And neither does inlining the color attributes into the tags themselves.

00:00:00.000 --> 00:00:01.000
<font color="#ff0080" face="Georgia">Also works nowhere.</font>

The only things that work consistently everywhere are the positioning, layout, sizing, and alignment settings, as well as the tags for bold, italic, and underline.

00:00:00.000 --> 00:00:01.000 position:96% line:98% align:right
<b><i><u>This works!</u></i></b>
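
If you build VTT files with a script rather than by hand, the timing line is easy to get wrong: WebVTT timestamps need exactly two digits for minutes and seconds and three for milliseconds. Here is a minimal Python sketch that formats a cue using the settings shown above. The names format_timestamp and build_cue are hypothetical helpers made up for this illustration, and the sketch assumes simple name:value settings only.

```python
def format_timestamp(seconds: float) -> str:
    """Format a number of seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def build_cue(start: float, end: float, text: str, **settings: str) -> str:
    """Build one cue: a timing line with optional settings, then the payload."""
    timing = f"{format_timestamp(start)} --> {format_timestamp(end)}"
    # Keyword arguments keep their order, so settings appear as passed in.
    extras = " ".join(f"{name}:{value}" for name, value in settings.items())
    header = f"{timing} {extras}".rstrip()
    return f"{header}\n{text}"


# Hypothetical usage: rebuild the working cue from the example above.
print(build_cue(0, 1, "<b><i><u>This works!</u></i></b>",
                position="96%", line="98%", align="right"))
```

Run as-is, this prints the same two lines as the working cue above.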

Since we can’t do color styling within our VTT, it looks like we now need to test how external CSS affects the ::cue pseudo-element.

Using External CSS to Change Cue Styling

Using an external CSS file, the spec gives us quite a few ways to style the various parts of our VTT file. More bad news here, though.

Styling regions, IDs, and tags directly works in exactly zero modern browsers. Targeting tags with a class works in all browsers but Firefox. Even using regions at all looks weird in Firefox.

/* External CSS file */
::cue-region(#reg-test) {
  color: pink;
}

00:00:00.000 --> 00:00:01.000
Tags don't work

00:00:01.000 --> 00:00:02.000
IDs don't work

00:00:03.000 --> 00:00:04.000 region:region
Regions don't work.

00:00:04.000 --> 00:00:05.000
<c.custom>Tags with a class do, except in Firefox</c>

VTT Bugs in Every Major Browser

If those issues weren’t bad enough, there are some very serious bugs in every browser, all of which I’ve reported (Chrome, Firefox; Safari doesn’t provide a public record). I’ll start with the most serious.

  • On MacOS Safari, positioning a two-line caption to the right without setting a “line” attribute allows the caption to overflow and disappear. However, adding a “line” attribute creates alignment issues in other browsers. This means that, for now, if you need to align captions to the left or right edge of the video with consistent placement, every cue needs “line” set. 98% seems to look the best universally.

    Figure: Desktop Safari loses text off-screen – bottom right in this example.
  • On all browsers besides Safari, text is lost or crushed when position is low (0% to 25%) and align:start isn’t set, or when position is high (75% to 100%) and align:end isn’t set.

    Figure: Firefox with position:2% on this cue. Chromium loses the text completely.
  • Safari on both MacOS and iOS will only choose a single track source from a shared “srclang”. In other words, if you provide an English track and an English Audio Description track both with “srclang” of “en”, you’ll only see one.

    Figure: All versions of Safari drop additional tracks with the same “srclang” attribute. As you can see, other browsers can see the “Safari won’t see this” track.
  • Safari and Chromium on MacOS won’t let you set ::cue background styles at all. They will let you set them on individual <c> tags with a class, but that looks out of place. Worse yet, if you pick a darker text color, the content becomes unreadable and fails accessibility requirements.

    Figure: Chromium with dark text set. Both Safari and Chromium on Mac ignore the background color for the cue.
  • Chromium browsers use very different layout math from all other browsers when both the size and line values are set.

    Figure: line:98% and size:66% set on Chromium. Should be centered, but isn’t.
  • Regions are either ignored or look totally weird, especially in Firefox.

    Figure: Caption region looking weird in Firefox. Ignored in other browsers.
  • Firefox can only set styles on ::cue universally; there is no way to style individual cues differently.
  • All browsers besides Firefox ignore the ‘inherit’ value for text properties on the ::cue pseudo-element.
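
The first two workarounds in the list above (always set “line” on positioned cues, and pair low or high positions with an explicit “align”) are mechanical enough to apply with a script. Below is a minimal Python sketch under stated assumptions: the names harden_cue_settings and harden_vtt are hypothetical, the 25% and 75% thresholds and the 98% default come from the observations above, and only timing lines are rewritten. It is a starting point to verify in your own browsers, not a guaranteed fix.

```python
import re

# A cue timing line: "start --> end", optionally followed by cue settings.
TIMING_RE = re.compile(r"^(\S+\s+-->\s+\S+)(.*)$")


def harden_cue_settings(timing_line: str, default_line: str = "98%") -> str:
    """Add the line/align settings suggested by the first two bugs above."""
    match = TIMING_RE.match(timing_line)
    if match is None:
        return timing_line  # not a timing line, leave it untouched

    timing, rest = match.groups()
    tokens = rest.split()
    if any(":" not in token for token in tokens):
        return timing_line  # unexpected setting syntax, do not guess

    # dict keeps insertion order, so existing settings stay where they were.
    settings = dict(token.split(":", 1) for token in tokens)

    position = settings.get("position")
    if position is not None:
        try:
            # "position" may look like "96%" or "96%,line-right".
            percent = float(position.split(",")[0].rstrip("%"))
        except ValueError:
            return timing_line
        # Bug 1: positioned cues without "line" can overflow in MacOS Safari.
        settings.setdefault("line", default_line)
        # Bug 2: low or high positions need a matching "align" elsewhere.
        if percent <= 25:
            settings.setdefault("align", "start")
        elif percent >= 75:
            settings.setdefault("align", "end")

    extras = " ".join(f"{name}:{value}" for name, value in settings.items())
    return f"{timing} {extras}".rstrip()


def harden_vtt(vtt_text: str) -> str:
    """Apply harden_cue_settings to every line of a VTT document."""
    return "\n".join(harden_cue_settings(line) for line in vtt_text.split("\n"))
```

Settings that are already present are never overwritten, so a cue that deliberately uses align:right keeps it. Cues without a “position” setting are left alone.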


If you want your styled captions to look consistent across all browsers, your options are pretty limited.

This article about caption styling challenges was a beast to research, but it was one of the most fascinating things I’ve had a chance to look into this year. It also seems that there are lots of opportunities for browser improvement.

Want to read the rest of the series?

  1. History and Formats
  2. UX Principles for Captions
  3. Caption Styling Challenges
  4. Final Styling Recommendations
My opinions & views expressed may not reflect my employer's.