Start Using Standardized UX Metrics
Trying to gauge whether a design you’ve worked on tests successfully can be stressful. What do you track, listen for, and measure to see if you’ve improved? But did you know that there are battle-tested, standardized usability questionnaires that tie directly to business value benchmarks like NPS (Net Promoter Score)?
So how can you start using standardized UX metrics in your process?
Recommended Standardized UX Metrics
Jeff Sauro & James R. Lewis, in their book “Quantifying the User Experience,” cover a swath of data-scientist terms such as confidence intervals, derivations, correlations, and so on to help hardcore user experience researchers gain very clear insight into the accuracy of their data. It’s way over my head.
However, they do dive into some of the usability metric collection methods recognized by international standards bodies (ANSI 2001 and ISO 1998) and in use in the field today.
SUS
The System Usability Scale (or SUS) is one of these standards. It was developed in 1996 as a concrete way to measure usability across an entire product. It has been found to be one of the most accurate measures of a product’s usability, and it correlates pretty closely with a future NPS. And while there are simpler up-and-coming alternatives that produce a similar result to SUS, such as UX Lite, SUS is tried and true in my opinion, and I’m sticking with it (even if it is eight more questions than UX Lite).
By default, SUS is a ten-question survey that rates items on a 1–5 scale from strongly disagree to strongly agree. SUS also alternates a positively framed question with a negative one, in hopes of preventing click-happy users from generating false data.
However, it’s been found that a 7-point scale, not a 5-point one, is a more accurate measure. Also, an all-positive set of questions seems less likely to cause respondent errors. I call this method the SUS 7+.
Finally, the average SUS score is 68, but anything over 80 is considered very good. To grade the SUS 7+, total all the points and subtract 10 from the total, then multiply the result by 1.66667 (a perfect raw total of 70 becomes 60 after subtracting 10, which returns a SUS score of 100). A quick scoring sketch follows the questionnaire below.
Here are the questions I use; note that I had to reword the negatively framed questions as positive ones, which may affect your results.
- I think that I would like to use this product frequently.
- I found this product to be simple.
- I thought that this product was easy to use.
- I think that I could use this product without the support of a technical person.
- I found the various functions in this product were well integrated.
- I thought there was a lot of consistency in this product.
- I imagine that most people would learn to use this product very quickly.
- I found this product to be intuitive.
- I felt confident using this product.
- I could use this product without having to learn anything new.
And the user’s response options are:
- Strongly disagree
- Disagree
- Mildly disagree
- Neutral
- Mildly agree
- Agree
- Strongly agree
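To make the SUS 7+ grading concrete, here is a minimal scoring sketch in Python. It assumes ten responses on the 1–7 scale above (all questions positively framed); the function name and validation are my own illustration, not part of any official standard.

```python
def score_sus_7_plus(responses):
    """Score a SUS 7+ survey: ten answers, each 1 (strongly disagree) to 7 (strongly agree)."""
    if len(responses) != 10 or any(not 1 <= r <= 7 for r in responses):
        raise ValueError("Expected ten responses, each between 1 and 7")
    total = sum(responses)        # raw total: 10 (all 1s) up to 70 (all 7s)
    adjusted = total - 10         # shift onto a 0-60 range
    return adjusted * (100 / 60)  # same as multiplying by 1.66667; a perfect survey scores 100

# Example: a fairly positive participant scores 85.0
print(round(score_sus_7_plus([6, 7, 6, 5, 6, 7, 6, 6, 7, 5]), 1))
```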
SEQ (Single Ease Question)
For individual tasks, however, the book recommends the SEQ or the SMEQ (which I find highly confusing). The SEQ is just one question: “How difficult or easy did you find this task?” Users respond on a very similar seven-option scale, from 1 (very difficult) to 7 (very easy).
SUM (Single Usability Metric)
SUM combines task completion, time, satisfaction, clicks, and errors, and delivers them as a single usability value.
While I’m intrigued by SUM, I do find it a bit more daunting. It requires much more effort to do well than the other two, which can be delivered in unmoderated scenarios.
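If you do want to experiment with the idea behind SUM, here is a deliberately simplified sketch, not the published procedure (which standardizes each measure before combining them). The assumptions are mine: completion as pass/fail, satisfaction as the 1–7 post-task SEQ, and a target “spec” time you choose yourself; the function name and equal weighting are my own illustration.

```python
def simple_sum_style_score(completed, post_seq, time_seconds, spec_time_seconds):
    """Rough single-score illustration: average three signals, each mapped onto 0-1."""
    completion_score = 1.0 if completed else 0.0
    satisfaction_score = (post_seq - 1) / 6                    # map the 1-7 SEQ onto 0-1
    time_score = min(spec_time_seconds / time_seconds, 1.0)    # 1.0 if at or under the spec time
    return (completion_score + satisfaction_score + time_score) / 3

# Example: task completed, rated 6/7 for ease, took 90s against a 60s spec -> ~0.83
print(round(simple_sum_style_score(True, 6, 90, 60), 2))
```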
Putting it All Together
In the past, when I did a lot more client visits, I found that a flow like the following worked well; a simple way to record the results is sketched in code after the checklist below.
Testing flow
- Introduce myself and thank the participant
- Ask to record
- Collect recording release
- Recording frees up time for observation
Establishing familiarity
- How long have you been using the product for?
- Have you used the product within the last 30 days?
- If yes, I administer the SUS 7+ first, so the upcoming tasks don’t bias their answers.
Running the test
- Share details about the task they will be partaking in and ask if there are questions
- To get benchmarks, ask a pre-SEQ: “Based on what you’ve heard, how easy or difficult do you think this task should be?”
- Remind them to think out loud, and state the task has begun.
- Begin your timer/stopwatch.
- Make note of:
- If they can’t complete the task
- The UI causing significant frustration
- The UI causing minor frustration
- Any suggestions they may have
- When the task is complete, or they give up, stop the timer and store the result
- Ask the post-SEQ: “Overall, how easy or difficult did you actually find this task?”
- Ask any follow-up questions (some good examples can be found on Intercom’s site)
- At the end of testing, reset scenario data (if needed)
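One simple way to keep the results from this flow organized is a small record per task run, plus a quick comparison of the pre- and post-SEQ averages afterward. The sketch below is only an illustration of the steps above; the field names and example data are hypothetical.

```python
from dataclasses import dataclass, field
from statistics import mean

@dataclass
class TaskRun:
    participant: str
    task: str
    pre_seq: int          # 1-7, asked before the task begins
    post_seq: int         # 1-7, asked after the task ends
    time_seconds: float   # stopwatch result
    completed: bool
    notes: list[str] = field(default_factory=list)

# Hypothetical example data for one task across two participants
runs = [
    TaskRun("P1", "Create an invoice", pre_seq=6, post_seq=4, time_seconds=142, completed=True,
            notes=["Minor frustration finding the save button"]),
    TaskRun("P2", "Create an invoice", pre_seq=5, post_seq=3, time_seconds=201, completed=False,
            notes=["Gave up at the tax step"]),
]

# If the post-SEQ average falls well below the pre-SEQ average,
# the task was harder than participants expected it to be.
print("Expected ease:", mean(r.pre_seq for r in runs))
print("Actual ease:  ", mean(r.post_seq for r in runs))
print("Completion:   ", mean(1 if r.completed else 0 for r in runs))
```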
Conclusion
Make your life simple and use tried-and-true standards; they exist to make your work way easier! Have you run into any questions, thoughts, or concerns using standardized UX metrics like these in the past? I’d love to hear about them!