As part of our pre-sprint planning, Mike Mcdonald put together an excellent synopsis of usability best practices with recommendations that we'll be incorporating into our product development roadmap.
Usability testing is a set of tools for finding the areas where interaction with your system can be made easier. Testing should be targeted at answering specific questions about your system. The granularity of testing performed seems to be a function of how difficult test subjects are to obtain. Test subjects should come from a broad swath of demographics to ensure that your system is inclusive of your users. Recruiting test subjects is a major effort, but it can be critical to fine-tuning usability.
This document aims to highlight the components of usability testing, inform a discussion on the subject, and help teams decide what level of commitment best suits their project's budget. It ends with recommendations for the specific environment of working with the City of Portland.
Components of Usability Testing
The components of usability testing form a workflow: defining, planning, preparing, executing, and reporting on the tests.
Defining the problem space determines the level of detail for the test plan, which in turn affects the test execution. The problem space can affect the results reported for the usability test. It also directly affects the number of usability tests you may need to perform.
The problem space should cover a question that can be answered. A problem space that is too broad risks producing answers that are not specific enough to tie actions and changes to. The goal of gathering results is to learn which changes to your system will fix the problems found in previous usability tests. If possible, iterate, and keep usability testing a lightweight part of your design process.
A test plan lays out which tests will be performed and the dimensions of the results you are going to gather. It should state the metrics that your test will assess, and it should give enough detail about the test to help organize the environment and recruitment.
There are two types of metrics that can and should be measured by a test: qualitative and quantitative.
For qualitative metrics, the test plan should list the demographic information to collect for each test subject. In addition, list the subjective metrics: self-reported participant ratings for satisfaction, ease of use, ease of finding information, and so on, where participants rate each measure on a scale. Also include any recommendations that the participant can offer.
Quantitative metrics give easy-to-understand measurements. They are a useful component for determining the success of a design. Better quantitative results show concretely that errors are avoided, and they should be compared with the qualitative measurements. One assumption that can be tested is whether fewer errors and shorter interaction times mean more satisfaction.
Some examples of quantitative metrics from usability.gov:
- Successful Task Completion: Each scenario requires the participant to obtain specific data that would be used in a typical task. The scenario is successfully completed when the participant indicates they have found the answer or completed the task goal. In some cases, you may want to give participants multiple-choice questions. Remember to include the questions and answers in the test plan and provide them to note-takers and observers.
- Critical Errors: Critical errors are deviations at completion from the targets of the scenario, for example, reporting the wrong data value due to the participant’s workflow. Essentially, the participant will not be able to finish the task. The participant may or may not be aware that the task goal is incorrect or incomplete.
- Non-Critical Errors: Non-critical errors are errors that the participant recovers from and that do not prevent the participant from successfully completing the task. These errors result in the task being completed less efficiently. For example, exploratory behaviors such as opening the wrong navigation menu item or using a control incorrectly are non-critical errors.
- Error-Free Rate: Error-free rate is the percentage of test participants who complete the task without any errors (critical or non-critical errors).
- Time On Task: The amount of time it takes the participant to complete the task.
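As a rough illustration of how the quantitative metrics above could be tallied for a single scenario, here is a minimal Python sketch. The `Attempt` record layout, the `summarize` helper, and the sample numbers are all made up for this example; they are not part of the usability.gov guidance. It also assumes that an "error-free" attempt must be a completed one.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Attempt:
    """One participant's attempt at a single scenario (hypothetical layout)."""
    participant: str
    completed: bool          # participant reached the task goal
    critical_errors: int     # deviations that left the goal wrong or incomplete
    noncritical_errors: int  # errors the participant recovered from
    seconds: float           # time on task


def summarize(attempts):
    """Return completion rate, error-free rate, and mean time on task."""
    n = len(attempts)
    completed = [a for a in attempts if a.completed]
    # Assumption: error-free means completed with no errors of either kind.
    error_free = [
        a for a in attempts
        if a.completed and a.critical_errors == 0 and a.noncritical_errors == 0
    ]
    return {
        "completion_rate": len(completed) / n,
        "error_free_rate": len(error_free) / n,
        "mean_time_on_task": mean(a.seconds for a in attempts),
    }


# Made-up sample data for illustration only.
attempts = [
    Attempt("P1", True, 0, 0, 42.0),
    Attempt("P2", True, 0, 2, 75.0),
    Attempt("P3", False, 1, 1, 120.0),
    Attempt("P4", True, 0, 0, 51.0),
]
summary = summarize(attempts)
print(summary)
```

With only a handful of participants these rates are coarse, so they are probably best read alongside the qualitative ratings rather than on their own.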
The test plan should also cover the following elements:
- Purpose: Identify the concerns, questions, and goals for this test. These can be quite broad; for example, "Can users navigate to important information from the prototype's home page?" They can be quite specific; for example, "Will users easily find the search box in its present location?" In each round of testing, you will probably have several general and several specific concerns to focus on. Your concerns should drive the scenarios you choose for the usability test.
- Schedule & Location: Indicate when and where you will do the test. If you have the schedule set, you may want to be specific about how many sessions you will hold in a day and exactly what times the sessions will be.
- Sessions: Describe the sessions and their length (typically one hour to 90 minutes). When scheduling participants, remember to leave time, usually 30 minutes, between sessions to reset the environment, to briefly review the session with the observer(s), and to allow a cushion for sessions that end a little late or participants who arrive a little late.
- Equipment: Indicate the type of equipment you will be using in the test: desktop, laptop, or mobile/smartphone. If pertinent, include information about the monitor size and resolution, operating system, browser, etc. Also indicate if you are planning on recording or audio taping the test sessions or using any special usability testing and/or accessibility tools.
- Participants: Indicate the number and types of participants you will be recruiting. Describe how these participants were or will be recruited, and consider including the screener as part of the appendix.
- Scenarios: Indicate the number and types of tasks included in testing. Typically, for a 60 min. test, you should end up with approximately 10 (+/-2) scenarios for desktop or laptop testing and 8 (+/- 2) scenarios for a mobile/smartphone test. You may want to include more in the test plan so the team can choose the appropriate tasks.
- Roles: Include a list of the staff who will participate in the usability testing and what role each will play. The usability specialist should be the facilitator of the sessions. The usability team may also provide the primary note-taker. Other team members should be expected to participate as observers and, perhaps, as note-takers.
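The session lengths and the 30-minute gap described under Sessions translate directly into how many participants fit in a day. The sketch below works that out; the `plan_sessions` function, its parameter names, and the 9:00 to 17:00 test day are assumptions chosen purely for illustration.

```python
from datetime import datetime, timedelta


def plan_sessions(first_start, last_end, session_minutes=60, buffer_minutes=30):
    """Return (start, end) pairs for sessions separated by a reset buffer."""
    session = timedelta(minutes=session_minutes)
    buffer = timedelta(minutes=buffer_minutes)
    slots = []
    start = first_start
    # Only schedule a session if it can finish before the day ends.
    while start + session <= last_end:
        slots.append((start, start + session))
        start += session + buffer
    return slots


# Hypothetical test day running from 9:00 to 17:00.
day_start = datetime(2024, 1, 15, 9, 0)
day_end = datetime(2024, 1, 15, 17, 0)
slots = plan_sessions(day_start, day_end)
for begin, end in slots:
    print(f"{begin:%H:%M} - {end:%H:%M}")
```

Under these assumptions a day holds five one-hour sessions; calling `plan_sessions(day_start, day_end, session_minutes=90)` drops that to four, which is worth knowing before committing to a recruitment target.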
Test Subject Recruitment
Test subjects should come from a representative portion of the user base. Compiling a test subject base and gathering test subjects is a major effort in terms of time and money. It can be offloaded to a specialized firm that handles both testing and recruiting.
One key to gathering subjects is to move away from the idea that a product's users are "the general public," and instead focus on creating a product that is usable for targeted demographics that together provide a representative set of users. It is important to create demographic categories that allow you to recruit a variety of users who represent those differences.
Number of Subjects
Nielsen suggests that the optimal number of test subjects is almost always 5, with a focus on qualitative rather than quantitative metrics. However, for more complicated tests the number can grow much higher:
- Quantitative studies (aiming at statistics, not insights): Test at least 20 users to get statistically significant numbers; tight confidence intervals require even more users.
- Card sorting: Test at least 15 users per user group.
- Eyetracking: Test 39 users if you want stable heatmaps.
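The five-user guideline is usually justified with the problem-discovery model published by Nielsen and Landauer, in which the share of usability problems found by n users is 1 - (1 - L)^n, where L is the proportion of problems a single user uncovers. Nielsen reports a typical L of about 31% for the projects he studied; treat that value as an assumption, since it may well differ for your system. A small sketch of the curve, with a hypothetical function name:

```python
def share_of_problems_found(users, single_user_rate=0.31):
    """Share of usability problems expected to surface with `users` testers.

    Uses the 1 - (1 - L)^n model; the 0.31 default is Nielsen's reported
    typical value and is an assumption for any particular project.
    """
    return 1 - (1 - single_user_rate) ** users


for n in (1, 3, 5, 10, 15):
    print(f"{n:>2} users: {share_of_problems_found(n):.0%}")
```

The curve flattens quickly after about five users, which is the argument for running several small rounds of testing rather than one large one.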
Some things to consider when compensating participants:
- If your participants are federal employees, you cannot pay them for their time.
- If your participants are not federal employees, the mode of compensation should be something potential participants want. Money in any form is generally acceptable, but it may be more convenient to provide gift cards or certificates for online or local vendors. Keep in mind that online vendors will generally charge your participants for shipping. You may want to adjust the compensation accordingly.
- If you are doing remote testing, you may want to consider an electronic mode of compensation such as an eCertificate to an online vendor.
- Remember to provide a receipt for your participants to sign (one for adults and one for minors are included in the templates section), both to show that they received the compensation and to provide documentation to your accounting department or personnel.
The testing space should provide an environment in which subjects can focus and in which you are free to interact with the test subject as your methods require.
- Concurrent Think Aloud (CTA) is used to understand participants’ thoughts as they interact with a product by having them think aloud while they work. The goal is to encourage participants to keep a running stream of consciousness as they work.
- In Retrospective Think Aloud (RTA), the moderator asks participants to retrace their steps when the session is complete. Often participants watch a video replay of their actions, which may or may not contain eye-gaze patterns.
- In Concurrent Probing (CP), as participants work on tasks, the researcher asks follow-up questions whenever they say something interesting or do something unique.
- Retrospective Probing (RP) requires waiting until the session is complete and then asking questions about the participant’s thoughts and actions. Researchers often use RP in conjunction with other methods—as the participant makes comments or actions, the researcher takes notes and follows up with additional questions at the end of the session.
Things to consider when you are deciding on which technique to employ include:
- Can the participant work completely alone?
- Will you need time on task and accuracy data?
- Are the tasks multi-layered, and do they require concentration on the part of the participant?
- Will you be conducting eye tracking (though not covered here, see Romano Bergstrom & Olmsted Hawala 2012, for the effects of think aloud on eye-tracking data)?
Look for patterns in your notes about how the test subjects interacted with your prototype. Record data in a spreadsheet to help you do any calculations you need to do.
Level of Severity
For any problems you identify, rank their level of severity as one of: critical, serious, or minor.
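If findings are kept in a script or exported from a spreadsheet, the three severity levels can be encoded so that the worst problems sort to the top. This is only a sketch: the `Severity` enum and the sample findings are invented for illustration.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Severity levels from the text; lower numbers sort first."""
    CRITICAL = 1
    SERIOUS = 2
    MINOR = 3


# Hypothetical findings, for illustration only.
findings = [
    ("Search box overlooked on home page", Severity.SERIOUS),
    ("Typo in footer link", Severity.MINOR),
    ("Form submission loses entered data", Severity.CRITICAL),
]

# Sort so that critical problems are listed before serious and minor ones.
ranked = sorted(findings, key=lambda item: item[1])
for description, severity in ranked:
    print(f"{severity.name.title():<9} {description}")
```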
From usability.gov, include the following in your test result document:
- Background Summary: Include a brief summary including what you tested (website or web application), where and when the test was held, equipment information, what you did during the test (include all testing materials as an appendix), the testing team, and a brief description of the problems encountered as well as what worked well.
- Methodology: Include the test methodology so that others can recreate the test. Explain how you conducted the test by describing the test sessions, the type of interface tested, metrics collected, and an overview of task scenarios. Describe the participants and provide summary tables of the background/demographic questionnaire responses (e.g., age, professions, internet usage, site visited, etc.). Provide brief summaries of the demographic data, but do not include the full names of the participants.
- Test Results: Include an analysis of what the facilitator and data loggers recorded. Describe the tasks that had the highest and lowest completion rates. Provide a summary of the successful task completion rates by participant, task, and average success rate by task and show the data in a table. Follow the same model for all metrics. Depending on the metrics you collected you may want to show the:
- Number and percent of participants who completed each scenario, and all scenarios (a bar chart often works well for this)
- Average time taken to complete each scenario for those who completed the scenario
- Satisfaction results
- Participant comments can be included if they are illustrative.
- Findings and Recommendations: List your findings and recommendations using all your data (quantitative and qualitative, notes and spreadsheets). Each finding should have a basis in data—in what you actually saw and heard. You may want to have just one overall list of findings and recommendations or you may want to have findings and recommendations scenario by scenario, or you may want to have both a list of major findings and recommendations that cut across scenarios as well as a scenario-by-scenario report. Keep in mind:
- Although most usability test reports focus on problems, it is also useful to report positive findings. What is working well must be maintained through further development.
- An entirely negative report can be disheartening; it helps the team to know when there is a lot about the Web site that is going well.
- Each finding should include as specific a statement of the situation as possible.
- Each finding (or group of related findings) should include recommendations on what to do.
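The Test Results summaries described above (the number and percent of participants completing each scenario, those completing all scenarios, and average time for completers only) can be produced from a simple results table. The sketch below assumes a hypothetical nested dictionary layout and uses invented participants, scenarios, and timings.

```python
from statistics import mean

# Hypothetical results: results[participant][scenario] = (completed, seconds).
results = {
    "P1": {"Scenario A": (True, 40.0), "Scenario B": (True, 95.0)},
    "P2": {"Scenario A": (True, 60.0), "Scenario B": (False, 180.0)},
    "P3": {"Scenario A": (False, 150.0), "Scenario B": (True, 85.0)},
}

scenarios = sorted({name for tasks in results.values() for name in tasks})
report = {}
for name in scenarios:
    outcomes = [tasks[name] for tasks in results.values()]
    # Average time counts only participants who completed the scenario.
    times = [seconds for completed, seconds in outcomes if completed]
    report[name] = {
        "completed": len(times),
        "percent": 100 * len(times) / len(outcomes),
        "mean_seconds": mean(times) if times else None,
    }

# Participants who completed every scenario.
completed_all = [
    p for p, tasks in results.items()
    if all(completed for completed, _ in tasks.values())
]

print(f"{'Scenario':<14}{'Completed':>10}{'Percent':>9}{'Avg sec':>9}")
for name, row in report.items():
    avg = "n/a" if row["mean_seconds"] is None else f"{row['mean_seconds']:.1f}"
    print(f"{name:<14}{row['completed']:>10}{row['percent']:>8.0f}%{avg:>9}")
print("Completed all scenarios:", ", ".join(completed_all) or "none")
```

The same `report` dictionary could feed a bar chart or be pasted into the results document as a table.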
- Card sorting to understand how bureaus are perceived by members of the public.
- Simple qualitative tests that are more survey-like than quantitative, to cut overhead.
Test Subject Recruitment
- Use the diversity committees to identify participants.
- Have recruits self-identify aptitude with computers and websites.
- Use a low-cost way of compensating subjects for their time, such as baked goods, to build goodwill.
- Standardize the logistics as much as possible to cut down on planning and preparation.
- Make remote testing a possibility, perhaps using Skype screen sharing.
- Test with prototypes as much as possible to determine key tenets of the design and begin to develop a research-based approach to design.