Do you ever think about the impact of the experimenter effect (or Hawthorne effect) when you’re running face to face user research?
Here’s a quick test.
First, go and check your Analytics package to see how many users check your site’s Terms and Conditions before accepting them. My guess is that the number will be roughly 1-3% (maybe lower).
Now, take a look at the notes from your last few usability research projects. How many users diligently looked at the Terms and Conditions while you were watching them over their shoulder? In my last few projects, it’s been 10-30%.
So, that’s roughly 10x more in my case. Pretty substantial. This is a perfect example of how people adjust their behaviour in face to face research sessions. As soon as you pay someone to sit in a room with you, give them a task and watch them intently, they will start doing and saying what they think you want them to.
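The gap is easy to quantify once you have both numbers. A minimal sketch of the comparison — every count below is invented for illustration, not taken from any real analytics export or study:

```python
# Hypothetical counts from an analytics export (illustrative only).
signup_pageviews = 41_200   # users who reached the sign-up page
tc_pageviews = 620          # users who actually opened the Terms and Conditions

# Hypothetical tallies from moderated usability sessions.
lab_participants = 24
lab_tc_readers = 5          # participants who read the T&Cs while observed

in_the_wild_rate = tc_pageviews / signup_pageviews
lab_rate = lab_tc_readers / lab_participants

print(f"In the wild: {in_the_wild_rate:.1%}")                    # ~1.5%
print(f"In the lab:  {lab_rate:.1%}")                            # ~20.8%
print(f"Observed inflation: {lab_rate / in_the_wild_rate:.0f}x")
```

With these made-up numbers the lab rate comes out roughly an order of magnitude higher than the in-the-wild rate, which is the shape of gap described above.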
The experimenter effect is unavoidable. I’m a huge advocate of face-to-face research, but this is one of the method’s biggest weaknesses (and in equal parts, it’s one of the biggest strengths of Analytics).
What steps do you take to mitigate the experimenter effect?
Comments below please!
It’s becoming increasingly clear that usability testing is expensive and provides questionable data.
Is AB testing the future? Cheap, accurate, easy to set up… and no experimenter effect. That’s a good debate to be had right there.
As with any kind of testing, it’s up to the expert analysing the results to interpret and make sense of the data. I’ve seen a lot of Google Analytics stats being misinterpreted by novices.
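One concrete way to go beyond eyeballing the raw numbers is to run a basic significance check before drawing conclusions from an A/B result. A minimal sketch using a two-proportion z-test — the conversion counts are invented, and this is one simple approach among many, not a substitute for proper analysis:

```python
from math import erf, sqrt

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Invented example: variant B converts at 5.5% vs 5.0% for variant A.
z, p = two_proportion_z_test(conv_a=500, n_a=10_000, conv_b=550, n_b=10_000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

With these numbers the p-value lands around 0.11, i.e. a half-point lift over 10,000 visitors per arm is not yet convincing — exactly the kind of nuance a novice reading raw Analytics stats tends to miss.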
Whether it’s face-to-face or recorded stats, looking at the base results without proper analysis and interpretation will lead to misunderstanding.
The expert should consolidate the findings then present the information in a sensible, understandable manner.
Quant and qual testing identify different things. I believe that we should avoid trying to look for any statistics in face to face interviews. We are subjectively looking for the narrative behind people’s behaviour, which has already been statistically identified in the quant study.
Of course there is an overlap, such as Harry’s great example of the T&Cs. In these cases we can be aware of the bias beforehand and compensate for it in our reports.
Another trick is to ask users specifically for facts when they make a claim that you believe to be without foundation. For example, when did they last read terms and conditions at home, and what can they remember about them? I find the truth comes out as users ’fess up to not normally doing such and such.
If I may list tips from Nick Bowmast, I am sure he wouldn’t mind. These are a few techniques to mitigate users’ desire to please the moderator, by being self-deprecating:
Avoid fancy, swanky ‘designer lounge’ type facilities where the participants get a choice of hot beverage served to them.
Play down your role.
Play down any techy kit.
Carry absolutely no air of importance: “I’m just there to take notes and perhaps ask a few questions.”
Try not to mention design or designers.
Don’t seem too interested in the outcome of the interview.
Ask them to be selfish: imagine that the product should be made just for them, nobody else.
Make minimal and only neutral comments like “I see”, as opposed to “good”, when acknowledging comments.
Try to maintain a distance and position that lets you slip out of the participant’s viewpoint (so they can forget you are there).
Regardless of how empathetic you are and how much rapport you build up in the icebreaker, the research effect, people aiming to please, is unavoidable.
For me the best judge of how pleasing people have been is if they change their tune once they know the session is over. You could call them ‘usability out-takes’ I suppose.
The camera has stopped rolling, they’ve got their cash, you’re walking them to the door and out pops a total clanger that you wish you’d had on film.
There are definitely ideal contexts and approaches to moderating qualitative interviews, but sometimes we have to make compromises.
E.g. running sessions at the client’s office will almost certainly make participants less likely to be honest, but may mean more stakeholders are able to attend and muster buy-in where it is needed most.
Nice post Harry and a good example. I don’t think Harry’s trying to advocate the use of stats in usability testing.
Instead I think he’s gone back over past research and used the stats as an example of how real behaviour and behaviour in usability testing can differ.
I don’t think you’ll ever totally get around this problem, regardless of how experienced a test facilitator you are. It’s better to accept that all research has its drawbacks.
Jen, usability testing isn’t expensive. It’s as expensive as you make it. A/B testing has its limitations as well, just as every type of research does.
In my experience usability test findings are dominated by very real and (in hindsight) obvious looking issues. Usually because things were misunderstood or went unnoticed. These aren’t the type of findings that can be explained away by the Hawthorne effect.
I think the experimenter effect is just a part of face to face research.
Whilst there are steps you can take to mitigate it (I think Simon’s points above are spot on), ultimately it should be the role of the researcher to a) set up the experiments so that you are collecting data that is unlikely to be affected, and b) have a comprehensive understanding of user behaviour (from stats, secondary research, previous face-to-face sessions etc.) to know when a participant is behaving differently to the norm, and confidently understand whether this is a result of the testing environment or the design being investigated.
The trickier part is when clients are viewing the sessions, and you have to convince them that some of the behaviours they saw are not real, whilst still defending your methodology.
I think it is fatal to assume that there is no experimenter (and hence no experimenter effect) when conducting (experimental) online research – he or she is still there in the imagination of the participant. Is the website your survey is running on well designed? Does it look professional? Who is conducting the research – some undergrad as a course fulfillment (no offense!) or two professors from a well-known university (no offense, either ;)) – or a private company? Does the survey give the participant the impression that he or she is just a guinea pig, or a valued asset or even customer? Those are just a few examples of features which might contribute to the participant’s idea of who is conducting the study he or she is taking part in. And the participants are (normally) not stupid – they know that the survey/experiment was created by a human…
You might also want to have a look at Ollesch, Heineken and Schulte (2006), which deals with the question of the virtual experimenter in online research (http://ijis.deusto.es/ijis1_1/ijis1_1_ollesch_pre.html, Disclaimer: I am the “Schulte” :))
It is helpful to be aware that the experimenter effect is at work, but would you really be testing whether somebody reads the terms and conditions through a face-to-face usability test?
I am consciously misunderstanding you to make a point: when I conduct usability tests, I am testing whether or not my subjects _understand_ the interface they are operating. Can they find their way around? Do they understand what we wanted to communicate? Is the subject looking for functionality we didn’t think of – or is he or she using the site in a way we didn’t think of?
I believe that you should always have more than one source… just like you say. Other sources could be Analytics – or even better, I think, software like userfly.com, clicktale.com, or mouseflow.com.
First of all, if I came across a user reading the Terms and Conditions in an experimental setup, the chances are that I would not consider the results derived from him or her. Because, as you pointed out, users almost never read terms and conditions. One may only conclude that either this user does not fit the behavioural patterns I am looking for, or that the user is not as comfortable as they would be in their usual setting – or some other reason you can figure out.
The challenge lies in designing the scenario, not just building it. It’s a traditional method and quite successful. You make him run, make him drink beer, make him travel by train, put him in a shady room, and then test – but only if your product is designed for that scenario. You can learn a great deal from movie direction or from posed photography. After all, all fields are connected.