Category Archives: advanced stuff

colour physics 101


Download my colour physics FAQ e-book for the Kindle here.

Also available as a physical book from Amazon.

  • What is colour?
  • How does colour vision work?
  • Why is the sky blue?
  • What is the colour spectrum?

The answers to these and many other related questions about colour physics are each provided in a short, easy-to-understand form. The book should delight and entertain colour professionals and curious members of the public alike.

accurate colour on a smartphone or tablet

Electronic displays vary in their characteristics. Although almost all are based on RGB, the RGB primaries can differ greatly from one manufacturer to another. Colour management is the process of adjusting an image so that colour fidelity is preserved. On conventional displays (desktops and laptops) this is achieved through ICC colour profiles. A colour profile records which colours a particular device produces for given RGB values. To make a display profile you normally display some colours on the screen and measure the CIE XYZ values of those colours; you then have the RGB values you used and the XYZ values that resulted. The profiling software uses these corresponding RGB and XYZ values to build a colour profile, so that the colour management engine knows how to adjust the RGB values of an image for its colours to be displayed properly. Building a profile usually requires specialist colour measurement equipment, though this can now be quite inexpensive. If you use a desktop or laptop display and have never built a profile, you are probably using the default profile that was supplied with your display. The default profile will ensure some level of colour fidelity, but particular settings (such as the colour temperature or the gamma) may not be adequately accounted for. If you want accurate colour then you should learn about colour profiling.
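As a rough illustration of how corresponding RGB and XYZ measurements become a profile, the sketch below builds the simplest kind of display model: a 3×3 matrix plus a single gamma, similar in spirit to an ICC matrix/TRC profile. It assumes the display is additive and that every channel follows the same power-law response. The measurement values and the names `build_profile` and `rgb_to_xyz` are hypothetical; real profiling software measures many more patches and fits a more flexible model.

```python
import math

# Hypothetical colorimeter readings (CIE XYZ, Y of the white = 100).
# Illustrative numbers only, not measurements of a real display.
RED_XYZ = (41.2, 21.3, 1.9)      # R = 1, G = 0, B = 0
GREEN_XYZ = (35.8, 71.5, 11.9)   # R = 0, G = 1, B = 0
BLUE_XYZ = (18.0, 7.2, 95.0)     # R = 0, G = 0, B = 1
GREY_HALF_Y = 21.8               # Y measured for R = G = B = 0.5


def build_profile(red, green, blue, grey_half_y):
    """Return (matrix, gamma) for a simple matrix/TRC display model.

    The matrix columns are the XYZ of each primary at full drive, which
    assumes the display is additive. The gamma is estimated from a single
    mid-grey reading, assuming Y_grey / Y_white = 0.5 ** gamma.
    """
    matrix = [[red[i], green[i], blue[i]] for i in range(3)]
    white_y = red[1] + green[1] + blue[1]
    gamma = math.log(grey_half_y / white_y) / math.log(0.5)
    return matrix, gamma


def rgb_to_xyz(rgb, matrix, gamma):
    """Predict the XYZ that a device RGB triple (each 0-1) should produce."""
    linear = [c ** gamma for c in rgb]
    return tuple(sum(matrix[row][k] * linear[k] for k in range(3)) for row in range(3))


matrix, gamma = build_profile(RED_XYZ, GREEN_XYZ, BLUE_XYZ, GREY_HALF_Y)
print(f"estimated gamma: {gamma:.2f}")
print("predicted white:", rgb_to_xyz((1.0, 1.0, 1.0), matrix, gamma))
```

Roughly speaking, a colour management engine needs the inverse of this model: given the XYZ that an image calls for, it finds the RGB values this particular display needs to produce it.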

It all sounds simple except for the fact that ICC colour profiles are not supported by iOS or Android operating systems on mobile devices. I find this really surprising but that’s how it is for now. Maybe it will be different in the future.

This means that ensuring colour fidelity on a smartphone or tablet is not so straightforward. So what can you do?

Well, there are two commercial solutions to this problem that I am aware of: X-Rite’s ColorTrue and Datacolor’s SpyderGallery. Both are apps that use a colour profile to provide good colour fidelity. These are great solutions. Perhaps the only drawback is that the colour correction applies only to images viewed from within the app. The apps can access the photos in your standard photo album, but the correction would not apply, for example, to images viewed in your web browser. This is why, in my opinion, a proper system implemented at the level of the operating system would be better.

There are two alternatives. The first is to implement your own colour correction and modify the images offline before sending them to the device. This would not suit everyone (the average consumer who just wants to look at their photos, for example), but it is what I typically do in the lab when I want to display accurate colour images on a tablet. It might also be a reasonable approach for a company that wants to display images of its products, for example. It has the advantage that the colour correction works in any app on the device, because the correction has been applied at the image level rather than the app level. But it does mean you need to do this separately for each device and keep track of which images are paired with which device. This is fine if you have one or a small number of devices but not so good if you have hundreds.
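To make the offline approach concrete, here is a minimal sketch of correcting a single pixel: decode an sRGB value to linear RGB, convert it to XYZ with the standard sRGB matrix, then invert a characterisation of the target tablet to find the device RGB that should produce the same XYZ. The tablet matrix, its assumed gamma of 2.2 and the function names are hypothetical stand-ins for your own measurements; out-of-gamut colours are simply clipped, and a real pipeline would also need chromatic adaptation if the white points differ. Applying `correct_pixel` to every pixel of an image would give the device-specific copy you send to that tablet.

```python
# sRGB (D65) linear-RGB-to-XYZ matrix, with Y of the white = 1.
SRGB_TO_XYZ = [
    [0.4124, 0.3576, 0.1805],
    [0.2126, 0.7152, 0.0722],
    [0.0193, 0.1192, 0.9505],
]

# Hypothetical tablet characterisation: XYZ of each primary at full drive
# (columns R, G, B), Y of the white = 1, plus an assumed power-law gamma.
DEVICE_TO_XYZ = [
    [0.4500, 0.3300, 0.1700],
    [0.2300, 0.6900, 0.0800],
    [0.0100, 0.1000, 0.9800],
]
DEVICE_GAMMA = 2.2


def srgb_to_linear(c):
    """Undo the sRGB transfer curve for a value in the range 0-1."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def invert_3x3(m):
    """Inverse of a 3x3 matrix via the adjugate (cofactor) formula."""
    a, b, c = m[0]
    d, e, f = m[1]
    g, h, i = m[2]
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    return [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]


def mat_vec(m, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(m[r][k] * v[k] for k in range(3)) for r in range(3)]


XYZ_TO_DEVICE = invert_3x3(DEVICE_TO_XYZ)


def correct_pixel(rgb8):
    """Map an 8-bit sRGB pixel to the 8-bit device RGB that should give the
    same XYZ on the characterised tablet."""
    linear = [srgb_to_linear(v / 255) for v in rgb8]
    xyz = mat_vec(SRGB_TO_XYZ, linear)
    device_linear = mat_vec(XYZ_TO_DEVICE, xyz)
    out = []
    for v in device_linear:
        v = min(max(v, 0.0), 1.0)  # clip colours outside the device gamut
        out.append(round(255 * v ** (1 / DEVICE_GAMMA)))
    return tuple(out)


print(correct_pixel((200, 120, 60)))
```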

The second alternative would be to build your own app. If you want to do things with your images that you cannot do in ColorTrue or SpyderGallery or if you have lots of devices and you can’t be bothered to manually convert the images for each device, then you could install your own app that implements a colour profile and then does whatever else you want it to do.

Incomplete pair comparison

One of my big academic interests is scaling perceptual phenomena. That is, we take some physical stimuli (for example, a set of sounds of varying intensity/volume) and then we want to know how loud they are perceived to be by people. This allows us to build a relationship between the physical stimulus (in this case intensity) and the perceptual stimulus (in this case loudness). The same idea could be used to scale largeness, smallness, colourfulness, whiteness, lightness, heaviness, sweetness etc. It’s not always a -ness. But it usually is.

There are a great many techniques to scale perception. You can simply ask people to assign a number: for example, you play a sound and ask them to rate how loud it is on a scale from, say, 0 to 100. This is called Magnitude Estimation (ME). It’s a perfectly good technique but it has limitations, and one of these is that it can be quite difficult for the participant. Suppose the first stimulus seems really loud and the participant assigns it a loudness of 90; if it then turns out that all the subsequent stimuli are louder, all of their estimates will be squeezed into the 90-100 range, which is not ideal. Consequently, in the ME technique we often use so-called anchors, that is, example stimuli at each end of the scale.

An alternative technique is called paired comparison (PC). Here we might have, for example, five stimuli A, B, C, D and E, which we present in pairs, asking the participants which member of each pair is louder (or whiter or yellower etc.). The total number of paired comparisons is 10 in this case (n(n-1)/2 for n stimuli), which is quite manageable. From the results of these paired comparisons it is possible to estimate a scale value for each stimulus, where the scale values form an interval scale of loudness (or whiteness or yellowness, etc.). This is a really nice technique and quite a few papers claim that PC is more reliable than ME, for example. However, when the number of stimuli is large the number of paired comparisons becomes huge and the task is not practicable. In that case it is possible to undertake a so-called incomplete pair comparison, in which only some of the possible pairs are presented to the participants. The question, however, is what proportion of the pairs needs to be presented for the PC experiment to be reliable?
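One widely used way of turning paired-comparison results into interval-scale values is Thurstone’s Law of Comparative Judgement (Case V), sketched below; it is not necessarily the method used in the work described here. It converts the proportion of observers who judged stimulus i greater than stimulus j into a unit-normal deviate and averages those deviates for each stimulus. The function name, the win-count layout and the clipping of unanimous proportions to 0.02-0.98 are assumptions made for this illustration, and the win counts are invented.

```python
from itertools import combinations
from statistics import NormalDist


def thurstone_case_v(wins, n_observers):
    """Interval-scale values from a paired-comparison win matrix.

    wins[i][j] is the number of observers who judged stimulus i to be
    louder (whiter, yellower, ...) than stimulus j.
    """
    n = len(wins)
    to_z = NormalDist().inv_cdf
    scale = []
    for i in range(n):
        total = 0.0
        for j in range(n):
            if i == j:
                continue  # a stimulus compared with itself contributes z = 0
            p = wins[i][j] / n_observers
            p = min(max(p, 0.02), 0.98)  # unanimous pairs would give infinite z
            total += to_z(p)
        scale.append(total / n)
    return scale


# Five stimuli give 5 * 4 / 2 = 10 pairs.
print(len(list(combinations("ABCDE", 2))))

# Hypothetical results for three stimuli judged by 10 observers.
wins = [
    [0, 3, 1],
    [7, 0, 4],
    [9, 6, 0],
]
print([round(s, 2) for s in thurstone_case_v(wins, 10)])
```

Because only differences between the values are meaningful on an interval scale, the zero point is arbitrary; here the values happen to sum to zero.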

This was the question that Yuan Li and I asked each other during her doctoral research. We undertook a large-scale simulation of a PC experiment. I won’t go into the details here. The method and results have just been published in the Journal of Imaging Science and Technology (JIST). You can see the paper here.

However, I show below the key table from the research which I think might be of interest to other people who are undertaking, or planning to undertake, an incomplete PC experiment.


This table shows the number of stimuli being compared along the top and the number of observers taking part down the left-hand side. The figure in the corresponding row and column shows the percentage of pair comparisons that need to be carried out to get robust results similar to those you would get from the full PC experiment. So, for example, if you have 20 samples and 15 participants then you need to carry out half of the possible comparisons. For 20 samples there are 190 possible pairs (20 × 19 / 2), so you would need to present 95 of them (which could be selected randomly).
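A minimal sketch of the random selection mentioned above, assuming simple random sampling without replacement from all possible pairs; the function name `choose_pairs` is hypothetical, and randomising which member of each pair is shown first is an extra design choice added here. With 20 samples and a proportion of 0.5 it returns 95 of the 190 pairs.

```python
import random
from itertools import combinations


def choose_pairs(n_stimuli, proportion, seed=None):
    """Randomly select a proportion of all n(n-1)/2 pairs and randomise
    which member of each pair is presented first."""
    all_pairs = list(combinations(range(n_stimuli), 2))
    k = round(proportion * len(all_pairs))
    rng = random.Random(seed)
    chosen = rng.sample(all_pairs, k)
    return [(a, b) if rng.random() < 0.5 else (b, a) for a, b in chosen]


pairs = choose_pairs(20, 0.5, seed=1)
print(len(pairs))  # 95
```

A purely random draw does not guarantee that every stimulus appears equally often, so in practice it may be worth checking the balance of the selected pairs before running the experiment.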

I should point out a caveat. This work is only valid if the observers can be considered to be stochastically identical. If we ask people to rate samples for loudness, whiteness or heaviness, for example, I think this assumption is justified. However, if we were asking people to scale how beautiful people’s faces were (an experiment reminiscent of Mark Zuckerberg’s early Facebook experiment), observers could differ wildly in their judgements. One participant may rate as most beautiful a face that another participant rates as least beautiful. Because of the assumptions we made in our modelling, we cannot predict the proportion of pair comparisons that would be needed in a case like this. We are thinking about it though.

On CIE colour-matching functions

In 1931 the CIE used colour-matching experiments by Wright and Guild to recommend the CIE Standard Observer, which is a set of colour-matching functions. These are shown below for standard red, green and blue primaries. They show the amounts (known as tristimulus values) of the three primaries (RGB) that, on average, an observer would use to match one unit of light at each wavelength in the spectrum. Why are these so important? Because they allow the calculation of tristimulus values for any stimulus (that is, any object viewed under any light, as long as we know the spectral reflectance factors of the surface and the spectral power distribution of the light).
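The calculation referred to above can be written as a sum over wavelength. Because colour matching is additive, a stimulus can be treated as a mixture of monochromatic lights, each matched by the amounts given by the colour-matching functions, so the tristimulus values of the whole stimulus are the weighted sums below. The symbols are chosen here for illustration: E(λ) is the spectral power distribution of the light, ρ(λ) the spectral reflectance factor of the surface, r̄(λ), ḡ(λ) and b̄(λ) the colour-matching functions, and Δλ the wavelength interval of the tabulated data (often 5 or 10 nm).

```latex
R = \sum_{\lambda} E(\lambda)\,\rho(\lambda)\,\bar{r}(\lambda)\,\Delta\lambda, \qquad
G = \sum_{\lambda} E(\lambda)\,\rho(\lambda)\,\bar{g}(\lambda)\,\Delta\lambda, \qquad
B = \sum_{\lambda} E(\lambda)\,\rho(\lambda)\,\bar{b}(\lambda)\,\Delta\lambda
```

Two stimuli with different spectra but identical R, G and B values are predicted to match; such pairs are called metamers.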


I gave a lecture this week about these and so they are fresh in my mind. I want to use this blog post to explain two things about the colour-matching functions that may be puzzling you. The first was prompted by a student who came up to me after the lecture with a question. You will note that for some of the shorter wavelengths the red tristimulus value is negative. Hopefully you are aware that, no matter how carefully we choose the three primaries, we cannot match all colours using mixtures of those three in the normal sense. What we have to do is add one of the primaries to the thing we are trying to match and then match that with an additive mixture of the other two primaries. The question from the student was: wouldn’t that change the colour of the thing that is being matched? The answer is that of course it would. But it’s ok.

We normally represent this matching with an equation:

S ≡ R[R] + G[G] + B[B]

which simply means that the stimulus S is matched by (that is the symbol ≡) R amounts of the R primary, G amounts of the G primary, and B amounts of the B primary. The values R, G and B are the tristimulus values. I put square brackets around the primaries themselves to distinguish them from the amounts or tristimulus values of the primaries being used in the match.

Now when we add one of the primaries to the stimulus (the thing we are matching) itself, we can write this equation:

S + R[R] ≡ G[G] + B[B]

The new colour, S + R[R], can now be matched by an additive mixture of the other two. Hmmmmmm? You may ask. How does that work? Well, we can rearrange this equation to make:

S ≡ -R[R] + G[G] + B[B]

In other words, if we can match the additive mixture of the original stimulus S and some red with some green and blue, then (if it were possible) we could match the original stimulus S with the same amounts of green and blue and a negative amount of red. I appreciate that this is mathematical but I hope it is maths that anyone can understand. It’s not rocket science, just simple adding and subtracting. This is how we arrive at the colour-matching functions above. No matter what RGB primaries we use, at least one of them will have to be used in a negative amount to match some of the wavelengths. In practice, this is done by adding that primary to the stimulus as described above. Of course, you may also know that the RGB colour-matching functions were transformed to XYZ colour-matching functions; these give the XYZ values everyone is familiar with. But that is another story, and I will devote another post to it one day.

The second question is: isn’t this just arbitrary? If we used a different set of RGB primaries, wouldn’t we get a different set of colour-matching functions? Again, the answer is yes, but again it doesn’t matter. The whole point of the CIE system was to work out when two different stimuli would match. If two stimuli are matched by the same amounts of RGB then, by definition, those two stimuli must match each other. If we used different RGB primaries the tristimulus values would change, of course, but the matching condition would not: two stimuli that match would still require the same RGB values as each other, no matter what the primaries were (as long as they were fixed, of course). So the key achievement of the CIE system was to define when two stimuli would match. The system is also useful for colour specification and communication, but that does depend upon the choice of primaries and requires standardisation.
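The argument can be written compactly. For two fixed sets of primaries, the tristimulus values with respect to one set are related to those with respect to the other by a fixed, invertible 3 × 3 matrix M (a consequence of the additivity of colour matching). Writing T = (R, G, B)ᵀ for the tristimulus vector of a stimulus and T′ for the same stimulus expressed in terms of the new primaries, a match under one set of primaries is a match under the other, and vice versa:

```latex
\mathbf{T}' = \mathbf{M}\,\mathbf{T}, \qquad
\mathbf{T}_1 = \mathbf{T}_2 \;\Longrightarrow\; \mathbf{M}\,\mathbf{T}_1 = \mathbf{M}\,\mathbf{T}_2 \;\Longrightarrow\; \mathbf{T}'_1 = \mathbf{T}'_2, \qquad
\mathbf{T}'_1 = \mathbf{T}'_2 \;\Longrightarrow\; \mathbf{T}_1 = \mathbf{M}^{-1}\mathbf{T}'_1 = \mathbf{M}^{-1}\mathbf{T}'_2 = \mathbf{T}_2
```

The individual numbers change with the primaries, but the match/no-match decision does not.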

I hope people find this post useful. Post any questions or comments below.

The dangers of Likert scale data

Imagine that you want to compare two products A and B and you ask the opinions of 100 users via a survey. The table below shows a summary of the survey and the responses. The numbers under product A and product B show the number of people who gave each of the responses on the left-hand side.


This is known as a Likert scale and this post will give some thoughts on how to analyse these data.

The first thing worth mentioning is that there is a simple form of analysis that is relatively uncontentious: 60% of people were very satisfied or quite satisfied with product A, whereas only 45% of people were similarly satisfied with product B. This is simple. However, can we use this analysis to say that product A is better than product B? Note one problem straight away: 20% of people were very dissatisfied or quite dissatisfied with product A, whereas only 15% were similarly dissatisfied with product B. It seems that product A tends to polarise opinion and it is not clear what conclusions can be drawn.

However, quite often we assign numbers to the categories (such as 5 = very satisfied, 4 = quite satisfied, 3 = neutral, 2 = quite dissatisfied, and 1 = very dissatisfied). When this is done we can produce a number for each participant’s response and then average these numbers to produce the mean values shown in the table above. According to this, the average response to product A is 3.6 and to product B is 3.5. Can we now use these numbers to make the following two statements: (1) that product A is better than product B (since 3.6 is bigger than 3.5), and (2) that both products A and B are well received by the participants (since 3.6 and 3.5 are both bigger than 3)? In this post I want to discuss the validity of these statements by considering several aspects of Likert scales.
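The arithmetic behind these summaries is sketched below. Because the survey table itself is not reproduced here, the counts are hypothetical, invented only so that they agree with the percentages and means quoted above (60% and 45% satisfied, 20% and 15% dissatisfied, means of 3.6 and 3.5); they are not the original data. The 5-to-1 coding is the one described in the text.

```python
# Hypothetical counts (score -> respondents out of 100), invented to match
# the summary figures quoted in the text; not the original survey data.
PRODUCT_A = {5: 30, 4: 30, 3: 20, 2: 10, 1: 10}
PRODUCT_B = {5: 25, 4: 20, 3: 40, 2: 10, 1: 5}


def percent(counts, scores):
    """Percentage of respondents whose score is in `scores`."""
    return 100 * sum(counts[s] for s in scores) / sum(counts.values())


def likert_mean(counts):
    """Mean score, treating the 1-5 codes as if they were interval data."""
    return sum(score * n for score, n in counts.items()) / sum(counts.values())


for name, counts in [("A", PRODUCT_A), ("B", PRODUCT_B)]:
    print(name, percent(counts, {5, 4}), percent(counts, {2, 1}), likert_mean(counts))
```

Note that quite different distributions of responses can produce the same percentages and means, which is one reason the summaries alone can mislead.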

Is it valid to average the numbers?

There is a long-running dispute about whether it is valid to average the scores to produce the mean values as in the table above. To explore this we need to introduce two types of data. The first type is ordinal data, which tell us only the order in which things come. The Likert scale presented in the table above strictly produces ordinal (rank) data. Imagine that three people, Alan, Brian and Clive, run a race in which Alan wins, Brian is second and Clive is third. Knowing the order in which they finished is fine, but it doesn’t tell us whether Alan finished well ahead of the other two or whether, for example, Alan and Brian were involved in a close finish with Clive a long way behind. If, however, we know how many seconds they took to complete the race (Alan = 40 seconds, Brian = 41 seconds and Clive = 52 seconds) we know much more about the race: it turns out that Clive was a long way behind the other two. The race times, in seconds, are the second type of data, called interval data. With interval data the differences between the numbers are meaningful, whereas with ordinal (rank) data they are not.

The problem with a Likert scale is that the scale [of very satisfied, quite satisfied, neutral, quite dissatisfied, very dissatisfied, for example] produces ordinal data. We know that very satisfied is better than quite satisfied and quite satisfied is better than neutral, but is the difference between very satisfied and quite satisfied the same as the difference between quite satisfied and neutral? Why am I worrying about this? Because when we assign numbers to the scale (the 1-5 numbers) and then average the responses, we are implicitly assuming that the scale items are evenly spaced. We are treating the ordinal data as interval data. How can we be sure that the participants treated the scale in this way? Would it have made a difference if we had used satisfied and dissatisfied instead of quite satisfied and quite dissatisfied respectively? So it would seem that it is wrong to calculate means from Likert scales. If you click here you will see a post from a PhD student (Achilleas Kostoulas) at the University of Manchester who states categorically that it is wrong to compute means from Likert scale data. I chose this example because it is simply and elegantly explained, not because I necessarily agree entirely with his view. It is also worth reading the article by Elaine Allen and Christopher Seaman in Quality Progress (2007), who also take the view that Likert scale data should not be treated as interval data. Interestingly, they also suggest some other techniques that don’t suffer from the ‘ordinal-data’ problem; for example, using slider bars to get a response on a continuous scale. However, before you give up detailed analyses of Likert scale data I would urge you to read the paper by Susan Jamieson called Likert scales: how to (ab)use them in Medical Education (2004: 38, 1212-1218). Although Susan is also, broadly speaking, against treating Likert scale data as interval data, she does present the other side of the argument. 
In another paper, in Advances in Health Sciences Education, Norman (2010, 15 (5), 625-632) argues that the concerns about Likert scales are not serious and we should happily use means and other parametric statistics.

How much bigger do two averages need to be for an effect?

In the table at the start of this article products A and B receive scores of 3.6 and 3.5 respectively. The paragraphs above explain that calculating these means may not be valid. However, assuming that we do calculate means in this way, how different would the mean scores for products A and B need to be for us to conclude that A was better than B? I have come across students (normally in vivas) who would simply state that A is better than B because 3.6 > 3.5. To those students I would then say: would you still take that view if, instead of 3.6 and 3.5, it was 3.51 and 3.5? What if it were 3.50001 and 3.5? Would they still maintain that A is better than B? It is clear that we need to consider variance and noise and carry out a proper statistical test to decide whether 3.6 is significantly greater than 3.5. The appropriate test is Student’s t-test, and anyone can be taught to perform one using Microsoft Excel in a matter of minutes. In the example at the start of this article it turns out that there is no statistically significant difference, so we cannot conclude that product A is received better than product B.
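For readers who prefer code to Excel, the sketch below computes the independent-samples Student’s t statistic (pooled variance) from response counts using only Python’s standard library. The counts are hypothetical, invented only to agree with the percentages and means quoted earlier in this post; they are not the original survey data. Like the averaging itself, the test assumes the 1-5 codes can be treated as interval data.

```python
import math
from statistics import mean, variance

# Hypothetical counts (score -> respondents); not the original survey data.
PRODUCT_A = {5: 30, 4: 30, 3: 20, 2: 10, 1: 10}
PRODUCT_B = {5: 25, 4: 20, 3: 40, 2: 10, 1: 5}


def expand(counts):
    """Turn {score: count} into one score per respondent."""
    return [score for score, n in counts.items() for _ in range(n)]


def student_t(a, b):
    """Independent-samples Student's t (pooled variance) and its degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, df


t, df = student_t(expand(PRODUCT_A), expand(PRODUCT_B))
print(f"t = {t:.2f}, df = {df}")
```

With these invented counts t is about 0.59 on 198 degrees of freedom, well below the two-tailed 5% critical value of roughly 1.97, so the difference would not be significant. If SciPy is available, `scipy.stats.ttest_ind` computes the same pooled-variance statistic together with a p-value.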

However, can we conclude that both products are received favourably? Again, we need a statistical test. It turns out that in this case both 3.6 and 3.5 are significantly greater than 3, so we can at least conclude that products A and B are received favourably. However, there is the caveat that this assumes that we can treat the Likert scale data as interval data in the first place.
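The comparison with the neutral point can be sketched in the same way with a one-sample t-test against 3. Again the counts are hypothetical, invented only to match the quoted means and percentages, and are not the original data.

```python
import math
from statistics import mean, stdev

# Hypothetical counts (score -> respondents); not the original survey data.
PRODUCT_A = {5: 30, 4: 30, 3: 20, 2: 10, 1: 10}
PRODUCT_B = {5: 25, 4: 20, 3: 40, 2: 10, 1: 5}
NEUTRAL = 3


def expand(counts):
    """Turn {score: count} into one score per respondent."""
    return [score for score, n in counts.items() for _ in range(n)]


def one_sample_t(x, mu):
    """One-sample t statistic for the null hypothesis that the mean equals mu."""
    return (mean(x) - mu) / (stdev(x) / math.sqrt(len(x)))


t_a = one_sample_t(expand(PRODUCT_A), NEUTRAL)
t_b = one_sample_t(expand(PRODUCT_B), NEUTRAL)
print(f"A: t = {t_a:.2f}, B: t = {t_b:.2f}, df = 99")
```

With these invented counts both t values are above 4 on 99 degrees of freedom, comfortably beyond the one-tailed 5% critical value of about 1.66 (or about 1.98 two-tailed).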

Other considerations

An interesting question is whether we should use 5-point scales at all. Would we get different results if we used a 7-, 9- or 11-point scale? I have found one website that suggests that a 7-point scale is better than a 5-point scale but not by much. A paper by Dawes in International Journal of Market Research (2008: 55 (1)) looked at 5-, 7- and 10-point scales and concluded that the results from a 10-point scale would be different from a 5- or 7-point scale (after suitable normalisation).

Odd-numbered scales (with a neutral point) are almost always used. However, a paper by Garland (Marketing Bulletin, 1991: 2, 66-70) suggests that using a four-point scale, without a neutral point, might remove the social desirability bias that comes from respondents wanting to please the interviewer. I am not sure what current thinking is on this matter, and I would normally use odd-numbered scales.

I am not providing any definitive views on these points but rather raising awareness of issues. If you want to use a Likert scale then these are issues you need to familiarise yourself with.

My view

I will confess to having treated Likert scale data as interval data and having carried out parametric statistics (these are statistics that use statistical parameters such as standard deviations). However, deep down I know it is wrong. I am coming to the view that the best thing is not to use a Likert scale at all. I think people often use this sort of scale because it seems simple. There are ways to analyse data like these statistically, and I would refer readers to categorical judgement, which is a well-used psychophysical technique. My colleague Ronnier Luo at Leeds University has used this technique extensively for decades. However, it is far from simple to analyse the results. I think there are better ways of obtaining information. I think that using slider bars, which allow users to indicate their view anywhere between two extremes (e.g. between very satisfied and very dissatisfied), is probably better, and I will encourage my students to use this technique in the future.

MRes Colour Communication

colour communication

We’re starting a new programme at Leeds University next September. It’s MRes Colour Communication. This is a one-year Masters programme by research but with a twist. There is a taught component in the first semester to get everyone up to speed to make sure they understand the basics of colour communication. They then explore one aspect of this in their research project and submit a dissertation at the end of the year. Please contact me at my University email of for further information or visit

Colour Semiotics – a personal view

Colour is an important component of many successful designs. It is interesting, therefore, to consider why certain colours are chosen in designs and under which circumstances the colour choices enhance the likelihood that the design will be successful. In this paper, four aspects of colour design (colour harmony, colour preference, colour forecasting and colour semiotics) will be briefly considered and one of these, colour semiotics, will be explored in some detail. Finally, the role of all four of these aspects of colour in the design process will be discussed.

Colour Harmony

Colour harmony is concerned with the relationship between colours. One definition is that colour harmony occurs when two or more colours seen in neighbouring areas produce a pleasing effect (Judd & Wyszecki, 1975). Many theories of colour harmony are ideological in nature; Itten wrote, for example, that ‘One essential foundation of any aesthetic color theory is the color circle, because that will determine the classification of colors’. In the last 150 years, Rood (1831-1901), Ostwald (1853-1932), Munsell (1858-1918), Itten (1888-1967) and others proposed various theories in which certain geometric relationships in a colour circle (or more generally in a colour space) were held to be harmonious (Westland et al., 2007). For example, according to some theories, colour combinations whose representations in a colour space form the vertices of a triangle are harmonious. Most of these theories were based on personal introspection and a belief that classical geometric shapes should frame harmonious colour relationships, but there is no a priori reason why this should be so. Moreover, there have been few studies that robustly test whether theories of colour harmony can be justified empirically. Furthermore, when referring to colour harmony it is not always clear that authors are even referring to the same thing: colour harmony has been used to refer to colours being pleasing, harmonious and successful. In addition, it is generally accepted that ideas about colour harmony shift over time (Nemcsics, 1993) with fashion and taste, and this has led some to claim that “It is quite evident that there are no universal laws of (colour) harmony” (Kuehni, 2005). Nor is it clear that laws are even required, since the majority of designers and artists are naturally able to select colour combinations that are harmonious (by whichever definition) without assistance. 
It is therefore, perhaps, useful to place colour harmony in the field of aesthetics.
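To make the geometric idea concrete, the sketch below generates a “triad” by rotating a colour’s hue one third of the way round a hue circle, so that the three hues sit at the vertices of an equilateral triangle. It uses the HSV hue circle from Python’s `colorsys` module purely as a convenient stand-in; the historical theories used their own colour circles and solids, and, as argued above, there is little empirical evidence that such combinations are in fact harmonious. The function name `triad` is hypothetical.

```python
import colorsys


def triad(rgb):
    """Return three colours whose hues are 120 degrees apart on the HSV hue
    circle, keeping the saturation and value of the input colour.
    RGB components are in the range 0-1."""
    h, s, v = colorsys.rgb_to_hsv(*rgb)
    return [colorsys.hsv_to_rgb((h + k / 3) % 1.0, s, v) for k in range(3)]


for colour in triad((0.8, 0.2, 0.2)):
    print(tuple(round(c, 3) for c in colour))
```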

Colour Preference

Colour preference is also best placed in the field of aesthetics but generally refers to a single colour, though the distinction between colour harmony and colour preference is being explored in the work of Ou and colleagues (e.g. Ou et al., 2004b). An early study was carried out by Guilford and Smith (1959), who asked 40 observers to assess the pleasantness of each of 316 Munsell samples on an 11-point scale (where 0 and 10 corresponded to the least and most pleasant colours imaginable respectively). This study, like most others since, revealed a preference for blue and green colours and a dislike of yellow (on average, of course; individual results usually vary greatly). More recently, 208 participants undertook a simple forced-choice ‘color-picking’ task, and the data revealed a robust cross-cultural sex difference (Hurlbert and Ling, 2007), with females’ hue preferences shifted to longer wavelengths compared with those of males. Hurlbert and Ling suggested that the sex differences may be linked to the evolution of sex-specific behavioural uses of trichromacy. Schloss and Palmer also recently studied colour preferences and found that, although participants on average preferred blue hues to yellow hues, there was considerable variability between individual colour preferences. They proposed an ecological valence theory, which suggests that people prefer colours associated with objects and situations that are affectively positive for them (Schloss and Palmer, 2010). However, in all of these studies, when observers are asked which colours they prefer it is not clear that they always respond with the same purpose in mind (that is, in what sense or context are the observers judging preference?).

Colour Forecasting

Colour forecasting is a particular phenomenon that relates mainly, but not exclusively, to the textile, fashion and interior design fields (Diane and Cassidy, 2005). It involves the prediction of future colour trends via an appraisal of past colour trends and an assessment of the lifestyles associated with these trends. It is not at all clear that colour forecasting is a forecasting or predictive process, and there is no empirical evidence that colour consumption is influenced by socioeconomic lifestyle factors (Stansfield and Whitfield, 2005). Despite this, colour forecasting is an important component in many colour-production industries. It could, however, be argued that colour forecasting belongs in the field of marketing, since the process may be more about telling consumers which colours they wish to purchase than predicting which colours consumers would like to purchase.

Colour Semiotics

Colour semiotics is concerned with the meanings that colours are able to communicate. Colours can evoke strong emotional responses in viewers and can also communicate meanings and/or concepts through association. For example, in many western societies black is associated with death and the mourning process. Colour may therefore play a role in imparting information, creating lasting identity and suggesting imagery and symbolic value (Hynes, 2008). There seem to be at least three different origins for colour semiotics. Firstly, there is the emotional or visceral impact of colours. Colours can have a strong emotional impact and can even affect our physiological state. For example, red colours have been cited as raising blood pressure, and colours have been reported to affect muscular strength (Hamid and Newport, 1989; O’Connell, Harper and McAndrew, 1985). We fear the dark. Perhaps these effects are innate and have been present since the earliest days (the effect of red has sometimes been attributed to the colour of blood, and our fear of black may relate to a primitive fear of the dark and the unknown). Secondly, there are socio-economic origins. In western society purple became associated with wealth and royalty because purple dyestuff was more expensive than gold. Only extremely rich people could afford to wear purple, and some organisations (e.g. the Christian church) chose to use purple to make a statement about their wealth and power. Thirdly, the meanings of some colours are cultural in origin. The association of red with luck in China, and the link between pink for girls and blue for boys in western society, may originate in and be reinforced by cultural behaviour and shared understanding. For example, in the United Kingdom pink was associated with young boys until about 1920, after which blue came to signify the male professions, most notably the navy (Koller, 2008). 
The importance of colour semiotics has been noted in corporate visual identities (Hynes, 2008), human-computer interaction (Bourges-Waldegg and Scrivener, 1998), political communication (Archer and Stent, 2002), and as a marker for gender and sexuality (Koller, 2008). Koller undertook a study of the colour pink and found, from a survey of 169 participants, that 76 per cent associated pink with femininity. Pink was also associated with romance (56%), sweetness (52%), softness (51%), love (50%) and several other concepts (Koller, 2008). Men were less likely than women to make synesthetic associations for pink, and women also seemed to have a more differentiated schema for pink. In addition to the link between pink and femininity, Koller (2008) also found emergent associations of pink with fun, independence and confidence. However, although black is often associated with death, it can have other meanings; for example, it can be associated with power or evil. The actual meaning in any particular situation depends upon the context in which the colour is used, and it can also depend upon other aspects of visual appearance such as gloss and texture (Lucassen, Gevers and Gijsenij, 2010). Furthermore, the meanings of a colour can also depend upon culture and can vary over time. For example, in some countries black is not the colour most associated with death (white is used instead). The appropriate use of colour semiotics can greatly affect the success of a design (particularly one that has a branding or marketing dimension). However, it is clear that colour meanings and associations can vary with a great many factors. On the one hand the connection between meaning and colour seems obvious, almost natural; on the other hand it seems idiosyncratic, unpredictable and anarchic (Kress and Van Leeuwen, 2002). 
Indeed, social groups that share common purposes around colour are often relatively small and specialised compared with groups who share speech or visual communication (Kress and Van Leeuwen, 2006). Grieve goes further, suggesting that colour per se does not elicit a response, but that the particular meaning or significance of a colour seems context-bound and varies from one person or situation to another (Grieve, 1991).

Despite the context-dependence of colour semiotics discussed above, most robust studies that have explored colour semiotics have done so for colour patches viewed in an abstract sense, devoid of context. The colour science community tends to use the term colour emotion instead of colour semiotics; for example, Gao et al. (2007) describe colour emotions using semantic words such as “warm-cool”, “light-dark” and “soft-hard”. The colour science community also tends to study bi-polar pairs of semantic words such as “soft-hard”. In these circumstances it has been found that there is an effect of culture but that it is limited (Lucassen et al., 2010). Indeed, even the medium (e.g. digital display or hardcopy paper) has been shown to have little effect on the emotions or meanings that observers attribute to different colours (Suk and Irtel, 2010). This would seem to contradict the earlier view (Grieve, 1991) that colour per se (without context) does not elicit a response. Nevertheless, most formal studies in the last decade have explored whether there are cultural, gender or age effects in the meanings that observers associate with colours viewed without context (typically square patches of colour viewed on a computer screen). For example, one study (Gao et al., 2007) asked observers from seven countries (Hong Kong, Japan, Thailand, Taiwan, Italy, Spain and Sweden) to rate 214 colour samples on each of 12 bi-polar word pairs (e.g. soft-hard). The differences between the nationality groups were small despite the different cultural backgrounds. In another study (Ou et al., 2004a) 14 British and 17 Chinese observers assessed 20 colours on 10 bi-polar word pairs. The differences between the responses of the two groups were small, with the exception of like-dislike and tense-relaxed. 
Chinese observers tended to prefer colours that were clean, fresh or modern, whereas British observers showed no such tendency. British observers tended to associate tense with active colours, whereas Chinese observers associated tense with colours that were hard, heavy, masculine or dirty. In a second study (Ou et al., 2004b), 8 British and 11 Chinese observers assessed 190 colour pairs in terms of 11 bi-polar word pairs. No significant differences were found between the British and Chinese responses, but some gender differences were found: male and female responses correlated poorly for the masculine-feminine word pair, and female observers tended to like colours that were light, relaxed, feminine or soft (an association that did not occur for male observers). It seems clear that colour per se does have meaning, but whether these meanings are consistent across culture, age and gender is not entirely clear. As Gage (1999) wrote, “To what extent different colours, such as red or black, have cross-cultural significance, is an altogether more difficult question.” Perhaps one reason why these formal studies have not provided definitive answers on whether colour meaning and emotion depend upon culture (and even gender) is that they have traditionally been carried out with quite small numbers of participants. The two studies by Ou et al. (2004a; 2004b) involved 31 and 19 participants respectively. Such studies typically involve small numbers of observers partly because the experiments are carried out in laboratories using carefully controlled and calibrated equipment, so that the exact specifications of the displayed colours can be known. One way to involve much greater numbers of participants is to use a web-based experiment, and such a study is currently being undertaken by the author (Westland and Mohammadzadeh, 2012).
Web-based experiments have several advantages, including access to large numbers of observers and minimal interruption to observers and experimenter. Of course, the disadvantages are also numerous, including potential sources of colour variation such as display technology, ambient illumination level, observer bias, observer colour-vision deficiencies and anomalies, and operating software. However, responses have so far been collected from more than 2000 observers in over 50 countries, and this work, when complete, has the potential to allow definitive conclusions to be drawn on whether colour semiotics are invariant to cultural background and gender. How to address colour semiotics in a design context remains an open question that can currently be addressed only by ad hoc studies, which contribute little to the theoretical debate.


Judd DB and Wyszecki G (1975), Color in business, science and industry, 3rd edition, John Wiley and Sons.
Westland S, Laycock K, Cheung V, Henry P and Mahyar F (2007), Colour Harmony, Colour: Design and Creativity, 1 (1), 1-15.
Nemcsics A (1993), Colour dynamics: Environmental colour design, Ellis Horwood.
Kuehni RG (2005), Color – An introduction to practice and principles, John Wiley and Sons.
Ou L-C, Luo MR, Woodcock A and Wright A (2004a), A study of colour emotion and colour preference. Part I: Colour emotions for single colours, Color Research and Application, 29 (3), 232-240.
Guilford JP and Smith PC (1959), A system of color preferences, American Journal of Psychology, 72 (4), 487-502.
Hurlbert AC and Ling Y (2007), Biological components of sex differences in color preference, Current Biology, 17 (16), R623-R625.
Schloss KB and Palmer SE (2010), An ecological valence theory of human color preference, Proceedings of the National Academy of Sciences, 107 (19), 8877-8882.
Diane T and Cassidy T (2005), Colour Forecasting, Wiley-Blackwell.
Stansfield J and Whitfield TWA (2005), Can future colour trends be predicted on the basis of past colour trends?: An empirical investigation, Color Research and Application, 30 (3), 235-242.
Hynes N (2009), Colour and meaning in corporate logos: An empirical study, Journal of Brand Management, 16 (8), 545-555.
Hamid PN and Newport AG (1989), Effect of colour on physical strength and mood in children, Perceptual and Motor Skills, 69, 179-185.
O’Connell BJ, Harper RS and McAndrew FT (1985), Grip strength as a function of exposure to red or green visual stimulation, Perceptual and Motor Skills, 61, 1157-1158.
Archer A and Stent S (2002), Red socks and purple rain: the political uses of colour in late apartheid South Africa, Visual Communication, 10 (2), 115-128.
Koller V (2008), ‘Not just a colour’: pink as a gender and sexuality marker in visual communication, Visual Communication, 7 (4), 395-423.
Bourges-Waldegg P and Scrivener SAR (1998), Meaning, the central issue in cross-cultural HCI design, Interacting with Computers, 9 (3), 287-309.
Lucassen MP, Gevers T and Gijsenij A (2010), Texture affects color emotion, Color Research and Application, 36 (6), 426-436.
Kress G and Van Leeuwen T (2002), Colour as a semiotic mode: Notes for a grammar of colour, Visual Communication, 1 (3), 343-368.
Kress G and Van Leeuwen T (2006), Reading images: The grammar of visual design, Routledge.
Grieve KW (1991), Traditional beliefs and colour perception, Perceptual and Motor Skills, 72, 1319-1323.
Gao X-P, Xin JH, Sato T, Hansuebsai A, Scalzo M, Kajiwara K, Guan S-S, Valldeperas J, Lis MJ and Billger M (2007), Analysis of cross-cultural color emotion, Color Research and Application, 32 (3), 223-229.
Suk H-J and Irtel H (2010), Emotional response to color across media, Color Research and Application, 35 (1), 64-77.
Ou L-C, Luo MR, Woodcock A and Wright A (2004b), A study of colour emotion and colour preference. Part II: Colour emotions for two-colour combinations, Color Research and Application, 29 (4), 292-298.
Gage J (1999), What meaning had colour in early societies?, Cambridge Archaeological Journal, 9 (1), 109-126.
Westland S and Mohammadzadeh M (2012),