Digitizing Traditional Cultural Designs

A bojagi is a traditional Korean wrapping cloth.

There is currently interest in re-using traditional and cultural designs in modern commercial applications. The bojagi is one of these traditional designs that could be reinvented and hence reinvigorated. But how can a designer create bojagi patterns for use in new digital design?

Working with Meong Jin Shin I developed a software tool that can create a wide range of different bojagi. We identified 8 different classes of traditional bojagi as shown below:

We then created a software tool that would allow a user to create new bojagi which would have the same visual characteristics as one of these 8 traditional classes.

We had some designers in Korea evaluate the tool and they were quite impressed. Although in this study we worked with bojagi, in fact we were interested in exploring the general method of using digital tools such as this one to allow users to explore traditional designs and to use them in their contemporary design work. The ideas could be easily extended to cover other traditional designs such as tartan. The software could also be added to a package such as Adobe Photoshop as a plug-in.

You can read the full paper that we published here.

Shin MJ & Westland S, 2017. Digitizing traditional cultural designs, The Design Journal, 20 (5), 639-658.

Does context affect colour meaning?

One of the reasons that colour is such a powerful and important property is that it conveys information. Colour imparts meaning. If you see a big red button you may understand that something important or dramatic may happen if you press it. If someone is wearing bright yellow clothes it might imply something about their personality. Take a walk into a toy store and notice the swathes of pink in the girls’ section (though note that I don’t imply that this is a good thing; indeed, I would refer you to the pink stinks campaign in order that you may become a right-thinking person). But it is clear that the manufacturers of the toys believe that the colour pink will indicate that these are toys for girls and that its use may even make girls want to have these toys. If you see two washing-up liquids and one is green and one is yellow you might think that they would smell of apples and lemons respectively before you even open them! Colour sells. And part of the reason that colour sells is that it is informative. Colours have meanings.

But does colour per se have meaning or does colour only have meaning when it is an attribute of a product? The colour red on an emergency stop button may have one meaning but the colour red on the soles of Louboutin shoes may have an altogether different meaning. And, of course, colours mean one thing in one culture but another in a different culture; black is commonly associated with death in the West but in China and some other countries in Asia death is more commonly associated with white. Nevertheless, I do believe that colour per se, that is colour in an abstract sense, does have meaning and there are a number of studies out there that tend to support me (though some social scientists, in particular, would disagree).

What I mean by this is that if we take a culture, such as the UK, then a colour such as red will be associated with various ideas and concepts to varying degrees of strength. Red may take on different meanings when applied to different products (that is, in context). But is there any relationship between the abstract colour meaning and the product colour meaning? This is the question that Seahwa Won (who was a PhD student working with me) and I asked each other that led to a piece of work and an academic paper.

If there is no relationship between abstract colour meanings and product colour meanings then it might mean that there is little practical or commercial value in studying abstract colour meanings (though it may still be worthy of study). On the other hand, if there is a relationship between abstract colour meanings and product colour meanings then knowing the former may help us to predict the latter in a wide range of circumstances. To carry out our study we used scaling (I have blogged about some aspects of scaling before) where we try to quantify the perceptual response of participants to physical stimuli. For example, we show people a colour patch on a display screen and then below this there is a slider bar which allows the participants to respond whether the colour is warm, for example, or cool. We do this for lots of colours and lots of participants (nobody said colour science was easy!!) and then we can average these and have a warm-cool scale along which we can place all the colours. When we do this, for example, we find that participants think red is much warmer than blue. However, what Seahwa and I also did was to repeat this type of experiment with different colour products rather than simple colour patches. Would participants place a red toilet roll on the same point on the warm-cool scale as the red colour in an abstract sense? If they would then we can conclude that abstract colour meanings and product colour meanings are related.

We did this for quite a few different scales (warm-cool, expensive-inexpensive, modern-traditional, etc.) and for a few different colours. The figure below shows the results when we explored the masculine-feminine scale. Look at the left-hand part first, where it says chip along the bottom. Chip indicates the abstract colour meanings (for example, when participants view a simple square or chip of colour). Note that participants scale beige, red and yellow as being feminine colours whereas black, blue and green are more masculine colours. Now look at the right-hand part of the figure, where it says crisps (in the UK a crisp is something you buy in a bag to eat; Americans may call these potato chips). When we showed crisp packets that were differently coloured the masculine-feminine scale values were almost the same as for the abstract colours themselves. We found strong relationships between abstract colour meanings and product colour meanings more often than not.

Our findings are broadly compatible with an earlier study by Taft in 1996 who found that there was no significant effect of context on colour meaning in the majority of cases. We did find some effects of context though. For example, black-coloured medicine was perceived as being more feminine than the abstract colour black itself.

We published this paper in 2016 in the journal Color Research and Application and you can read the paper in full here.

Won S & Westland S, 2017. Colour meaning in context, Color Research and Application, 42 (4), 450-459.

Consumer Colour Preferences

How does your personal colour preference affect the colour of the things that you buy?
It is well known that people prefer some colours more than others. Personally, I much prefer red to blue. But I am probably in a minority. Many studies have shown that blue is the most popular hue with yellow being one of the least popular hues. But this is when we think of colour in an abstract sense. What about when colour is applied to a product: a pair of trousers, a toothbrush, a fidget spinner? Well, my favourite colour is red but I have never owned a pair of red trousers. I tend to buy blue or brown trousers even though I don’t really like the colour blue in the abstract sense. But are there products where, if we were presented with a choice in colour, we would tend to buy the colour product that matches our abstract colour preference? This is the question that I set out to answer two years ago with my colleague Meong Jin Shin. We carried out an experiment over the internet where we presented people with a choice of products in different colours and asked which they would buy given the choice. They were presented with images a little like the one below:

After we asked participants which product they would buy for a number of different products we then asked them what their favourite colour was in an abstract sense (we showed a number of colour patches on the screen and asked them to click on the one they liked best). Our hypothesis was that for some products participants would tend to select products that closely matched their most preferred abstract colours but that for some other products we would not find this.

This is exactly what we found. For some products, such as bodywash, we found that people tended to prefer a particular colour for the product (in this case, blue). The figure below shows the results for bodywash. The rows represent the colour of the products and the size of the circle in each row represents the proportion of people who generally preferred either red, orange, yellow, green, blue or purple that selected that product colour. As you can see below the majority of people chose a blue bodywash no matter what their abstract colour preference was.

However, for the toothbrush product a very different picture emerged. As shown below, people who liked red generally tended to select a red toothbrush and people who preferred purple tended to select a purple toothbrush. For example, 41% of people who preferred green selected a green toothbrush.

So sometimes people’s personal colour preference could be used to predict which colour product they would choose to buy given the choice (and sometimes it couldn’t be). How could this be useful? Well, if we could predict the products for which this is true then it would suggest that a multi-colour marketing strategy could be appropriate for them. Also, imagine you are in a supermarket and you are presented with an offer – 50% off toothbrushes today – and alongside this you see a red toothbrush. If red was your favourite colour then there might just be a little more chance you would accept the proposition. If a supermarket could predict a consumer’s personal colour preference …. [more of this in a later post].

This paper was published in 2015 in the Journal of the International Colour Association. You can read the full paper for free here.

Westland S & Shin M-J, 2015. The Relationship between Consumer Colour Preferences and Product-Colour Choices, Journal of the International Colour Association, 14, 47-56.

colour physics 101


Download my colour physics FAQ e-book for the Kindle here.

Also available as a physical book from Amazon.

  • What is colour?
  • How does colour vision work?
  • Why is the sky blue?
  • What is the colour spectrum?

The answers to these and many other related questions about colour physics are each provided in a short and easy-to-understand form. Will delight and entertain colour professionals and curious members of the public.

accurate colour on a smartphone or tablet

Electronic displays can vary in their characteristics. Although almost all are based on RGB, in fact the RGB primaries in the display can vary greatly from one manufacturer to another. Colour management is the process of making adjustments to an image so that colour fidelity will be preserved. In conventional displays – desktops and laptops – the way this is achieved is through ICC colour profiles. Colour profiles store information about the colours on a particular device that are produced by RGB values on that device. So to make a display profile you normally need to display some colours on the screen and measure the CIE XYZ values of those colours; you then have the RGB values you used and the XYZ values that resulted. The profiling software can use these corresponding RGB and XYZ values to build a colour profile so that the colour management engine knows how to adjust the RGB values of an image so that the colours are displayed properly. Building a profile often requires specialist colour measurement equipment – though this can often be quite inexpensive now. If you are using your desktop or laptop display and you have never built a profile then you are probably using the default profile that was provided when your display was shipped. The default profile will ensure some level of colour fidelity but particular settings (such as the colour temperature or the gamma) may not be adequately accounted for. If you want accurate colour then you should learn about colour profiling.
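As a rough illustration of what profiling software does with the corresponding RGB and XYZ values, here is a minimal sketch of a very simple display model: a gamma curve followed by additive mixing of the three primaries. The XYZ measurements, the gamma of 2.2 and the function name are all invented for illustration; a real ICC profile is built from many more measurements and is considerably more sophisticated.

```python
# Hypothetical XYZ measurements of the display's full-on red, green and blue
# primaries. These numbers are invented and do not describe a real device.
XYZ_RED = (41.2, 21.3, 1.9)
XYZ_GREEN = (35.8, 71.5, 11.9)
XYZ_BLUE = (18.0, 7.2, 95.0)
GAMMA = 2.2  # assumed display gamma


def rgb_to_xyz(r, g, b, gamma=GAMMA):
    """Predict XYZ for 8-bit RGB values using a gamma curve and a 3x3 matrix."""
    # Step 1: linearise each channel (0-255 -> 0-1 -> relative light output).
    linear = [(value / 255.0) ** gamma for value in (r, g, b)]
    # Step 2: the light from the three primaries adds together.
    return tuple(
        linear[0] * XYZ_RED[i] + linear[1] * XYZ_GREEN[i] + linear[2] * XYZ_BLUE[i]
        for i in range(3)
    )


white = rgb_to_xyz(255, 255, 255)
grey = rgb_to_xyz(128, 128, 128)
print("white:", white)
print("mid-grey:", grey)
```

Colour correction is then a matter of running a model like this in reverse: starting from the XYZ values you want and solving for the RGB values that this particular display needs.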

It all sounds simple except for the fact that ICC colour profiles are not supported by iOS or Android operating systems on mobile devices. I find this really surprising but that’s how it is for now. Maybe it will be different in the future.

This means that ensuring colour fidelity on a smartphone or tablet is not so straightforward. So what can you do?

Well, there are two commercial solutions to this problem that I am aware of. They are X-Rite’s ColorTrue and Datacolor’s SpyderGallery. ColorTrue and SpyderGallery are apps that will use a colour profile and provide good colour fidelity. These are great solutions. Perhaps the only drawback is that the colour correction only applies to images that are viewed from within the app. Having said that, they allow your standard photo album photos to be accessed – but the correction would not apply, for example, to images viewed using your web browser. This is why a proper system implemented at the level of the operating system would be better, in my opinion.

There are two alternatives. The first would be to implement your own colour correction and modify the images offline before sending them to the device. This would not suit everyone – the average consumer who just wanted to look at their photos for example. But it is what I typically do here in the lab if I want to display some accurate colour images on a tablet. But if you were a company and you wanted to display images of some products for example – it might be a reasonable approach. It has the advantage that the colour correction will work when viewed in any app on the device because the colour correction has been applied at the image level rather than the app level. But it does mean you need to do this separately for each device and keep track of which images are paired to each device. This is ok if you have one or a small number of devices but maybe not so good if you have hundreds of devices.

The second alternative would be to build your own app. If you want to do things with your images that you cannot do in ColorTrue or SpyderGallery or if you have lots of devices and you can’t be bothered to manually convert the images for each device, then you could install your own app that implements a colour profile and then does whatever else you want it to do.

Incomplete pair comparison

One of my big academic interests is scaling perceptual phenomena. That is, we take some physical stimuli (for example, a set of sounds of varying intensity/volume) and then we want to know how loud they are perceived to be by people. This allows us to build a relationship between the physical stimulus (in this case intensity) and the perceptual stimulus (in this case loudness). The same idea could be used to scale largeness, smallness, colourfulness, whiteness, lightness, heaviness, sweetness etc. It’s not always a -ness. But it usually is.

There are a great many techniques to scale perception. You can just ask people, for example, to assign a number. For example, you play a sound and ask them to rate how loud it is on a scale, say, from 0 to 100. This is called Magnitude Estimation (ME). It’s a perfectly good technique but it has limitations and one of these is that it can be quite difficult for the participant. And, say, the first stimulus seems really loud and they assign it a loudness of 90; then it turns out that all the subsequent stimuli are louder – then all their estimations will be squeezed in the 90-100 range, which is not ideal. Consequently, in the ME technique we often have so-called anchors – that is, example stimuli at each end of the scale.

An alternative technique is called paired comparison (PC). In this we might have, for example, five stimuli A, B, C, D and E and we present them in pairs and ask the participants which one is louder (or whiter or yellower etc.). The total number of paired comparisons is 10 in this case which is quite manageable. From the results of these paired comparisons it is possible to estimate a scale value for each of the stimuli where the scale value will be an interval scale of loudness (or whiteness or yellowness, etc.). This is a really nice technique and there are quite a few papers that claim that PC is more reliable than ME, for example. However, when the number of stimuli is large the number of pair comparisons becomes huge and the task is not practicable. When this happens it is possible to undertake so-called incomplete pair comparison where we only present some of the possible pairs to the participants. The question is, however, what proportion of the pairs should be presented for the PC experiment to be reliable?
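One common way of turning paired-comparison results into interval scale values is Thurstone's Case V, sketched below for the five-stimulus example. The win counts and the number of observers are invented for illustration, and this is not necessarily the analysis used in the paper mentioned in this post.

```python
from itertools import combinations
from statistics import NormalDist

stimuli = ["A", "B", "C", "D", "E"]
pairs = list(combinations(stimuli, 2))
print(len(pairs), "pairs")  # n(n-1)/2 pairs for n stimuli

# Hypothetical data: wins[(x, y)] is the number of observers (out of 20)
# who judged x to be louder than y.
N_OBSERVERS = 20
wins = {
    ("A", "B"): 6, ("A", "C"): 4, ("A", "D"): 3, ("A", "E"): 2,
    ("B", "C"): 7, ("B", "D"): 5, ("B", "E"): 3,
    ("C", "D"): 8, ("C", "E"): 5,
    ("D", "E"): 7,
}


def proportion(x, y):
    """Proportion of observers who chose x over y, kept away from 0 and 1."""
    if (x, y) in wins:
        p = wins[(x, y)] / N_OBSERVERS
    else:
        p = 1 - wins[(y, x)] / N_OBSERVERS
    return min(max(p, 0.01), 0.99)  # unanimous choices would give infinite z-scores


def case_v_scale(items):
    """Thurstone Case V: average z-score of each item against all the others."""
    z = NormalDist().inv_cdf
    return {
        x: sum(z(proportion(x, y)) for y in items if y != x) / (len(items) - 1)
        for x in items
    }


scale = case_v_scale(stimuli)
for name, value in sorted(scale.items(), key=lambda item: item[1]):
    print(name, round(value, 2))
```

The scale values are centred on zero and only the differences between them are meaningful, which is what makes this an interval scale.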

This was the question that Yuan Li and I asked each other during her doctoral research. We undertook a large-scale simulation of a PC experiment. I won’t go into the details here. The method and results have just been published in the Journal of Imaging Science and Technology (JIST). You can see the paper here.

However, I show below the key table from the research which I think might be of interest to other people who are undertaking, or planning to undertake, an incomplete PC experiment.


This table shows the number of stimuli that are being compared along the top. Down the left-hand side are the number of observers taking part. The figure in the corresponding row and column shows the per cent of pair comparisons that need to be carried out to get robust results that would be similar to those you would get if you did the full PC experiment. So, for example, if you have 20 samples and 15 participants then you need to do half of the possible comparisons. For 20 samples there are 190 comparisons so you would need to do 95 of them (which could be selected randomly).
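The arithmetic in the example above, and one simple way of selecting the pairs at random, can be sketched as follows. The function name and the seed are illustrative only, and the fraction to use has to be read from the table for your own number of stimuli and observers.

```python
import random
from itertools import combinations
from math import comb


def choose_pairs(n_stimuli, fraction, seed=None):
    """Return a random subset of the n(n-1)/2 possible pairs of stimuli."""
    all_pairs = list(combinations(range(n_stimuli), 2))
    n_keep = round(fraction * len(all_pairs))
    rng = random.Random(seed)  # seeded so that the selection can be reproduced
    return rng.sample(all_pairs, n_keep)


total = comb(20, 2)                      # all possible pairs for 20 samples
subset = choose_pairs(20, 0.5, seed=1)   # half of them, chosen at random
print(total, len(subset))
```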

I should point out that there is a caveat that needs to be considered. This work is only valid if the observers can be considered to be stochastically identical. If we ask people to rate samples for loudness, or whiteness, or heaviness, for example, I think this assumption is justified. However, if we were asking people to scale how beautiful people’s faces were, for example – an experiment reminiscent of the early Facebook experiment by Mark Zuckerberg – then observers could differ wildly in their judgements. One participant may rate as most beautiful a face that another participant rates as the least beautiful. Because of the assumptions that we made in our modelling we cannot predict the proportion of pair comparisons that would be needed in a case like this. We are thinking about it though.

On CIE colour-matching functions

In 1931 the CIE used colour-matching experiments by Wright and Guild to recommend the CIE Standard Observer which is a set of colour-matching functions. These are shown below for standard red, green and blue primaries. These show the amounts – known as tristimulus values – of the three primaries (RGB) that on average an observer would use to match one unit of light at each wavelength in the spectrum. Why are these so important? Because they allow the calculation of tristimulus values for any stimulus (that is, any object viewed under any light as long as we know the spectral reflectance factors of the surface and the spectral power of the light).


I gave a lecture this week about these and so they are fresh in my mind. I wanted to use this blog post to explain two things about the colour-matching functions that may be puzzling you. The first was stimulated after the lecture when one of the students came up to me with a question. You will note that for some of the shorter wavelengths the red tristimulus value is negative. Hopefully you are aware that no matter how carefully we choose the three primaries we cannot match all colours using mixtures of those three in the normal sense. What we have to do is to add one of the primaries to the thing we are trying to match and then match that with an additive mixture of the other two primaries. The question from the student was, wouldn’t that change the colour of the thing that is being matched? The answer is that it would of course. But it’s ok.

We normally represent this matching with an equation:

S ≡ R[R] + G[G] + B[B]

which simply means that the stimulus S is matched by (that is the symbol ≡) R amounts of the R primary, G amounts of the G primary, and B amounts of the B primary. The values R, G and B are the tristimulus values. I put square brackets around the primaries themselves to distinguish them from the amounts or tristimulus values of the primaries being used in the match.

Now when we add one of the primaries to the stimulus (the thing we are matching) itself, we can write this equation:

S + R[R] ≡ G[G] + B[B]

The new colour, S + R[R], can now be matched by an additive mixture of the other two. Hmmmmmm? You may ask. How does that work? Well, we can rearrange this equation to make:

S ≡ -R[R] + G[G] + B[B]

In other words, matching the additive mixture of the original stimulus S and some red with some green and blue, means that – if it were possible – we could match the original stimulus S with the same amount of green and blue and a negative amount of the red. I appreciate that this is mathematical but I hope that it is maths that anyone could understand. It’s not rocket science. Just simple adding and subtracting. This is how we arrive at the colour-matching functions above. No matter what RGB primaries we use one of them will have to be used in negative amounts to match some of the wavelengths. In practice, this is done by adding it to the stimulus as described above. Of course, you may also know that the RGB colour-matching functions were transformed to XYZ colour-matching functions. These are the XYZ values everyone is familiar with. But that is another story I will devote another post to one day.

The second question, though, is: isn’t this just arbitrary? If we used a different set of RGB primaries wouldn’t we get a different set of colour-matching functions? Again, the answer is yes, but again it doesn’t matter. The whole point about the CIE system was to work out when two different stimuli would match. If two stimuli are matched by using the same amounts of RGB then by definition those two stimuli must themselves match. If we used different RGB primaries the tristimulus values would change, of course, but the matching condition would not. Two stimuli that match would also require the same RGB values as each other to match them, no matter what the primaries were (as long as they were fixed of course). So the key achievement of the CIE system was to define when two stimuli would match. However, it was also useful for colour specification or communication but that does indeed depend upon the choice of primaries and requires standardisation.
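To make the point concrete, here is a toy sketch that relies on the fact that changing from one fixed set of primaries to another corresponds to a linear (3×3) transformation of the colour-matching functions. Everything numerical here is invented: the "colour-matching functions" are sampled at only four wavelengths and the transformation matrix is arbitrary, so this illustrates the logic only and says nothing about the real CIE data.

```python
def tristimulus(cmfs, spectrum):
    """Weight the spectrum by each colour-matching function and sum."""
    return [sum(c * s for c, s in zip(row, spectrum)) for row in cmfs]


def change_primaries(matrix, cmfs):
    """Colour-matching functions for new primaries: matrix times the old ones."""
    n_wavelengths = len(cmfs[0])
    return [
        [sum(matrix[i][k] * cmfs[k][j] for k in range(3)) for j in range(n_wavelengths)]
        for i in range(3)
    ]


# Hypothetical colour-matching functions at four wavelengths (rows are R, G, B).
CMFS = [
    [1, 0, 0, 1],
    [0, 1, 0, 1],
    [0, 0, 1, 1],
]

# Two physically different stimuli that happen to give the same tristimulus values.
stimulus_1 = [2, 3, 1, 4]
stimulus_2 = [3, 4, 2, 3]

# An arbitrary invertible matrix standing in for a change of primaries.
M = [
    [2, 1, 0],
    [0, 1, 1],
    [1, 0, 1],
]
NEW_CMFS = change_primaries(M, CMFS)

old_1, old_2 = tristimulus(CMFS, stimulus_1), tristimulus(CMFS, stimulus_2)
new_1, new_2 = tristimulus(NEW_CMFS, stimulus_1), tristimulus(NEW_CMFS, stimulus_2)
print(old_1, old_2)  # equal, so the two stimuli match
print(new_1, new_2)  # different numbers from before, but still equal to each other
```

The tristimulus values change with the primaries, but the two stimuli that matched before still match afterwards.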

I hope people find this post useful. Post any questions or comments below.

The dangers of Likert scale data

Imagine that you want to compare two products A and B and you ask the opinions of 100 users via a survey. The table below shows a summary of the survey and the responses. The numbers under product A and product B show the number of people who gave each of the responses on the left-hand side.


This is known as a Likert scale and this post will give some thoughts on how to analyse these data.

The first thing that is worth mentioning is that there is a simple form of analysis that is relatively uncontentious. This is to say that 60% of people were very satisfied or quite satisfied with product A whereas only 45% of people were similarly very satisfied or quite satisfied with product B. On the one hand this is simple. However, can we use this analysis to say that product A is better than product B? Note one problem straight away, which is that 20% of people are very dissatisfied or quite dissatisfied with product A whereas only 15% of people were similarly very dissatisfied or quite dissatisfied with product B. It seems that product A tends to polarise opinion and it is not clear what conclusions can be drawn.

However, quite often we assign numbers to the categories (such as 5 = very satisfied, 4 = quite satisfied, 3 = neutral, 2 = quite dissatisfied, and 1 = very dissatisfied) and when this is done we can produce a number for each participant’s response; we can then average this to produce the mean values shown in the table above. According to this we can say that on average the response to product A is 3.6 and to product B is 3.5. Can we now use these numbers to make the following two statements? (1) that product A is better than product B (since 3.6 is bigger than 3.5) and that (2) both products A and B are well received by the participants (since 3.6 and 3.5 are both bigger than 3). What I want to do in this post is discuss the validity of these statements by considering several aspects of Likert scales.
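The two summaries discussed so far (the percentage of satisfied users and the mean score) can be computed as in the sketch below. The response counts are hypothetical: they were chosen only so that they reproduce the percentages and means quoted in the text, and they should not be read as the actual survey table.

```python
SCORES = {
    "very satisfied": 5,
    "quite satisfied": 4,
    "neutral": 3,
    "quite dissatisfied": 2,
    "very dissatisfied": 1,
}

# Hypothetical counts for 100 respondents per product (not the original data).
product_a = {"very satisfied": 30, "quite satisfied": 30, "neutral": 20,
             "quite dissatisfied": 10, "very dissatisfied": 10}
product_b = {"very satisfied": 25, "quite satisfied": 20, "neutral": 40,
             "quite dissatisfied": 10, "very dissatisfied": 5}


def percent_satisfied(counts):
    """Percentage of respondents who were very or quite satisfied."""
    total = sum(counts.values())
    return 100 * (counts["very satisfied"] + counts["quite satisfied"]) / total


def mean_score(counts):
    """Mean of the 1-5 scores, which treats the categories as evenly spaced."""
    total = sum(counts.values())
    return sum(SCORES[label] * n for label, n in counts.items()) / total


for name, counts in (("A", product_a), ("B", product_b)):
    print(name, percent_satisfied(counts), mean_score(counts))
```

Note that the mean only makes sense if the ordinal categories are treated as interval data.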

Is it valid to average the numbers?

There is a long-running dispute about whether it is valid to average the scores to produce the mean values as in the table above. To explore this we need to introduce two types of data. The first type are called ordinal data. This is the order in which things are. The Likert scale presented in the table above strictly produces ordinal or rank data. Imagine that three people, Alan, Brian and Clive run a race in which Alan wins, Brian is second, and Clive is third. Knowing the order in which they finished is fine, but it doesn’t tell us whether Alan finished well ahead of the other two or whether, for example, Alan and Brian were involved in a close finish with Clive a long way behind. If, however, we know how many seconds they took to complete the race (Alan = 40 seconds, Brian = 41 seconds, and Clive = 52 seconds) we now know much more information about the race. It turned out that Clive was a long way behind the other two. The race times, in seconds, are called interval data. With interval data the differences between the numbers are meaningful whereas with ordinal (rank) data they are not.

The problem with a Likert scale is that the scale [of very satisfied, quite satisfied, neutral, quite dissatisfied, very dissatisfied, for example] produces ordinal data. We know that very satisfied is better than quite satisfied and quite satisfied is better than neutral, but is the difference between very satisfied and quite satisfied the same as the difference between quite satisfied and neutral? Why am I worrying about this? Because when we assign numbers to the scale (the 1-5 numbers) and then average the responses we are implicitly making the assumption that the scale items are evenly spaced. We are treating the ordinal data as interval data. How can we be sure that the participants treated the scale in this way? Would it have made a difference if we had used satisfied and dissatisfied instead of quite satisfied and quite dissatisfied respectively? So it would seem that it is wrong to calculate means from Likert scales. If you click here you will see a post from a PhD student (Achilleas Kostoulas) at the University of Manchester who states categorically that it is wrong to compute means from Likert scale data. I choose this example because it is simply and elegantly explained, not because I necessarily agree entirely with his view. It is also worth reading the article by Elaine Allen and Christopher Seaman in Quality Progress (2007) who also take the view that Likert scale data should not be treated as interval data. Interestingly they also suggest some other techniques that don’t suffer from the ‘ordinal-data’ problem; for example, using slider bars to get a response on a continuous scale. However, before you give up detailed analyses of Likert scale data I would urge you to read the paper by Susan Jamieson called Likert scales: how to (ab)use them in Medical Education (2004: 38, 1212-1218). Although Susan is also broadly speaking against treating Likert scale data as interval data she does present the other side of the argument.
In another paper, in Advances in Health Sciences Education, Norman (2010, 15 (5), 625-632) argues that the concerns about Likert scales are not serious and we should happily use means and other parametric statistics.

How different do two averages need to be to show an effect?

In the table at the start of this article product A and B receive scores of 3.6 and 3.5 respectively. The paragraphs above explain that calculating these means may not be valid. However, assuming that we do calculate means in this way, how different would the mean scores for product A and B need to be for us to conclude that A was better than B? I have come across students (normally in vivas) who would simply state that A is better than B because 3.6 > 3.5. To those students I then would say, would you still take that view if instead of 3.6 and 3.5 it was 3.51 and 3.5? What if it is 3.50001 and 3.5? Would they still maintain that A is better than B? It is clear that we need to consider variance and noise and carry out a proper statistical test to conclude whether 3.6 is significantly greater than 3.5. The test is called Student’s t-test and anyone can be taught to perform one using Microsoft Excel in a matter of minutes. In the example at the start of this article it turns out that there is no statistically significant difference. We cannot conclude that product A is received better than product B.

However, can we conclude that both products are received favourably? Again, we need a statistical test. It turns out that in this case, both 3.6 and 3.5 are statistically greater than 3 and we can at least conclude that products A and B are received favourably. However, there is the caveat that this assumes that we can treat the Likert scale data as interval data in the first place.
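For readers who would rather see the calculation than use Excel, here is a minimal sketch of both tests using only the Python standard library. It rests on several assumptions: the response counts are hypothetical (chosen only to give means of 3.6 and 3.5), the two sets of responses are treated as independent samples, the Welch form of the t statistic is used, and the critical value of 1.98 is an approximation for a two-sided test at the 5% level with samples of this size. And, of course, the scores are treated as interval data.

```python
from math import sqrt
from statistics import mean, variance


def expand(counts):
    """Turn {score: number of respondents} into a flat list of scores."""
    return [score for score, n in counts.items() for _ in range(n)]


# Hypothetical responses on the 1-5 scale (not the original survey data).
scores_a = expand({5: 30, 4: 30, 3: 20, 2: 10, 1: 10})
scores_b = expand({5: 25, 4: 20, 3: 40, 2: 10, 1: 5})


def welch_t(x, y):
    """t statistic for the difference between two independent sample means."""
    return (mean(x) - mean(y)) / sqrt(variance(x) / len(x) + variance(y) / len(y))


def one_sample_t(x, reference):
    """t statistic for the difference between a sample mean and a fixed value."""
    return (mean(x) - reference) / sqrt(variance(x) / len(x))


T_CRITICAL = 1.98  # approximate two-sided 5% critical value (an assumption)

t_ab = welch_t(scores_a, scores_b)
t_a = one_sample_t(scores_a, 3)
t_b = one_sample_t(scores_b, 3)

print("A versus B:", round(t_ab, 2), "significant" if abs(t_ab) > T_CRITICAL else "not significant")
print("A versus neutral:", round(t_a, 2), "significant" if abs(t_a) > T_CRITICAL else "not significant")
print("B versus neutral:", round(t_b, 2), "significant" if abs(t_b) > T_CRITICAL else "not significant")
```

With these made-up counts the pattern is the same as the one described above: the difference between A and B is well within the noise, whereas both means are clearly above the neutral value of 3.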

Other considerations

An interesting question is whether we should use 5-point scales at all. Would we get different results if we used a 7-, 9- or 11-point scale? I have found one website that suggests that a 7-point scale is better than a 5-point scale but not by much. A paper by Dawes in International Journal of Market Research (2008: 55 (1)) looked at 5-, 7- and 10-point scales and concluded that the results from a 10-point scale would be different from a 5- or 7-point scale (after suitable normalisation).

Although odd-number scales (with a neutral point) are almost always used, a paper by Garland (Marketing Bulletin, 1991: 2, 66-70) suggests that using a four-point scale (and removing the neutral point) might remove the social desirability bias that comes from respondents wanting to please the interviewer. I am not sure what current thinking is on this matter though and I would normally use odd-number scales.

I am not providing any definitive views on these points but rather raising awareness of issues. If you want to use a Likert scale then these are issues you need to familiarise yourself with.

My view

I will confess to having treated Likert scale data as interval data and carrying out parametric statistics (these are statistics that use statistical parameters such as standard deviations). However, deep down I know it is wrong. I am coming to the view that the best thing is not to use a Likert scale at all. I think people often use this sort of scale because it seems simple. There are ways to statistically analyse data like these and I would refer readers to categorical judgement which is a well-used psychophysical technique. My colleague Ronnier Luo at Leeds University has used this technique extensively for decades. However, it is far from simple to analyse the results. I think there are better ways of obtaining information. I think using slider bars, and allowing users to indicate their view between two extremes (e.g. between very satisfied and very dissatisfied), is probably better and I will encourage my students to use this technique in the future.

MRes Colour Communication

We’re starting a new programme at Leeds University next September. It’s MRes Colour Communication. This is a one-year Masters programme by research but with a twist. There is a taught component in the first semester to get everyone up to speed to make sure they understand the basics of colour communication. They then explore one aspect of this in their research project and submit a dissertation at the end of the year. Please contact me at my University email of for further information or visit