Color matching algorithm and color transfer for film and photography

CHGEhBf.jpeg


Why patch-based color constancy experiments are structurally flawed


With this post I want to argue that treating color constancy as a tri-stimulus mapping is not just an oversimplification, but a mischaracterization of the perceptual problem. A key reason this persists is the widespread reliance on patch-based experiments using a single stimulus on a background.

These experiments unintentionally force observers to collapse perceptual dimensions that are otherwise distinct.


The hidden constraint in single-patch paradigms

In a typical patch-based experiment, an observer is shown a single colored patch against a background and asked to make a match or judgment. Under these conditions, there is simply not enough image structure to support multiple perceptual interpretations.

With only one patch:
  • The visual system cannot reliably separate surface color from illumination or other global constraints.
  • The observer must treat the stimulus as a single fused percept.
  • Any percept of “color cast” is effectively absorbed into the color of the patch itself.
This does not demonstrate that perception is low-dimensional. It demonstrates that the experiment constrains it to be so.


Forced collapse is not evidence of perceptual collapse

When observers succeed in matching colors across illuminants in patch-based tasks, this is often taken as evidence that the visual system normalizes illumination and represents color in a corrected tri-stimulus space.

But this inference is invalid.

The task demands a single output, so the observer must collapse:
  • surface-related color information, and
  • illumination or color cast related color information
    into one reportable variable.
This is analogous to asking someone to report the “true” color of an object through tinted glass while forbidding them from mentioning the tint. Compliance with the task does not imply the tint is not perceived.


Why multiple patches under a single cast change everything

When an image contains many patches under the same color cast, the situation changes qualitatively.

Now there is sufficient structure to support:
  • a stable relational organization between patches,
  • the identification of achromatic references,
  • and the perception of a coherent illumination layer affecting the entire scene.
In this case, observers no longer experience “a bunch of oddly colored patches.” They experience a scene with some sort of color bias.

Crucially, they can still identify:
  • which regions are achromatic,
  • which regions are chromatic,
  • and what the color of the illumination or bias itself is.
This reveals that perception is not operating on isolated tri-stimulus vectors, but on a higher-dimensional state that includes both surface and illumination or bias components.


What patch-based experiments actually measure

Patch-based paradigms do not measure the dimensionality of color perception. They measure the best strategy an observer can adopt under information-poor conditions.

They test how perception behaves when:
  • illumination/bias and surface are deliberately confounded,
  • structural cues are removed,
  • and observers are forced to give a single answer.
That is a valid experimental manipulation, but it cannot be used to justify a model that claims such a collapse is intrinsic to perception.


Implication for color models

If a model is validated primarily on patch-based experiments, it will inevitably favor low-dimensional mappings. This does not mean the model is correct. It means it is well-tuned to a constrained task.

Once richer image structure is introduced (multiple surfaces, shared illumination, relational cues), the limitations of tri-stimulus mappings become apparent.

At that point, the need for higher-dimensional perceptual representations is no longer theoretical. It becomes empirically unavoidable.


Conclusion

Single-patch color constancy experiments do not reveal the true structure of color perception. They suppress it.

Only when a multitude of patches is viewed under a shared color cast does the perceptual system reveal what it is actually encoding: not just colors, but the relationship between colors and the light or bias that binds them.

Any theory of color that cannot represent both simultaneously is missing a fundamental part of the percept. This is where my chromatic adaptation model enters the scene. It explicitly models the color percept as a neutral anchor plus a bias, which can post hoc be described as the percept of colors seen through a veil. It estimates a set of stimuli that, when combined with the estimated bias or veil, results in predictable and stable percepts. Here the model was applied locally to a known variegated field, predicting the stimuli that produce the expected percepts of uniformly colored red, green and blue rectangles under a variegated bright yellow and dark blue color cast:

UqB8EX5.png


E0GeYqb.png


A theory of color must in my view be judged not by how well it performs under artificially collapsed conditions, but by whether it can represent what observers actually perceive when the visual system is given enough information to do its job. Models that explicitly encode both surface and bias are not adding complexity for its own sake; they are responding to empirical necessity. Ignoring one half of this perceptual state does not simplify the problem. It underrepresents it.
 
I think it’s reasonable to place the veil concept in the same class of perceptual layering mechanisms demonstrated by Anderson & Winawer:

8DM50lx.jpeg


In those examples, identical local luminance values are perceived as black or white depending on which large-scale layer (mist/illumination/interference) the visual system segments out first. Once that layer is discounted, the remaining structure is interpreted with much lower configurational ambiguity.

A color cast functions similarly, but as a continuous field rather than as a kind of structured noise pattern. That actually makes it simpler to model mathematically, but not fundamentally different in kind. The same mid-grey surfaces can be interpreted as black or white depending on whether the (a)chromatic veil is segmented as an overlay or as part of the surface:

uc38YLC.png


Framed this way, the veil isn’t a semantic or a post-hoc explanation, but a pre-semantic latent layer that helps explain color and lightness constancy, and why identical stimuli can produce different percepts under different global configurations:

QOCIWPt.png


The Anderson & Winawer mist is much more complex to segment computationally as a layer, but the concept is very similar. First segment the interference layer (black mist for the top image and white mist for the bottom), and subsequently semantically interpret what is behind the interference layer: white chess pieces for the top image, and black chess pieces for the bottom.

A key implication of both the color cast veil layer concept and the Anderson–Winawer examples is that from a modelling perspective semantic segmentation alone is obviously insufficient to arrive at the color percepts. If you apply a semantic-first segmentation to both images, the chess pieces resolve to the same mid-grey values in both cases. There is no semantic cue at that stage anymore that justifies black versus white. Only after a large-scale veil layer (color cast/mist/illumination/interference) is segmented and discounted does the percept flip: the same mid-grey pieces are then interpreted as black in one configuration and white in the other. This strongly suggests that veil-like layers (local and global) are pre-semantic and must be segmented before object identity or surface reflectance can be stably inferred. Black and white are not directly “seen” properties here, but derived once the local or global layer has been factored out.

Images used for research and educational purposes.
 

Deriving analytical expressions for “perceptual” black, mid-grey, and white under a uniform-field model in sRGB


The basis for the images with colored squares is this stimuli distribution:

srgb ∼ Uniform(β, β+σ)

implemented in MATLAB as:

srgb = beta + sigma .* rand(1,3);   % element-wise per channel; assumes 0 <= beta and beta + sigma <= 1

where:
  • β is a bias (the “veil” / cast / mean-field offset),
  • σ is a range (the spread of local stimuli around that bias),
  • srgb is a stimulus value (per channel, or per some reduced achromatic coordinate).
The key perceptual concept is to treat black and white not as fixed stimuli, but as inferred reflectance extremes estimated from the local stimulus ensemble. In the simplest “mean-field” reading, those extremes correspond to the lower and upper bounds of the local distribution.

This post derives the analytical expressions.



Black/mid-grey/white as distribution anchors


If we accept the model:

srgb ∼ Uniform(β, β+σ)

then the distribution support is literally:
  • Minimum possible stimulus: β
  • Maximum possible stimulus: β+σ
So the most direct analytic mapping is:

Black = β and White = β+σ

Now: what should “mid-grey” be?

For a uniform distribution, the expected value and the median coincide at the midpoint:

E[srgb] = β+σ/2 and median(srgb) = β+σ/2

So mid-grey has the clean analytic form:

Mid-grey = β+σ/2

Consequently, within this toy example any stimulus value can play the role of black, mid-grey or white, depending on the locally inferred β and σ. The only values for which perceptual black, mid-grey and white align with display black, mid-grey and white are β = [0 0 0] and σ = [1 1 1].
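
As a minimal sketch of the anchors derived above, assuming the support is [β, β+σ] per channel, the three anchors and the position of a stimulus between them can be computed as follows. The function names `anchors` and `relative_position` and all numeric values are illustrative assumptions, not part of the original MATLAB implementation.

```python
def anchors(beta, sigma):
    """Per-channel perceptual black, mid-grey and white for Uniform(beta, beta + sigma)."""
    black = list(beta)
    mid_grey = [b + s / 2 for b, s in zip(beta, sigma)]
    white = [b + s for b, s in zip(beta, sigma)]
    return black, mid_grey, white


def relative_position(stimulus, beta, sigma):
    """Hypothetical helper: 0 means perceptual black, 1 means perceptual white."""
    return [(x - b) / s for x, b, s in zip(stimulus, beta, sigma)]


grey = [0.5, 0.5, 0.5]  # display mid-grey
# The same stimulus lands on a different anchor in each (made-up) field.
as_black = relative_position(grey, [0.5] * 3, [0.4] * 3)
as_mid = relative_position(grey, [0.3] * 3, [0.4] * 3)
as_white = relative_position(grey, [0.1] * 3, [0.4] * 3)
```

With these made-up fields the same display mid-grey sits at the black anchor, near the mid-grey anchor and at the white anchor respectively, which is the point made in the paragraph above.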



A few examples


Below are five examples, where β and σ have been randomly chosen to produce a random field of squares. For each example, the first image shows the predicted stimuli that should produce percepts of black, mid-grey, and white. The second shows these stimuli in the context of the field:

3lUvyhr.png


NUTnwir.png


rMTGScb.png


fzqRpXs.png


eX0z7fz.png


dpS0QuZ.png


8sc6Tbg.png


CiTkCJs.png


TyfHBN1.png


mSzPwDH.png
 
The above toy example can be viewed as a special case of my mean-field approach, where perceptual black and white are represented by the extrema, and mid-grey by the central tendency, of a global or local Plook. These have rigorous definitions if:

Plook ∼ Uniform(β, β+σ)
 
Here are five more examples, now with the numbers:

Example 1:

β = [0.0356 0.2066 0.2053], σ = [0.6052 0.5794 0.6726]

perceptual black = [0.0356 0.2066 0.2053], perceptual mid-grey = [0.3382 0.4964 0.5417], perceptual white = [0.6408 0.7861 0.8780]

KxKdhMy.png


8GTEa4e.png


Example 2:

β = [0.0179 0.4925 0.0084], σ = [0.6243 0.1944 0.5058]

perceptual black = [0.0179 0.4925 0.0084], perceptual mid-grey = [0.3301 0.5897 0.2614], perceptual white = [0.6422 0.6869 0.5143]


A89S8wG.png


VmIsRVI.png


Example 3:

β = [0.4129 0.3958 0.3694], σ = [0.2916 0.1816 0.2328]

perceptual black = [0.4129 0.3958 0.3694], perceptual mid-grey = [0.5587 0.4866 0.4858], perceptual white = [0.7045 0.5774 0.6021]


INX0HSg.png


7hdmyjk.png


Example 4:

β = [0.3524 0.0091 0.3905], σ = [0.4718 0.0881 0.4547]

perceptual black = [0.3524 0.0091 0.3905], perceptual mid-grey = [0.5883 0.0531 0.6178], perceptual white = [0.8242 0.0972 0.8452]

yDm7sOT.png


GSfAU5w.png


Example 5:

β = [0.6713 0.5532 0.3499], σ = [0.2242 0.1710 0.0040]

perceptual black = [0.6713 0.5532 0.3499], perceptual mid-grey = [0.7834 0.6387 0.3519], perceptual white = [0.8955 0.7242 0.3539]

7ygIKFa.png


xU9d0WE.png


So, why does this simple model appear to work in sRGB, and why will it likely work in any gamma corrected color space? Because the extrema and the median of a distribution are preserved by monotonic increasing transforms: the transformed minimum, maximum and median are the minimum, maximum and median of the transformed distribution. The mean is only approximately preserved. That is why I can state with some confidence that perceptual black and white represent the global or local extrema in a gamma corrected distribution, while mid-grey represents its central tendency.
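
The invariance argument can be checked numerically. The following is a minimal sketch, assuming a single channel, made-up values for β and σ, and a plain power law as a stand-in for the transfer curve (it is not the exact piecewise sRGB function). It compares the order statistics before and after the transform, and also reports how far the mean moves.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable
beta, sigma = 0.2, 0.5  # illustrative single-channel values
# An odd sample count makes the median an actual sample value.
field = [beta + sigma * random.random() for _ in range(10001)]


def decode(v, gamma=2.2):
    """Stand-in monotonic increasing transform (plain power law)."""
    return v ** gamma


decoded = [decode(v) for v in field]

# Order statistics commute with a monotonic increasing transform.
min_commutes = math.isclose(min(decoded), decode(min(field)))
max_commutes = math.isclose(max(decoded), decode(max(field)))
median_commutes = math.isclose(
    statistics.median(decoded), decode(statistics.median(field))
)

# The mean does not commute; this gap is the size of the approximation.
mean_gap = abs(statistics.fmean(decoded) - decode(statistics.fmean(field)))
```

In this sketch the minimum, maximum and median carry over through the transform, while the mean shifts by a small but non-zero amount, so the median is the safer reading of "central tendency" when moving between encodings.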
 
The Helmholtz–Kohlrausch effect is a perceptual phenomenon in which the brightness of a chromatic stimulus appears to increase with its chroma or saturation:

Helmholtz-Kohlrausch_effect_visualized_improved.jpg

In the above images the colored patches have the same luminance as the achromatic patches, but the chromatic patches appear brighter, particularly red and magenta. I have implemented a model for the Helmholtz-Kohlrausch effect that is based on the veil concept. Simply put, the model assumes the effect is a form of statistical normalization, where a stimulus patch is interpreted as an achromatic surface behind a chromatic veil. Here luminance extrema are always interpreted as white. These extrema are assumed to be determined by the upper bound of luminance associated with a chromatic subspace. To keep the model simple I have used the luminance upper bounds associated with the RGB color space as a proxy.

The lightness behind the chromatic veil is then simply the luminance of the stimulus patch divided by the upper bound associated with the stimulus. The lightness of the veil is assumed to be the lightness associated with the stimulus patch. The perceived brightness is then the perceptual average of the two, assuming the percepts of the veil and of the achromatic color behind the veil are collapsed in the patch-based regime. Using this model we can determine the effective lightness associated with each patch in the above picture:

HKE1.jpg

The estimated lightness correlates very well with the perceived brightness in the original picture. Since the model assumes scaling of brightness is only related to the extrema, the lightness ratios can be used to estimate luminance values that will be perceived as equiluminant. Here's an example where the normalized luminance is 0.75:

HKE2.jpg

Here's an example where the normalized luminance is 0.5:

HKE3.jpg

The model appears to accurately capture the effect as the stimuli patches now look approximately equiluminant even though the RGB bounds were used as a proxy for normalization.
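
One possible reading of the Helmholtz-Kohlrausch model described above can be sketched as follows. Everything specific here is an assumption made for illustration: linear RGB input with Rec. 709 luminance weights, CIE L* as the lightness scale, the luminance upper bound taken as the luminance of the same chromaticity scaled to the surface of the RGB cube, and the "perceptual average" taken as an arithmetic mean in L*. The name `hk_brightness` is hypothetical and the actual implementation may differ on every one of these points.

```python
def cie_lightness(y):
    """CIE 1976 L* for relative luminance y in [0, 1]."""
    if y > (6 / 29) ** 3:
        return 116 * y ** (1 / 3) - 16
    return (29 / 3) ** 3 * y


def luminance(rgb):
    """Relative luminance of linear RGB with Rec. 709 / sRGB primaries."""
    r, g, b = rgb
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def hk_brightness(rgb):
    """Assumed reading: mean of veil lightness and behind-the-veil lightness."""
    y = luminance(rgb)
    peak = max(rgb)
    if peak == 0:
        return 0.0
    # Assumed upper bound: same chromaticity scaled until one channel reaches 1.
    y_upper = y / peak
    behind_veil = cie_lightness(y / y_upper)  # achromatic surface behind the veil
    veil = cie_lightness(y)                   # lightness of the stimulus itself
    return (behind_veil + veil) / 2


red = (1.0, 0.0, 0.0)
grey = (luminance(red),) * 3  # achromatic patch with the same luminance as red
```

Under these assumptions an achromatic patch keeps its ordinary lightness, while a saturated patch of equal luminance receives a higher value, which is the qualitative direction of the effect. The size of the boost per hue depends on the assumed upper bound and is not claimed to reproduce the figures above.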

In a previous post I argued that patch based experiments force an observer to collapse a higher dimensional state variable, interpreted in the model as seeing a color behind a veil, to a tri-stimulus value against an achromatic field. While the higher dimensional state variable is an information richer percept, the tri-stimulus value can be viewed as visual perception matching perceived lightness and subsequently making a compromise between two chroma representations intrinsic to the information richer percept:
  1. The estimated intrinsic surface chroma
  2. The estimated surface chroma attenuated by the chromatic veil
The two cannot be reconciled in a single tri-stimulus value, and so the most logical outcome is to split the difference between the two, making the perceptual average the closest tri-stimulus representation. To test this hypothesis I have created six examples, where we start from our neutral anchor with a set of random test patches and an achromatic gradient. I subsequently estimate the stimuli that will produce the percepts behind a color cast. These stimuli are then placed in context. Finally, I use the above perceptual average to generate a second set of stimuli that match the stimuli in context when placed against an achromatic field:

PqrtzEo.png


GB7iv3X.png


w7ahnio.png


DoK0uHA.png


lLd7O5L.jpeg


3Vh9wXx.jpeg


The model appears to very accurately represent the collapsed tri-stimulus percept.
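
The "split the difference" step described above can be illustrated with a minimal sketch. It assumes the two chroma estimates are expressed as pairs in some opponent-style chroma plane and that lightness has already been matched. The coordinate system, the function name `collapse` and the numbers are all hypothetical, since the thread does not specify the space in which the average is taken.

```python
def collapse(lightness, surface_chroma, veiled_chroma):
    """Midpoint of two chroma estimates at a matched lightness."""
    a = (surface_chroma[0] + veiled_chroma[0]) / 2
    b = (surface_chroma[1] + veiled_chroma[1]) / 2
    return lightness, a, b


# Illustrative values only: an intrinsic surface chroma estimate and the
# same surface chroma as attenuated by a chromatic veil.
match = collapse(70.0, (0.0, 60.0), (20.0, 40.0))
```

The result lies halfway between the two inputs, so it coincides with neither the intrinsic estimate nor the veiled one, which is the information loss the collapse is meant to capture.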

 
Here I present compelling evidence, using two examples, that color perception is distinctly non-local and that global (large scale) image statistics often drive what we perceive. For both images I have selected a relatively large section of the image and predicted the collapsed tri-stimulus percepts for the entire section. In both cases these predictions are highly accurate. Now for the key observation: the color percepts for the predicted sections, which consist of a wide variety of stimuli and forms, are virtually identical, but the actual stimuli are most definitely not, as can be seen when we inspect the stimuli samples taken from the images. This can mean only one thing: our perception is not driven by local context or forms, but by the global field, which has a consistent veil layer or color cast in the original image, and is largely achromatic in the other:

NScB4NI.png


2KQg7tM.png


Color perception appears fundamentally non-local, and is thus often dominated by inferred global image statistics rather than local stimulus properties.

These results might be interesting from a color science perspective, but what about the practical use of these predictions for color grading? As it turns out the results have some relevance, and show how models such as this can capture aspects of visual perception in measurable data. To explore this let us consider a frame from two different films with color casts: "Joker (2019)" and "The Wonderful Story of Henry Sugar (2023)", where we will first consider the latter. The below picture shows six images. The first row shows the original frame with a set of randomly chosen stimuli patches and a perceptually achromatic gradient on the left. The model is then used to predict the collapsed color percepts of the stimuli patches and gradient on the right, to once again highlight the accuracy of this approach in a practical color grading setting. The second row represents color as a higher dimensional state variable, where we now just focus on the actual frame in question. The left image represents the veil layer or color cast, while the right image represents the color behind the veil layer (where lightness has been kept constant for the sake of the comparison, which focusses on chroma). So, these two images represent an information rich representation of color, whereas the previous two images focussed on an information poor representation of color: the collapse of the two layers to a tri-stimulus estimate.

The final row of images shows the difference between the raw stimuli when viewed in isolation versus what we perceive, represented by a tri-stimulus estimate, where I have isolated the skin tones. It becomes immediately obvious that when the skin tones are detached from their surrounding context, the faces become jaundiced. The stimuli do not represent the perceived skin tones. It is here that the estimated perceptual tri-stimulus values add value, because unlike the raw stimuli, these estimated color percepts are an accurate representation of the perceived skin tones:

9YzR1l8.jpeg


I have repeated the same analysis for the frame of "Joker (2019)", which has an even stronger color cast, and the results and conclusions are the same. The skin tones of the girl turn a sickly green when viewed in isolation, whereas the estimated perceptual tri-stimulus values once again align very closely with the perceived skin tones:

6GcPV1a.jpeg


These examples again reinforce the importance of global image statistics and how our perception can be modelled to a good degree of accuracy for practical situations with highly variegated pictures.

Can a color be both blue and achromatic at the same time? The below example would suggest so. Here a cyan to red gradient and a number of color patches are presented under a red color cast. Next we may ask ourselves: which stimuli against an achromatic field most closely resemble the cyan to red gradient against a red field? The model predicts that the closest representation is a grey to red gradient, resulting in an alleged bi-modal color percept, where the gradient is cyan to blue when viewed in the context of the red field, but closely resembles grey to red if we are forced to map to a stimulus in an achromatic field:

4MmvMI1.jpeg


To showcase the validity of this model for predicting color percepts under a color cast, here are four sets of images with randomly selected color patches under different color casts. The model is subsequently used to predict the collapsed tri-stimulus color percepts against an achromatic field:

mZrWp9j.png


eOCEhXI.png


PpIeLiE.png


nXl7d7I.png
 
To summarize what this color appearance model predicts, I have created four example collages, each consisting of six images. This model has been developed to predict color percepts under a global color cast, but as shown in previous examples it can be extended to locally varying color casts. Let's go through each of the six images:

1) Top left image: The original picture with a color cast.

2) Top right image: The estimated color cast after statistical segmentation. This veil layer has special significance, because it represents perceptual achromacity in the model. The stimuli associated with the veil layer or color cast are predicted to be perceived as achromatic behind the veil.

3) Middle left image: The predicted color percepts behind the chromatic veil layer, where the chroma component of the veil layer has been removed, but the lightness component remains.

4) Middle right image: The estimated color percepts behind the chromatic veil layer, where the entire veil layer has been removed. This is the neutral perceptual anchor within the model.

5) Bottom left image: Isolated area of raw stimuli with most of the context replaced with an achromatic field.

6) Bottom right image: Isolated area of tri-stimulus color percept estimates with most of the context replaced with an achromatic field. This image is the model representation of traditional perceptual color matching experiments, where observers are tasked to collapse the higher dimensional state variable that represents color in this framework to a most similar tri-stimulus estimate against an achromatic field. Color matching within this framework is thus explicitly postulated and modelled to be a measurement protocol with known information loss.

leGfbzu.jpeg


ArgzR8g.jpeg


kTgRxTV.png


01COTbE.jpeg


Just to be clear, this is a phenomenological model of color perception. It does not claim or aim to describe the mechanisms of visual perception. Statistics are used here to predict latent model variables that could correlate with latent perceptual states, where the tri-stimulus predictions can then be compared to matches created by a group of observers. However, it should also be noted that the veil layer/behind the veil concept has a nice symmetry with the mechanism of lateral inhibition implementing a relative coding of context. The fact that we perceive the color cast as a separate percept suggests the magnitude of inhibition may itself be part of the signal forwarded upstream. This does not imply the veil layer is explicitly encoded perceptually, just that perceptual outcomes correlate with such a phenomenological construct under constraints.

Note that I have slipped in a key postulate of the framework in the above examples, namely that chroma perception in the mean-field regime is entirely dependent on the global context in the form of Plook and no longer determined by local interactions. The postulate's likely validity has been confirmed with the experiments, where the predicted collapsed percepts against an achromatic field closely match the percepts in their original context despite ignoring local chromatic relationships. These results align with reports in the literature that suggest simultaneous contrast is strongly suppressed in highly variegated contexts. This framework makes these ideas explicit by defining a mean-field limit, where chromatic adaptation is solely determined by global context, where global is defined as large scale compared to the frequency of the whole or part of the image or the field of vision. It also seems reasonable to suggest that this mean-field regime may overlap significantly with natural images and everyday visual perception, where scenes are often highly variegated.
 
It is interesting to note that it appears natural images converge to a mean-field result fairly rapidly. Here are two examples (both demonstrations of Prof. Akiyoshi), where I show four images:

1) Top left image: The original picture with a color cast.

2) Top right image: The estimated color cast after statistical segmentation. This is the bias our visual system is assumed to infer, such that it can determine an object's "true" color.

3) Bottom left image: The predicted color percepts behind the chromatic veil layer, where the chroma component of the veil layer has been removed, but the lightness component remains.

4) Bottom right image: Isolated area of tri-stimuli color percepts estimates with most of the context replaced with an achromatic field. This image is the model representation of traditional perceptual color matching experiments, where observers are tasked to collapse the higher dimensional state variable that represents color in this framework to a most similar tri-stimulus estimate against an achromatic field.

8JMoPnP.png


e8nrxGW.png


The latter example is particularly interesting, because the perceived yellowish color of the train would traditionally be attributed to the surrounding blue trees, meaning it is assumed to be a form of simultaneous contrast. However, here the yellowish color of the train is derived from the model predicting that the visual system infers an overall blue cast to the image. The tri-stimulus estimates derived from this model very closely align with the actual percepts in the original image. That is not to say that the model result implies a simultaneous contrast based approach could not give a similar outcome. Such a result would create a direct link between traditional local models and this mean-field approach.

 
In physics and probability theory, mean-field theory (MFT), also called self-consistent field theory, is a framework for studying complex, high-dimensional stochastic systems by approximating them with a simpler model. Rather than tracking every microscopic interaction, MFT averages over many degrees of freedom, which is especially useful for systems composed of many interacting components. The core idea is to replace all interactions acting on a given component with an average, or effective, interaction (often referred to as a molecular field). This turns a difficult many-body problem into a much simpler effective one-body problem. Because these simplified models are much easier to solve, MFT often provides useful qualitative insight into system behavior at a far lower computational cost.

A famous example of a mean-field result is the theorem (or principle) of corresponding states, originally developed by Johannes Diderik van der Waals in 1880. The theorem states that all fluids, when compared at the same reduced temperature and reduced pressure, have approximately the same compressibility factor and deviate from ideal gas behavior by roughly the same amount. In this approach, material-specific constants are eliminated by rewriting the equation of state in terms of reduced variables, defined using each substance’s critical temperature and pressure. For fluids that obey the van der Waals equation of state, this leads to universal predictions, most famously a critical compressibility factor of 3/8 = 0.375, which is known to overestimate the value observed in real gases.
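
The universality of the 3/8 value can be verified with exact arithmetic. The sketch below uses the standard van der Waals critical constants (V_c = 3b, P_c = a/(27b²), T_c = 8a/(27Rb)); the parameter values passed in are arbitrary, made-up numbers chosen only to show that a and b cancel.

```python
from fractions import Fraction


def critical_point(a, b, R=Fraction(1)):
    """Critical molar volume, pressure and temperature of a van der Waals fluid."""
    v_c = 3 * b
    p_c = a / (27 * b ** 2)
    t_c = 8 * a / (27 * R * b)
    return v_c, p_c, t_c


def critical_compressibility(a, b, R=Fraction(1)):
    """Z_c = P_c * V_c / (R * T_c); the material constants a and b cancel."""
    v_c, p_c, t_c = critical_point(a, b, R)
    return p_c * v_c / (R * t_c)


# Two arbitrary parameter pairs (not real substances) give the same value.
z1 = critical_compressibility(Fraction(3, 2), Fraction(1, 25))
z2 = critical_compressibility(Fraction(7), Fraction(2, 9))
```

Both calls return exactly 3/8, independent of the chosen a and b, which is what makes the reduced description material-agnostic.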

The idea was later popularized and generalized by Edward A. Guggenheim, who emphasized that very different systems can exhibit nearly identical behavior when examined near their critical points. Many non-ideal gas models satisfy this principle, including the van der Waals model, the Dieterici model, and other real-gas equations of state.

My model for color perception is strongly influenced by the above principles. It is a mean-field theory, because it replaces many local interactions with a large scale global field, that can be directly derived from image statistics. In doing so it loses explanatory power with respect to mechanistics, but through this process of abstraction it gains something most mechanistic models struggle with: generalization. Highly mechanistic models often have many parameters that have to be tuned to specific tasks.

In addition to being a mean-field theory this framework has attempted to provide a new perspective on color constancy by postulating that color constancy may be considered a corresponding states principle: a many to one mapping. Color itself is subsequently defined as a state variable. A while ago I shared this example:

2KcyGcN.jpeg


I argued that these four images can be considered corresponding states, where the model observer perceives the same colors behind the color cast, which I post-hoc described as a chromatic veil. Below this process is explained in a visual sense through the model:

nh7QUx5.jpeg


At the top we start with three images with different color casts. Within the framework it is assumed observers infer the color bias in each image, and subsequently infer the true color and lightness of objects relative to this baseline. The second row of images shows the model's estimate of this baseline, the chromatic veil. The third row of images shows the color behind the veil: the chroma relative to the veil layer. The bottom row of images shows the color behind the veil layer after lightness scaling, revealing three virtually identical images. These images, that I dubbed the neutral anchor images, fill the same conceptual role as the reduced properties in the Van der Waals theorem of corresponding states.
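
The analogy with reduced variables can be made concrete with the uniform toy model from earlier in the thread, where a cast is described per channel by a bias β and a range σ. This is only a sketch of the many-to-one idea under that toy assumption, not the full model used for the images above; the helper names and all numbers are made up for illustration.

```python
def apply_cast(reflectance, beta, sigma):
    """Toy forward model: place relative reflectances inside a cast's stimulus range."""
    return [b + s * r for r, b, s in zip(reflectance, beta, sigma)]


def neutral_anchor(stimulus, beta, sigma):
    """Toy inverse: express a stimulus relative to the cast, like a reduced variable."""
    return [(x - b) / s for x, b, s in zip(stimulus, beta, sigma)]


surface = [0.8, 0.3, 0.1]  # illustrative relative reflectance per channel
casts = [
    ([0.30, 0.10, 0.05], [0.60, 0.50, 0.40]),  # warm cast (made-up numbers)
    ([0.05, 0.10, 0.30], [0.40, 0.50, 0.60]),  # cool cast (made-up numbers)
    ([0.00, 0.00, 0.00], [1.00, 1.00, 1.00]),  # no cast
]

stimuli = [apply_cast(surface, beta, sigma) for beta, sigma in casts]
anchored = [
    neutral_anchor(s, beta, sigma) for s, (beta, sigma) in zip(stimuli, casts)
]
```

The three stimuli differ, yet each maps back to the same anchored values once it is expressed relative to its own cast, in the same way that different fluids coincide when written in reduced temperature and pressure.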

Unlike most existing models for color perception, this model explicitly considers the color cast/veil layer as part of the percept. This is not traditionally how color is defined under varying illuminants. It is generally assumed the illuminant is discounted, revealing a bijective mapping between stimuli under different illuminants. In a previous post I have described this process as observers being tasked to collapse the higher dimensional state variable that represents color in this framework to a most similar tri-stimulus estimate against an achromatic field. As stated before, color matching within this framework is thus explicitly postulated and modelled to be a measurement protocol with known information loss. Below this measurement protocol is explained in a visual sense through the model:

s74uHys.jpeg


As before we start with a top row of images, each with a different color cast, where I have now added the color patches as a callback to the matching experiments that are often used to quantify color percepts. Within this framework color matching of percepts is modelled as a form of perceptual averaging, quite literally collapsing the veil layer and the color behind the veil to tri-stimulus estimates. These estimates are shown for the colored patches in the second row of images. For example, a percept of saturated yellow behind a reddish veil layer becomes a fairly saturated orange/yellow, and behind a bluish veil layer it becomes a greenish yellow. In their own way these estimates are accurate representations of what is seen in the original images, and probably easier to interpret as similar looking from a color matching perspective, but with a loss of information. The orange/yellow against the achromatic field is no longer yellow behind a red cast, and the greenish yellow is no longer the same yellow behind a blue cast. They are no longer corresponding states, but projections on a lower dimensional space. The bottom row of images shows that the prediction of collapsed percepts on a lower dimensional space is not restricted to just patches and gradients, but extends to the far more complex percepts in the variegated fields associated with natural images.

Finally, I would like to circle back to the subject that started this thread: color matching and look transfer. Some argued that the methodology is an image processing hack that, while occasionally technically useful, ultimately amounts to a misguided and naive stimuli mapping with few links to color perception. I believe I have shown that color matching and look transfer as performed through Plook is inherently a phenomenological perceptual model, part of a larger family of perceptual operators designed around Plook, which is a latent perceptual feature within this framework. In fact, I would argue that in hindsight look transfer as developed within this framework represents a scientifically novel point of view on color perception, because rather than focussing on individual patches of color or individual objects, the method attempts to answer the question: which latent variables and relationships describe the perception of a visually coherent macro-look across multiple (moving) images?

OCbGdWT.jpeg


Images used for research and educational purposes.
 
With this post I want to summarize the scope of this mean-field framework for color perception. We begin with a photograph of Cinque Terre that I have used before. I have added a strong color cast:

yLXsAiX.jpeg


The entire framework is built around the latent feature Plook, a stimuli distribution that describes the macro-look of an image. The simplest operation that can be performed using Plook is color balancing, which is performed without a reference, where it is postulated that a generalized grey world hypothesis holds in the mean-field regime:

4tq6sZ0.jpeg
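For readers who have not met the grey world hypothesis before, the classic version can be sketched in a few lines. This is the textbook correction applied to all pixels of the image, not the generalized Plook-based variant used in this framework; the function name `grey_world_balance` and the assumption of linear RGB values in [0, 1] are mine.

```python
import numpy as np

def grey_world_balance(image):
    """Classic grey world balance for an H x W x 3 float image in [0, 1].

    ASSUMPTION: the average of all stimuli is achromatic, so each channel
    is scaled until the three channel means coincide.
    """
    image = np.asarray(image, dtype=np.float64)
    channel_means = image.reshape(-1, 3).mean(axis=0)
    grey = channel_means.mean()
    # Guard against division by zero for an all-black channel.
    gains = grey / np.maximum(channel_means, 1e-12)
    return np.clip(image * gains, 0.0, 1.0)

# Synthetic scene with a warm cast applied at the stimulus level.
rng = np.random.default_rng(0)
scene = rng.uniform(0.1, 0.5, size=(8, 8, 3))
with_cast = scene * np.array([1.2, 1.0, 0.7])
balanced = grey_world_balance(with_cast)
```

After the correction the three channel means are equal, which is exactly what the hypothesis demands at the stimulus level.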


This operation provides a result that is quite similar to the input image. The novelty here is that the grey world hypothesis, which is a mean-field concept in and of itself, is leveraged through the composition-agnostic nature of Plook. It is important to note that the grey world hypothesis in this instance is applied at the stimuli level: the average of a subset of stimuli is assumed to be achromatic. A similar principle is applied for modelling chromatic adaptation, but here it is applied at the perceptual level:

qVOk78j.jpeg


Here the second and third images represent the now often discussed statistical decomposition of color cast/veil layer and the color postulated to be perceived behind the veil, which together form a higher dimensional state variable. The final image shows the most recent emergent result of this framework: the postulated closest stimuli match when the higher dimensional state variable is collapsed to a tri-stimulus estimate against an achromatic field.

Since Plook represents the macro-look of a picture it can be used to transfer a look from one image to another even if they are not identical. Below are two examples using the image with the color cast as the source:

TzYwfUF.jpeg


JBlfbnl.jpeg


Within this framework it is Plook and Plook x Plum that encode perceptual properties like contrast, saturation, brightness, etc. These properties can themselves be considered lower order projections of the higher order state variable that represents color in this approach. While look transfer might be viewed as an image processing operation, I believe it represents an essential, often overlooked aspect of color perception that cannot be found in patch based experiments: the relationships between colors, which shape color perception not only under the container term "context", but also as an essential part of color perception in their own right, representing coherence, consistency, and aesthetic.

 
Here is another example that clarifies the two modes of perception modelled by the approach:

wCM3MSr.jpeg


The original top left image depicts a color checker with a cyan color cast. The model performs a statistical segmentation, where the top right image depicts the estimated color cast or veil layer, and the bottom left image depicts the estimated percepts behind the veil. For example, the third patch from the right on the top row is predicted to be perceived as achromatic behind a cyan color cast. This percept cannot be captured by a tri-stimulus value. It requires at least six values: three values that define the percept behind the color cast, and three more that define the color cast. This is not the colorimetric way these percepts are usually defined. Usually an illuminant is estimated, after which a 3x3 matrix is often used to estimate the colors under, for example, a D65 illuminant. By contrast, my argument is that the color cast or veil layer is an integral part of the percept. However, if we want to represent the percept as a tri-stimulus value, the approach models a second mode of perception by asking the question: if we collapse the two layers of perception to a lower order projection, which tri-stimulus value against an achromatic field is most similar to the higher dimensional percept? This result is depicted in the bottom right image for a number of patches and the small bag on the right.
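The contrast between the two descriptions can be sketched as follows. The first function is a von Kries style diagonal 3x3 correction, applied here directly to RGB values for simplicity (in practice it is usually applied in a cone-like space). The `LayeredPercept` container, its field names, and the example numbers are hypothetical, and filling `behind_veil` with the adapted value is a simplification for illustration, not the statistical segmentation used by the model.

```python
from dataclasses import dataclass
import numpy as np

@dataclass(frozen=True, eq=False)
class LayeredPercept:
    """Hypothetical six-value state: color behind the veil plus the veil."""
    behind_veil: np.ndarray  # three values
    veil: np.ndarray         # three values

    def as_vector(self) -> np.ndarray:
        return np.concatenate([self.behind_veil, self.veil])

def diagonal_adaptation(rgb, source_white, target_white):
    """Conventional approach: a diagonal 3x3 matrix that discounts the illuminant."""
    rgb = np.asarray(rgb, dtype=float)
    gains = np.asarray(target_white, dtype=float) / np.asarray(source_white, dtype=float)
    return np.diag(gains) @ rgb

# Made-up numbers: a grey patch seen under a cyan cast.
cyan_cast = np.array([0.6, 1.0, 1.0])
stimulus = np.array([0.3, 0.5, 0.5])

# Three values: the cast is discounted and no longer part of the description.
adapted = diagonal_adaptation(stimulus, cyan_cast, np.ones(3))

# Six values: "achromatic behind a cyan cast" keeps both layers.
percept = LayeredPercept(behind_veil=adapted, veil=cyan_cast)
```

The adapted triplet is achromatic, but only the six-value state still records that the patch was seen through a cyan veil.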


 
The color appearance model allows for a specific perceptual task: color blending when a color channel is missing. By removing a color channel we further constrain the color matching algorithm. In fact in RGB mode the algorithm can no longer perform reliably at all, but in perceptual mode it will do two things simultaneously:

1) Transfer a look from a full color image to an image with the missing color channel.
2) Find the optimal blend of stimuli that minimizes the perceptual loss between the source and reference.

Here are three examples. For the first the red channel is missing:

Source:

6mqTSwT.png


Reference:

DHwTvOz.jpeg


Color matching result:

tzTHrBi.jpeg


For the second example the green channel is missing:

Source:

OFHNtUE.jpeg


Reference:

yLLmkSx.jpeg


Color matching result:

NfOH1m2.jpeg


For the final example the blue channel is missing:

Source:

BeUlQ0g.png


Reference:

mFf0aD1.png


Color matching result:

4qxqm7W.png


These examples show that a color appearance model that is grounded in perceptual statistics can produce perceptually plausible color matches, even when a significant portion of the data is missing.

 
For those interested in the availability of the color matching algorithm: we have been working on a web based application these last few months, which is 90% done. Our initial aim is to use it as a demo for attracting interested parties and for testing. In the hopefully not too distant future we would like to offer the algorithm in the form of plugins for software environments like DaVinci Resolve, Premiere Pro, etc. I will keep you posted on these developments.
 
I have been monologuing quite a bit on this forum, so I think I should take a break for a little while. However, for those who would like to know what all these posts have been about, I will write a couple of summarizing posts, starting with this one.

About eight years ago I was part of a group of fans that attempted to restore faded film scans. Being amateurs with little experience in color restoration, we did the best we could, but with results of varying quality. I have been a data scientist for most of my adult life and also have an academic and professional history with fundamental modelling and Monte Carlo simulations in the field of polymer physics and polymer dynamics, so I decided to look into color matching algorithms that might help speed up the process and provide more accurate results. I ended up writing my own algorithm inspired by histogram matching, and initially only meant for color matching between a source and reference with identical content. Despite its relative simplicity this algorithm is extremely effective for the purpose for which it was developed:

Faded 35mm source:

JITKkCv.png


DVD reference:

0017YWh.jpeg


Color matched 35mm source:

FZk8s4b.jpeg


I created a simple app in MATLAB that I distributed for free, and a few dozen people have since used it to color match faded scans or regrade their favorite movies to match the look of whatever reference they preferred. I would get a request for the app every now and then, and I went on with my life.
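For context, the histogram matching that inspired the original algorithm can be sketched as below. This is the standard per-channel version based on cumulative distributions, written in Python rather than MATLAB, and it is not the algorithm distributed in the app; the function names are hypothetical.

```python
import numpy as np

def match_channel(source, reference):
    """Map source values so their cumulative distribution follows the reference."""
    src_values, src_index, src_counts = np.unique(
        source.ravel(), return_inverse=True, return_counts=True
    )
    ref_values, ref_counts = np.unique(reference.ravel(), return_counts=True)
    src_cdf = np.cumsum(src_counts) / source.size
    ref_cdf = np.cumsum(ref_counts) / reference.size
    # For each source quantile, look up the reference value at that quantile.
    mapped = np.interp(src_cdf, ref_cdf, ref_values)
    return mapped[src_index].reshape(source.shape)

def match_histograms(source, reference):
    """Apply the channel-wise mapping to an H x W x 3 image."""
    channels = [match_channel(source[..., c], reference[..., c]) for c in range(3)]
    return np.stack(channels, axis=-1)

# Synthetic check with identical content: a "faded" copy of the reference.
rng = np.random.default_rng(1)
reference = rng.random((16, 16, 3))
faded = 0.5 * reference + 0.2
restored = match_histograms(faded, reference)
```

When source and reference share identical content and the fading is monotonic per channel, as in this synthetic case, the mapping recovers the reference values. That is the situation the original algorithm was designed for.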

Then, about eighteen months ago, having been active in quantitative finance for a few years, I was looking for something different to get involved in. Color matching algorithms based on artificial intelligence were being introduced that aimed to color match stills that are not identical. I started thinking that there must be a more fundamental, first-principles approach to color matching than training a neural network on thousands of images. So I went back to my original method and began by considering the question: what if I have the same scene, but from a different perspective? How would that work? I came up with a concept that appeared to be fit for purpose:

Source:

m11ROVj.jpeg


Reference:

JJf2tD7.jpeg


Color matching result:

pRQMMMD.png


A production photo might not be the most ideal or accurate reference, but it seemed to work...

What followed was a number of extensions of the algorithm to expand its scope to include images with similar content that are not necessarily from the same scene or location. Here are a few examples:

Source:

fkF2Cob.jpeg


Reference:

PMFmyBa.jpeg


Color matching result:

BRIQPQs.jpeg


Source:

ygune3h.jpeg


Reference:

i9vT3TI.jpeg


Color matching result:

fEjmD3z.jpeg


Source:

1vFIGuI.jpeg


Reference:

DmCzXcN.jpeg


Color matching result:

TgWr1i0.png


Simultaneously, I began to build a conceptual framework for color perception that centered around the idea that under the assumption of a single illuminant the stimuli distribution of an image Pimage(i, j, λ) at location i, j and at scale λ can be split into three distinct distributions:

Pimage(i, j, λ) = Plook x Plum(i, j, λ) x Pcomp(i, j, λ)

Here Plook represents the macro-look of the image. Plum(i, j, λ) represents the local luminance distribution, and Pcomp(i, j, λ) represents the local compositional distribution, which together form the local stimuli distribution of the image. For very large λ we recover the global image statistics, which are again written as:

Pimage = Plook x Plum x Pcomp

Plook x Plum(i, j, λ) is postulated to converge to the global Plook x Plum for a scale smaller than the image size. In other words, Plook x Plum(i, j, λ) becomes self-similar for large λ. This postulate was supported by evidence: I showed that the left half and right half of a high entropy image converge to the same Plook x Plum. These were the first building blocks of what eventually became a full-blown color appearance model rooted in a theorized mean-field regime of color perception.
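The left half versus right half comparison can be mimicked with a very crude stand-in. The sketch below compares plain per-channel histograms of the two halves, which is not the same thing as estimating Plook x Plum (the compositional part Pcomp is not factored out), so it only illustrates the kind of check involved. The function names, the bin count, and the use of total variation distance are assumptions.

```python
import numpy as np

def channel_histograms(image, bins=32):
    """Normalized per-channel histograms of an H x W x 3 image in [0, 1]."""
    pixels = image.reshape(-1, 3)
    hists = np.asarray(
        [np.histogram(pixels[:, c], bins=bins, range=(0.0, 1.0))[0] for c in range(3)],
        dtype=float,
    )
    return hists / hists.sum(axis=1, keepdims=True)

def half_distance(image, bins=32):
    """Mean total variation distance between left-half and right-half histograms."""
    mid = image.shape[1] // 2
    left = channel_histograms(image[:, :mid], bins)
    right = channel_histograms(image[:, mid:], bins)
    return 0.5 * np.abs(left - right).sum(axis=1).mean()

# A synthetic high entropy image: both halves share the same statistics.
rng = np.random.default_rng(2)
noisy = rng.random((128, 128, 3))

# A synthetic counterexample: dark left half, bright right half.
split = np.concatenate([np.full((8, 8, 3), 0.1), np.full((8, 8, 3), 0.9)], axis=1)

print(half_distance(noisy), half_distance(split))
```

A distance near 0 means the two halves have nearly the same distribution, while a distance of 1 means they share no histogram bins at all.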

Meanwhile, as discussions with colorists and other professionals fueled further development, the color matching algorithm also turned out to be particularly well suited for color restoration, such that it is now part of HS-ART's DIAMANT-Film Restoration Suite. Here's an example:

Source:

BJpmaKn.png


Reference:

M0G3QoX.jpeg


Source with restored colors:

djANNnO.png


Given all this, we can state that if human observers can reliably judge that two images share a “look” despite dissimilar content, then appearance must factor through a latent variable that is invariant to composition but sensitive to global statistics. This latent variable is represented by Plook in this theoretical framework.

To be continued....

 