Computation Is the New Optics: A Conversation With Marco De Mutiis
Marco De Mutiis: We started this platform, this research lab, a framework for the museum to operate in that expands the idea of the exhibition. It’s called [permanent beta]. What we're going to do is spend two-plus years researching a given topic. The topic for this first iteration is called 'The Lure of the Image'. So basically, the idea was that we make the curatorial research visible, but we also try to develop new forms of knowledge making. We invited you and other artists, alongside us from the curatorial staff of the
Fotomuseum Winterthur[#1]
, to start thinking about different ways in which we can understand this idea of the seduction of the image. And of course, this idea of the seductive power of the image is something that goes back to the invention of photography. These technical images, they kind of have power over us, they teach us how to see, and reproduce reality. There are many different layers. It's a very wide topic, but we're particularly interested in the specificities of online, digital, networked images, and how their specific properties deploy these seductive powers in different ways. So we've had a look at different possible seductive powers, starting from this idea of
clickbait[#2]
all the way to
conspiracy theories[#3]
, the
political use of images[#4]
And also, of course, this idea of the representation of the truth. I think this becomes very interesting when we're talking about ideas of computer graphics, and this development of photorealism. This idea of computational forms of images, and image making that attempt to resemble traditional photography and the optical systems of the photographic camera.
We thought that you were one of the people that we really wanted to have, that we needed to have, in this conversation. Because of your practice and the way that you're researching these topics, especially your look at the politics of CGI and the politics of photorealistic images. That's the starting point. We had a brief exchange, and you came up with a very interesting proposal. Maybe you can give a brief introduction?
Simone C. Niquille: There were two versions of a proposal, as it goes. The first one was Parametric Truth, which is an ongoing research project looking specifically at the use of computer generated imagery and animation as reconstruction tools for forensic purposes. What I'm interested in is how technology that's been developed for a specific use case in the entertainment industry ends up in the forensics industry, which is very much reliant on truth and measurement. What I mean by ‘these sorts of technologies’ is, for example, motion capture, the ability to capture a person's or an object's motion with
sensors[#5]
, or by
analysing video material[#6]
, and then use that data to animate a 3d character in a crime scene reconstruction animation. Mostly, these animations are created within the context of US courts where a jury is present. These re-enactment animations play an important role in visualising events for the jury. With my long-term research Parametric Truth, I'm curious about the truthfulness of such animations and images, investigating the default parameters and workflows of the software in use. Many of the tools used to create these animations were developed within the entertainment industry with the purpose of escaping, of going beyond, the human body. Think of the making-of videos on YouTube of
Benedict Cumberbatch playing a dragon[#7]
, crawling on the floor in a motion capture suit. Seeing the
same technology applied and argued for in the courtroom[#8]
, a space where accuracy for the sake of truthfulness is of the utmost importance, is mind-blowing and requires a deeper conversation on how these technologies are made and where they are employed. What is a fact-based image and what is a seductive one? Should there be seductive images in the courtroom?
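A minimal sketch of the motion capture workflow described above, purely illustrative: per-frame joint rotations, whether recorded by suit sensors or estimated from video, are treated as plain data and replayed onto a character rig. The Rig class and the data layout are assumptions made for the example; production pipelines use formats such as BVH or FBX and full 3D transforms.

```python
from dataclasses import dataclass, field

# Illustrative sketch: captured motion as per-frame joint rotations,
# replayed onto a character rig. Rig and the data layout are assumptions.

@dataclass
class Rig:
    pose: dict = field(default_factory=dict)  # joint name -> rotation (degrees)

    def apply(self, rotations: dict) -> None:
        self.pose.update(rotations)

# One captured "take": a list of frames, whether recorded by suit sensors
# or estimated from video analysis.
take = [
    {"hip": 0.0, "knee_left": 4.5, "knee_right": 11.0},
    {"hip": 1.2, "knee_left": 8.7, "knee_right": 7.9},
]

character = Rig()
for frame in take:
    character.apply(frame)  # a renderer would draw the posed character here
    print(character.pose)
```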
Duckrabbit.tv[#9]
is the project I am ultimately pursuing as part of [permanent beta]. It is, to some degree, a reaction to the research I have done in the past with Parametric Truth. The use cases that I research are very heavy, loaded with violence and discrimination. In creating images of the research, I do not want to recreate that world. This is why most of the research for Parametric Truth exists as writing and lectures. I was searching for a character that lived outside that world, but would allow me to talk about it. Here comes duckrabbit. The character itself struggles to navigate and exist. In a way, duckrabbit is a lure, a seductive character that enables discussion of complex technologies, of something that is different from what it might first seem. As its starting point, duckrabbit.tv takes the position that computation is the new optics. The point of capture has been replaced by a point of computation. This shift brings with it a whole bunch of questions around processes, agency, representation... some of these questions are already very close to the discipline of photography.
Duckrabbit is based on a drawing by the philosopher
Ludwig Wittgenstein published in his Philosophical Investigations[#10]
to illustrate the concept of ambiguous imagery. Duckrabbit is a drawing that looks like a duck from one perspective and a rabbit from another. The viewer's standpoint heavily influences what information is perceived. duckrabbit is trying to live in ambiguity and not be pinned down by a predetermined way of looking that is embedded within computational vision technology. Duckrabbit is placed at the edge where computer vision and computational photography meet: CGI.
MDM: Thanks. I think there's a lot to unpack. I love the idea of using this symbolic figure of the duckrabbit to tackle this core notion of the ambiguity of the image. This idea of perception and interpretation. But before that, I wanted to ask you, well, I guess there's two things. One is to talk a little bit about how you're going to use the platform to visualise this process. But before that, if we think about the example of the courtroom, when did we see the first crime scene reconstruction animations? What are the first or the most important cases that have used CGI to visualise the truth? And possibly convince the jury through the seductive power of the image?
SCN: As far as I am aware, 3d animations appear in the courtroom sometime during the 90s.
There is documentation of use cases in 1992[#11]
. Around the early 2000s, an article is published on
Wired[#12]
on the use of animations in the courtroom. Wired’s interest is in the technology. I think, as such, Wired is an important platform to cover the topic, as it sits in this in-between space of being very geeky and hyped about tech while looking at different places of application, from entertainment to healthcare to computation performance comparisons. So, I think animations created specifically for the courtroom appear in the 90s. In terms of the technology necessary to create 3d animations, the timeframe shifts towards the 80s, when
early human animations[#13]
were being developed within university labs.
An important date for me as well is 2006, when
IKEA printed for the first time an image of a chair in their catalogue[#14]
that was rendered rather than photographed. Their idea was to try and see if that
chair[#15]
goes unnoticed, and if successful, to invest in their strategy to replace product photography with rendered imagery. While a completely different field of application from forensic imagery, there is an important point to be made with the IKEA catalogue. In the IKEA catalogue, CGI is trying to go unnoticed, trying to be mundane. It's the opposite of the lure of the image. I think that's a bit where we're headed right now: the technology shouldn't be visible. The image, whether photographed, rendered or generated, should not ask for attention. The image looks as if it was captured rather than computed. It’s a time of stock photography created with the most sophisticated image technology possible. The image should still inspire you to go and buy that chair. It is supposed to have a certain lure to it. But the technology of how the image was constructed recedes into the background. If we think of what the IKEA chair, a mundane render that simply blends in, might mean for forensic animation, the consequences could be drastic. If it is impossible to distinguish a constructed, rendered image from a photograph, there need to be appropriate protocols for the admission and interrogation of the software and tools involved. It provokes a set of questions that need to be asked, especially around the definition of accuracy, capture and truthful depiction. I'm wondering, if computation is the new optics, then what are the systems that are driving that?
MDM: I love this idea of the anti-lure of the image of the IKEA chair. Because one of the lures of photorealism is really the fact that it can simulate photography in a way that goes undetected. That brings me back to the 90s and the writing of
Lev Manovich on the paradoxical image[#16]
where he's looking at
Jurassic Park[#17]
as one of the first movies that employs CGI to create the dinosaurs. He's saying that the CGI is actually paradoxical because it's a completely new mode of visual representation. As you said, it's computational rather than optical. But at the same time, it really reinforces, it glorifies, the analogue tradition and the optical systems. So there is a different kind of lure in the representational layer, but it's more about this form, the fact that it’s trying to pass as photography.
Domenico Quaranta called it body snatcher[#18]
. Like the sci-fi movie of the 50s, it's trying to wear the skin of photography. I think this is already kind of a lure within a lure, that you can have this seduction. And on the content side, there's another seduction of perception, what you were talking about with the jury in the courtroom being convinced by specific images and representations of crime scenes, for example. I think that's also what intrigues me about the idea of the lure, that it goes into many different layers and has many different trajectories. One thing that you mentioned that I wanted to ask you more about is, you said that your research is also focused on the edge between computer vision and computational photography. Maybe you can tell us a little bit more about these two terms, and what they stand for? How do they operate in relation to the image?
SCN: One way to define computer vision is to start from the fact that cameras don't know what they're looking at. Computer vision is the attempt to instill a certain smartness, a certain capability. If you think of
self driving cars[#19]
, for example, computer vision is used so that the car can ‘see’ what is in front of it. It needs a camera to interpret the world. The language used to describe the abilities of computer vision technology is always very promising. It's the ultimate fantasy that the camera can now see the world, just as a human can, while remaining an objective ‘technical’ observer.
Where CGI comes in is in the training data that is created for some computer vision systems[#20]
. What I mean by a training dataset is that to develop a computer vision technology, a bunch of examples, information about what it is supposed to see, is gathered in huge datasets. For self-driving cars, these will be scenes of driving through different situations and environments: driving through various types of cities, urban areas, forests, anything. Some of this footage could also be captured with cameras, of course, but CGI is important in creating images that would be difficult to capture, for example, accidents or encountering objects that are less common, like an animal crossing a road. With CGI, you can create a thousand versions easily and attempt to train a system for all sorts of possible events.
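A hedged sketch of how such CGI training data can be parameterised, in the spirit of domain randomization: the same rare event, an animal crossing the road, is rendered under many randomized conditions. The render_scene function and the parameter names are stand-ins for illustration, not any particular lab's pipeline.

```python
import random
from dataclasses import dataclass

# Sketch of synthetic training data via randomized rendering parameters.
# render_scene is a hypothetical stand-in for a CGI renderer (a game engine
# or a Blender script); a real pipeline would return images plus ground truth.

@dataclass
class SceneParams:
    animal: str
    time_of_day: float        # hour, 0-24
    weather: str
    camera_height_m: float
    crossing_speed_mps: float

def render_scene(params: SceneParams) -> str:
    return f"frame({params})"  # placeholder for a rendered, labelled frame

def generate_variants(n: int = 1000) -> list:
    frames = []
    for _ in range(n):
        params = SceneParams(
            animal=random.choice(["deer", "dog", "fox"]),
            time_of_day=random.uniform(0, 24),
            weather=random.choice(["clear", "rain", "fog", "snow"]),
            camera_height_m=random.uniform(1.2, 1.6),
            crossing_speed_mps=random.uniform(0.5, 4.0),
        )
        frames.append(render_scene(params))
    return frames

dataset = generate_variants()  # a thousand variations of one rare event
```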
With computational photography, I'm interested in the fact that there isn't so much a point of capture anymore, but a point of computation. So when we think of the cameras in smartphones, they are trying to imitate much larger equipment, like a DSLR camera. Apple famously captioned their ads
“shot on iPhone”[#21]
saying, hey, I can just use a small device, simply press a button, and the end result is the same as what much heavier, pricier equipment and post-processing workflows produce. Apple even tapped into the history of photography, recreating famous images of Yosemite National Park, referencing large-format photos taken by Ansel Adams, positioning their technology as a clear continuation of photographic history. Yet, computational photography is fundamentally different from photography. To give an example of its difference, the development of the Google Pixel smartphone camera is interesting. Marc Levoy, who's been in charge of developing computational photography for the Google Pixel,
talks about his inspiration in art historical painting for the image style of the camera[#22]
. With ‘image style’ I mean the kind of mood, contrast, brightness; the default settings that are applied to all images captured with the camera. Each phone model had a different source of inspiration. The Pixel 2, I believe, was inspired by
Caravaggio[#23]
, producing really saturated imagery with dark, deep contrasts and bright objects emerging out of this darkness. When you see photos taken with the camera, the influence makes sense; it has a specific look to it. The following model was developed with the paintings of
Titian[#24]
as a model, whose paintings have much softer edges and muted colours. A good case study on how deeply the people developing a particular technology have an influence on the kinds of outputs, in this case images, that are produced. Of course, the same could be said of different kinds of photographic film, each producing a specific mood and colouring. A Kodak Ektachrome ‘looks’ different from a Fuji Velvia. It comes to mind that in the magazine
National Geographic[#25]
the captions used to note the film stock used for a particular photo... But computational photography relies on an array of processes, and they cannot be controlled by the user the way switching photographic film can be. Oftentimes it isn’t apparent that these processes are taking place. The final image is presented as a capture, not as a result of computation.
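As a rough illustration of what a baked-in 'image style' means in practice: every capture passes through the same default tone curve before the user ever sees it. The curve and the two example 'looks' below are invented for illustration and are not Google's actual Pixel processing.

```python
import numpy as np

# Sketch of a fixed "image style" applied to every capture. The tone curve
# and the two example looks are illustrative assumptions.

def apply_style(raw, contrast, lift_shadows):
    """raw: float image in [0, 1]; returns the styled image in [0, 1]."""
    img = np.clip(raw, 0.0, 1.0)
    img = 0.5 + (img - 0.5) * contrast            # global contrast around mid-grey
    img = img + lift_shadows * (1.0 - img) * 0.1  # gentle shadow lift
    return np.clip(img, 0.0, 1.0)

deep_contrast = dict(contrast=1.4, lift_shadows=0.0)  # dark, punchy shadows
soft_muted = dict(contrast=1.1, lift_shadows=1.0)     # softer, lifted shadows

capture = np.random.rand(4, 4)                  # stand-in for sensor data
styled = apply_style(capture, **deep_contrast)  # same default, every photo
```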
Another example is a controversy from 2019 surrounding the launch of a new smartphone by Huawei. The phone was advertised to have a
Moon Mode[#26]
, the ability to take a crisp close-up photo of the moon. Many people took photos of the moon and posted them online. The internet thought that all of the images of the moon kind of looked the same. A controversy ensued where people thought: maybe we're not taking photos of the moon after all; instead, there is a photo of the moon already stored on the phone, and at the moment the photo is taken, that moon image is slotted into place. I don't think that controversy was ever resolved. People were going through all of the data stored on the phone to see if there was an image. I don't think they could find it. Regardless of the controversy's outcome, the takeaway is people's suspicion of cameras. On Reddit someone asked ‘What is the point of taking photos anymore?’
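Purely as an illustration of the mechanism people suspected, and not of anything Huawei has confirmed shipping, a 'slot-in' step might look roughly like this: find the bright disc in the frame and paste a stored reference image of the moon over it.

```python
import numpy as np

# Illustrative sketch of the *suspected* slot-in behaviour, not actual code
# from any vendor: locate the overexposed moon disc and overwrite it with a
# stored reference image.

def slot_in_moon(frame, stored_moon, threshold=0.9):
    """frame, stored_moon: float images in [0, 1]."""
    out = frame.copy()
    bright = frame > threshold                # crude mask for the bright disc
    if bright.any():
        ys, xs = np.where(bright)
        y0, x0 = ys.min(), xs.min()
        h = ys.max() - y0 + 1
        w = xs.max() - x0 + 1
        patch = stored_moon[:h, :w]           # assume the stored image is large enough
        out[y0:y0 + patch.shape[0], x0:x0 + patch.shape[1]] = patch
    return out

night_sky = np.zeros((8, 8))
night_sky[2:4, 5:7] = 1.0                     # overexposed blob standing in for the moon
reference_moon = np.full((8, 8), 0.6)         # the hypothetical stored moon image
result = slot_in_moon(night_sky, reference_moon)
```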
What interests me is whether you are capturing the world or computing an estimation. It's this edge where it's unclear if you're looking at something that's already been captured: instead of taking another photo of the Eiffel Tower, the ‘camera’ generates a Stable Diffusion version of all the photos that have ever been taken of the Eiffel Tower, a much more beautiful image than I could have taken on this rainy day. But to whose standards of beauty? Is the image taken with the expectation that it has to be dazzling, or is the moment a grainy memory for myself, and should the image represent that?
MDM: I particularly enjoy the example of the Moon Mode. Some time ago I was trying to figure out if there had been an ending to the controversy, but it seems to still be shrouded in mystery. That might also be the best ending to it. There's something that I really love about this idea that Huawei just has this moon file that gets composited. Like why not? I mean, that's in the end what you actually want, right? You're not looking for a truthful representation of the moon, most people might just want to have a beautiful picture of the moon in the canon of the photography that we were trained in. My colleague Patrizia also recently wrote something about the Webb telescope and the images it produces. Somebody tweeted an image of a slice of chorizo, presenting it as an image taken by the Webb telescope, and people were enraged, because they thought it was real. So there is this expected truth in science, especially when we talk about astrophotography. But yeah, I didn't know about all these references to paintings that came up in the discourse around the Google Pixel cameras. And I think that's quite fascinating, because isn't it also kind of contradictory that you're creating all of these kinds of photographic simulations, but at the same time, you're very open about the fact that it's a subjective interpretation of reality? It seems to go against this baggage of photography, this tradition of an objective tool of representation, which has been criticised over and over but still seems to be very strong in the promotion of new cameras that can get the perfect pictures of reality. There's also this friction that I'm very interested in between the state of the art of technology and the development from the technological standpoint that always seems to be pushing for better resolution, better realism, and so on. At the same time, there is this cultural baggage that seems to be very much stuck in the 20th century. I think that was something that
Nicholas Malevé[#27]
was talking about in his research, this split personality within the developer, looking from the perspective of art history of the 20th century, but developing tools that are super cutting edge.
The other thing I was thinking about when you talked about computer vision is your work about the
Roomba vacuum cleaner[#28]
. Computer vision is not just piloting self-driving cars; there are much more mundane forms of computer vision applied in ubiquitous domestic spaces. Your film
Homeschool[#29]
was very humorous in showing the impossibility of objectively defining what an image is. At the same time, it also reveals the human impossibility of objectively defining the context. I think many of these ambiguities come up in machine vision and, at the same time, are very much the duckrabbit of human vision. I was wondering if you can maybe talk a little bit about what the difference is between this machinic vision and human vision when dealing with the problem of perception, and how you plan on working with that in your current project duckrabbit?
SCN: In my work I use vision technologies, in this case computer vision and computational photography, as a proxy to speak about the human condition. A grand statement. Yet, technology that's being developed to capture the world is very much driven by standards, by a canon of aesthetics and representation, that have been defined within a Western European context. Look at the kinds of paintings that have inspired the Google Pixel phones. The paintings of Titian and Caravaggio that served as examples for the mood of the Google Pixel's images freeze time in a way similar to a traditional photograph. What if a non-Western canon were to inspire computational photography? Just think of the representation of time, for example, which does not have to be linear. It would inspire a different way of capturing and representing the world, an exciting possibility to escape the obsession with accuracy and truthfulness that photography has been subjected to.
Even though photographic history proves otherwise, over and over. If you look at Eadweard Muybridge’s work, there are a few portfolios of his where it's very apparent that he composited nature shots with more dramatic clouds, for aesthetic effect. He had
a collection of cloud photographs[#30]
that he would use over and over to create sublime images. In 2014 Google presented a feature to ‘capture the perfect moment’ called
Auto-Awesome[#31]
. Imagine a family photo where everyone is smiling and has their eyes open, at the same time! To achieve this, the click of the shutter captures an array of photos, analyses these for smiles and open eyes, and creates a composite image where everyone looks ‘perfect’.
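A minimal sketch of that burst-compositing logic, following the description above rather than Google's actual implementation: each person's face is scored per frame (a real system would run a smile and eyes-open classifier; here a deterministic stand-in score is used), and the best frame per person is picked for the composite.

```python
import random

def score_face(frame_index: int, person: str) -> float:
    # Stand-in for a smile / eyes-open classifier: deterministic pseudo-score.
    random.seed(f"{frame_index}-{person}")
    return random.random()

def pick_best_frames(num_frames: int, people: list) -> dict:
    # For each person, the index of the burst frame their face is taken from.
    return {
        person: max(range(num_frames), key=lambda i: score_face(i, person))
        for person in people
    }

# A compositing step would then paste each chosen face into one base image.
print(pick_best_frames(num_frames=8, people=["ana", "ben", "chi"]))
```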
Technologies such as computational photography are simply the newest form of an existing practice, but they aren't necessarily new ideas or novel ways of creating imagery. They are deeply embedded in, and at the same time contradict, a Western paradigm of truth and measurement as tools of power.
MDM: Why are we so obsessed with the truth? I wonder this every time there's an outcry about images not being truthful, when we're talking about deepfakes, GANs and now diffusion models. There's always this argument about not being able to trust images anymore. But why did we trust images in the first place? You know what I mean, you have an answer to that. It's a question I have from time to time.
SCN: What’s relevant is the context within which images appear, who made them and who relies on them. Not having to rely on images’ truthfulness comes with a lot of privilege. This is very apparent in the forensic space, where images can influence or sway justice. Someone might rely on representation and documentation through photography. If they're not in charge of the images that can or cannot be captured, the result can be severe. Think of computational photography applying ‘optimal’ contrast to different skin tones, or, like the example of Caravaggio influencing the images produced by the Google Pixel, shadows being completely oversaturated and omitting any information in the dark areas. Compositing one image out of several shots, generatively adding information in computational zoom, simulating depth of field: these are all aesthetic processes that are not necessarily fitting, or appropriate, for all images taken with a smartphone camera. Yet, as it is the device the majority carries around at all times, the smartphone is the default capture device for any situation. The ubiquity of the smartphone camera, and as such of computational photography, requires renewed scrutiny of the terminology used to describe these processes, and of who is in control of them.
Promotional material[#32]
does like to confuse capture with computation for the sake of aesthetic success.
MDM: You pointed out something very interesting in terms of photography and ways of seeing when talking about human agency. And it almost feels like the human photographer has slowly been pushed out in terms of how much they can intervene and how much they are in control of the process of image making. We're talking about a network of stakeholders and the structure of politics and political asymmetries within this black box of technological processes. We can also think of this idea of photography as a
civil contract[#33]
, where you have the viewer, the subject, the photographer, the camera. But then when we extend this idea to CGI, even to game players and smartphone users, Twitch viewers, then we have a very complex and often hidden system of relations among these players and stakeholders. In your project and in your research, I know that you're going to interview specific people and try to get insights from particular players or from particular actors in this network. Maybe you can say a little bit about who you are planning to reach out to, or which kinds of profiles you're interested in engaging with.
SCN: The interview is always part of my process, but there isn't always a place to publish or gather the interviews alongside the work. This is where this platform is super exciting. Mostly, the interviews are conversations. The people I reach out to are, as you mention, from various disciplines and backgrounds. For example, a computer engineer who is creating the technologies that I am referencing, or someone at a computer vision lab who is responsible for compiling training datasets. In a conversation like that, I'm interested in the decision-making process involved in their task, like, where does the data come from? How is it sorted? What are the challenges of the process? But also other researchers, someone who made a YouTube video, an artist from a Blender blog. The people that I will speak to are all involved in different kinds of ambiguities.
Ambiguity is inherently not computable. Computer vision technology cannot function well in unpredictable environments. Yet, the owner of a device with a ‘smart’ camera relies on a level of accuracy. One big problem iRobot faced with the Roomba vacuum cleaner was that it wouldn't recognise
dog poop[#34]
. Instead of avoiding it, or cleaning, the Roomba would just
roll over it and spread it all over a person's home[#35]
. Quite the opposite of the device’s cleaning task. Recently, a big update announced that the device can finally recognise dog poop. The product improved, but an ambiguity was lost. And so it continues with each iteration cycle.
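To sketch what recognising an obstacle changes in behaviour: a minimal, illustrative decision step that routes around detected waste instead of driving through it. The label set and the confidence threshold are assumptions, not iRobot's actual pipeline.

```python
# Sketch of an avoid-or-proceed decision once an obstacle classifier exists.
# The label set and threshold are illustrative assumptions.

AVOID_LABELS = {"pet_waste", "cable", "sock"}

def navigation_decision(label, confidence, threshold=0.7):
    if label in AVOID_LABELS and confidence >= threshold:
        return "route_around"        # keep a distance, flag the spot in the app
    return "proceed_and_clean"

print(navigation_decision("pet_waste", 0.92))  # -> route_around
print(navigation_decision("dust", 0.55))       # -> proceed_and_clean
```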
Another element is annotation. What I am eager to explore with the [permanent beta] platform is to publish conversations as starting points for hyperlink deep dives. All the conversations will be annotated, acknowledging that none of these thoughts happen in a vacuum but are part of an ongoing exchange, evolving alongside technological development.
MDM: I just want to take a moment for a shout-out to the guy who had to make the dataset of dog poo in order to improve the Roomba. I feel very excited about your project, as you're going to show, through these different interviews and the resulting animations, the different territories and the various stakeholders that these images contain and address. We talked about developers, about lawyers in the courtroom, politicians, designers, viewers; with all of these territories also come so many different understandings of images. All of these people are being seduced in different ways for different ends. I think it's going to be exciting to see how this opens up all the possible ways in which we can understand the role of the image, especially in computational photography and computer vision.
The conversation took place on 29 November 2022 at 10am over Zoom.