Real or Artificial

date
December 8, 2025
category
Artificial Intelligence
Reading time
7 Minutes

How science is learning to tell the difference between photography and AI-generated images and video

For nearly two centuries, photographs were treated as physical traces of reality. Light entered a lens, struck a sensor or film, and left behind a measurable imprint of the world. That assumption no longer holds. Today, images that look perfectly photographic can be produced without a camera, a lens, or a physical scene. As generative models advance, the scientific challenge is no longer how to create artificial images, but how to detect them.

Across computer vision, digital forensics, mathematics and media research, a global effort is underway to answer one question: what still separates a real photograph from a synthetic one?

Why detection is even possible

Despite their realism, AI-generated images are produced by fundamentally different processes than photographs are. A real photo is constrained by physics: optics, sensor noise, lens distortion, demosaicing, compression and lighting geometry. A generated image is the result of a neural network sampling from probability distributions learned during training.

That difference leaves statistical fingerprints. Researchers consistently show that AI images differ from real photographs in:

  • Noise structure
  • High-frequency texture behavior
  • Edge and gradient statistics
  • Compression and resampling artifacts
  • Correlation patterns across pixels

These differences are often invisible to the human eye but measurable with signal processing and statistical learning techniques.

A simple mathematical doorway into detection

One of the most interpretable foundations of image forensics comes from classical image processing rather than deep learning.

  1. Luminance extraction

Most detection methods begin by isolating luminance, the perceived brightness of each pixel. This is done using the coefficients defined in the ITU-R Rec. 709 standard:

L(x, y) = 0.2126R + 0.7152G + 0.0722B

This isolates structural light information from color, which is essential for physical analysis.
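As a minimal sketch of this step in Python (assuming an image already loaded as an (H, W, 3) NumPy array with values in [0, 1]; the function name is my own, not taken from any particular forensic toolkit):

```python
import numpy as np

def luminance(rgb: np.ndarray) -> np.ndarray:
    """Rec. 709 luminance of an (H, W, 3) RGB array with values in [0, 1]."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

# Random pixels standing in for a real photo
rgb = np.random.rand(480, 640, 3)
L = luminance(rgb)   # shape (480, 640)
```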

  2. Spatial gradients

Once luminance is extracted, spatial derivatives are computed:

Gx = ∂L / ∂x
Gy = ∂L / ∂y

These gradients describe how brightness changes across the image and are what edge detectors are built on. In real images, gradients align with physical surfaces, shadows, and materials. In generated images, gradients often expose subtle denoising patterns created during synthesis.
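Continuing the sketch, NumPy's finite-difference gradient is a reasonable stand-in for these derivatives (note that np.gradient returns the vertical axis first):

```python
import numpy as np

L = np.random.rand(480, 640)   # luminance map from the previous step

# np.gradient differentiates along each axis: axis 0 is y, axis 1 is x
Gy, Gx = np.gradient(L)

# Gradient magnitude, the quantity classical edge detectors threshold
magnitude = np.hypot(Gx, Gy)
```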

  3. Statistical structure through PCA

Each pixel now corresponds to a 2D gradient vector (Gx, Gy). These vectors are stacked as the rows of an N × 2 matrix M, one row per pixel, and a 2 × 2 covariance matrix is computed:

C = (1/N) · MᵀM

Principal Component Analysis (PCA) decomposes this covariance into dominant directions and variances. Across multiple studies, real photographs and AI images show different variance distributions and eigenvalue ratios. This difference alone can often produce measurable separation without any neural network.
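A minimal version of this computation, assuming the gradient maps from the previous step (the eigenvalue ratio at the end is one plausible summary feature, not a standardized forensic score):

```python
import numpy as np

Gx = np.random.randn(480, 640)   # placeholders for real gradient maps
Gy = np.random.randn(480, 640)

# One row per pixel: M is N x 2, centered before computing covariance
M = np.stack([Gx.ravel(), Gy.ravel()], axis=1)
M -= M.mean(axis=0)

# 2 x 2 covariance matrix C = (1/N) M^T M
C = (M.T @ M) / M.shape[0]

# Eigen-decomposition yields the principal directions and variances
eigvals, _ = np.linalg.eigh(C)
anisotropy = eigvals.max() / eigvals.min()
```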

This explains why basic statistical tools still play a role in modern forensic AI detection.

Deep learning detectors and why they struggle

While classical methods are useful, most modern detection systems rely on neural networks trained to distinguish real from fake.

Two of the most influential datasets in the field are:

  • FaceForensics++ (Technical University of Munich and University Federico II of Naples)
  • DeepFake Detection Challenge (DFDC) by Meta and the Partnership on AI

These datasets enabled large-scale benchmarking of video and image manipulation detection. Early models achieved high accuracy on known types of fakes. However, a critical weakness emerged: generalization.

When new generators appear, many detectors trained on older models fail. This effect has been repeatedly demonstrated in peer-reviewed studies on diffusion models. In short, detectors often learn the behavior of specific generators rather than universal properties of realism.

This has shifted current research toward:

  • Generator-agnostic statistical features
  • Frequency-domain analysis (sketched below)
  • Hybrid systems combining physics-based signals with neural classifiers
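As an illustration of the second item, a common frequency-domain feature is the azimuthally averaged (radial) power spectrum, in the spirit of the frequency-analysis work cited in the sources. This is a sketch; the function name and bin count are chosen for illustration:

```python
import numpy as np

def radial_power_spectrum(L: np.ndarray, n_bins: int = 64) -> np.ndarray:
    """Azimuthally averaged power spectrum of a luminance map."""
    power = np.abs(np.fft.fftshift(np.fft.fft2(L))) ** 2

    h, w = L.shape
    y, x = np.indices((h, w))
    r = np.hypot(y - h / 2, x - w / 2)            # radial frequency of each pixel
    edges = np.linspace(0, r.max(), n_bins + 1)
    idx = np.clip(np.digitize(r.ravel(), edges) - 1, 0, n_bins - 1)

    # Mean power per radial bin; generated images often deviate from
    # real photographs in the high-frequency tail of this curve
    totals = np.bincount(idx, weights=power.ravel(), minlength=n_bins)
    counts = np.bincount(idx, minlength=n_bins)
    return totals / np.maximum(counts, 1)
```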

Why video detection is both harder and easier than images

Video adds dimensions that help and hurt detection at the same time.

Harder because:

  • Frame quality is often lower
  • Compression artifacts hide forensic traces
  • Motion blur obscures texture statistics

Easier because:

  • Real video obeys strict temporal-consistency constraints
  • Noise evolves smoothly across frames
  • Lighting, reflections, and motion follow physical constraints

Synthetic video often fails at micro-temporal continuity. Detectors analyze signals such as the following (the first is sketched after the list):

  • Optical flow consistency
  • Temporal gradient coherence
  • Frame-to-frame frequency drift
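A minimal sketch of the first check, using OpenCV's Farneback dense optical flow. The inconsistency score here is an illustrative heuristic, not a published detector:

```python
import cv2
import numpy as np

def flow_inconsistency(frames: list) -> float:
    """Mean change between consecutive dense optical-flow fields.

    `frames` is a list of at least three equally sized grayscale uint8
    images. Real footage tends to change smoothly; abrupt jumps between
    adjacent flow fields can flag broken temporal continuity.
    """
    flows = []
    for prev, nxt in zip(frames, frames[1:]):
        flows.append(cv2.calcOpticalFlowFarneback(
            prev, nxt, None, pyr_scale=0.5, levels=3, winsize=15,
            iterations=3, poly_n=5, poly_sigma=1.2, flags=0))
    # Compare consecutive flow fields: large jumps suggest discontinuity
    diffs = [np.mean(np.abs(b - a)) for a, b in zip(flows, flows[1:])]
    return float(np.mean(diffs))
```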

Modern video detectors use 3D convolutional networks and spatiotemporal transformers to model motion realism at scale.

The shift toward cryptographic provenance

Detection alone may never be fully reliable. This is why a second strategy has become central: content provenance.

The Coalition for Content Provenance and Authenticity (C2PA) was formed by Adobe, Microsoft, Intel, Sony, Arm and others to build an open cryptographic standard for verifying where media comes from. Instead of guessing whether an image is real, content credentials attach:

  • Capture device information
  • Editing history
  • Digital signatures
  • Publishing source

If adopted at scale, provenance shifts the burden from detection to verification and can function even when detectors fail.
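To make the idea concrete, here is a deliberately simplified hash-and-sign sketch. It is not the C2PA manifest format: real Content Credentials use X.509 certificates and public-key signatures, for which the shared-key HMAC below merely stands in.

```python
import hashlib
import hmac
import json

SIGNING_KEY = b"demo-key-not-a-real-certificate"   # stand-in for a signing cert

def make_manifest(image_bytes: bytes, metadata: dict) -> dict:
    """Bind metadata to pixel content via a hash, then sign the record."""
    record = {"content_sha256": hashlib.sha256(image_bytes).hexdigest(), **metadata}
    payload = json.dumps(record, sort_keys=True).encode()
    record["signature"] = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return record

def verify(image_bytes: bytes, manifest: dict) -> bool:
    """Check that neither the pixels nor the metadata changed after signing."""
    claim = dict(manifest)
    sig = claim.pop("signature")
    if hashlib.sha256(image_bytes).hexdigest() != claim["content_sha256"]:
        return False   # pixels were altered after signing
    payload = json.dumps(claim, sort_keys=True).encode()
    return hmac.compare_digest(
        sig, hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest())

manifest = make_manifest(b"...raw image bytes...", {"device": "ExampleCam X1"})
assert verify(b"...raw image bytes...", manifest)
```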

Who is doing the work

This field is driven by:

  • Academic computer vision labs in Europe, the US and Asia
  • Large-scale industry research groups (Meta, Google, Microsoft, OpenAI)
  • Forensic institutes supporting legal investigations
  • Policy and standards organizations shaping verification infrastructure

Public benchmarks and open challenges remain the backbone of scientific progress in this area.

The limits of detection

No detection method is permanent. Every detector creates pressure for generators to adapt. This feedback loop is now a defining feature of synthetic media.

Current limits include:

  • Weak generalization across new model families
  • Adversarial post-processing to conceal artifacts
  • High compression destroying forensic signals
  • Legal and ethical limits on biometric fingerprinting

Detection cannot be treated as a finished solution. It is an evolving system inside a technological arms race.

Why this matters for society

The ability to forge visual reality at scale affects:

  • Journalism and evidence
  • Legal proceedings
  • Political trust
  • Scientific integrity
  • Personal identity
  • Historical record

Detection and provenance will shape whether visual media remains a reliable source of truth or becomes a fully fluid synthetic layer of communication.

What to expect next

The next phase of this field will be defined by:

  • Hybrid detectors combining physics, statistics and learning
  • Mandatory provenance standards for professional media
  • Platform-level authentication requirements
  • Regulatory frameworks tied to national election security
  • Real-time video verification systems

The goal is no longer simply to chase fakes, but to rebuild trust in visual information itself.

Sources

  • Rössler et al., “FaceForensics++: Learning to Detect Manipulated Facial Images,” ICCV 2019
  • Dolhansky et al., “The DeepFake Detection Challenge (DFDC) Dataset,” arXiv 2020
  • Corvi et al., “On the Detection of Synthetic Images Generated by Diffusion Models,” ICASSP 2023
  • ITU-R Recommendation BT.709, “Parameter Values for the HDTV Standards for Production and International Programme Exchange”
  • Coalition for Content Provenance and Authenticity (C2PA), Technical Specification v2
  • Verdoliva, “Media Forensics and DeepFakes: An Overview,” IEEE Journal of Selected Topics in Signal Processing, 2020
  • Frank et al., “Leveraging Frequency Analysis for Deep Fake Image Recognition,” ICML 2020

written by
Sami Haraketi
Content Manager at BGI