TryOnDiffusion Architecture

This page explains the dual UNet architecture used by TryOnDiffusion.

Dual UNet Structure

The model consists of two parallel UNets:

Person UNet: Generates the final try-on output
Garment UNet: Processes garment features that condition the person UNet
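
As a rough illustration of how two parallel networks can exchange information, the sketch below fuses garment features into person features with single-head cross-attention: queries come from the person branch, keys and values from the garment branch. This is a minimal NumPy sketch, not the project's implementation. The function names, the feature shapes (16 person tokens, 12 garment tokens, width 8), and the random projection matrices are all assumptions chosen for the example.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(person_feats, garment_feats, w_q, w_k, w_v):
    """Single-head cross-attention (hypothetical helper).

    person_feats:  (n_person, d)  flattened person-branch features
    garment_feats: (n_garment, d) flattened garment-branch features
    Returns the fused features (n_person, d) and the attention weights.
    """
    q = person_feats @ w_q           # queries from the person branch
    k = garment_feats @ w_k          # keys from the garment branch
    v = garment_feats @ w_v          # values from the garment branch
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v, weights


# Toy inputs; all sizes are illustrative assumptions.
rng = np.random.default_rng(0)
d = 8
person_feats = rng.normal(size=(16, d))
garment_feats = rng.normal(size=(12, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))

fused, weights = cross_attention(person_feats, garment_feats, w_q, w_k, w_v)
print(fused.shape, weights.shape)
```

Each row of the weights sums to 1, so every person token receives a convex combination of the projected garment tokens.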

Key Components

Cross-attention mechanisms
Self-attention with pose conditioning
FiLM layers for feature modulation
Attention pooling for pose embeddings
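
Two of the components above, attention pooling and FiLM, can be sketched together: a set of pose tokens is pooled into one embedding, which then produces a per-channel scale and shift for a feature map. This is a minimal NumPy sketch under stated assumptions, not the project's code. The helper names, the choice of the mean token as the pooling query, the 17 pose tokens, and all shapes and weights are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention_pool(tokens, w_q, w_k, w_v):
    """Pool (n_tokens, d) into one (d,) vector.

    Assumption: the query is a projection of the mean token.
    """
    query = tokens.mean(axis=0, keepdims=True) @ w_q    # (1, d)
    keys = tokens @ w_k                                 # (n_tokens, d)
    values = tokens @ w_v                               # (n_tokens, d)
    weights = softmax(query @ keys.T / np.sqrt(keys.shape[-1]))
    return (weights @ values)[0]                        # (d,)


def film(features, cond, w_gamma, w_beta):
    """FiLM: per-channel affine modulation, gamma * x + beta.

    features: (channels, height, width); cond: (d,)
    """
    gamma = cond @ w_gamma                              # (channels,)
    beta = cond @ w_beta                                # (channels,)
    return gamma[:, None, None] * features + beta[:, None, None]


# Toy inputs; all sizes are illustrative assumptions.
rng = np.random.default_rng(0)
d, channels = 8, 4
pose_tokens = rng.normal(size=(17, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
w_gamma = rng.normal(size=(d, channels))
w_beta = rng.normal(size=(d, channels))
features = rng.normal(size=(channels, 5, 5))

pose_embedding = attention_pool(pose_tokens, w_q, w_k, w_v)
modulated = film(features, pose_embedding, w_gamma, w_beta)
print(pose_embedding.shape, modulated.shape)
```

FiLM leaves the spatial layout of the feature map untouched and changes only the per-channel statistics, which makes it a cheap way to inject a global conditioning vector.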

See the TryOnDiffusion README for complete architecture details.