TryOnDiffusion Architecture
This document describes the dual UNet architecture used in TryOnDiffusion.
Dual UNet Structure
The model consists of two parallel UNets:
- Person UNet: Denoises the person image and generates the final try-on output
- Garment UNet: Extracts garment features that condition the Person UNet
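The data flow between the two UNets can be illustrated with a toy sketch. It assumes that garment features reach the Person UNet through cross-attention, with person features as queries and garment features as keys and values. The functions `garment_unet` and `person_unet` are hypothetical stand-ins, not the project's real modules, and the token values are made up. Learned projections and convolutional blocks are omitted.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: every query row attends over all key/value rows."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def garment_unet(garment_tokens):
    # Hypothetical stand-in: a real Garment UNet would encode the garment image.
    return garment_tokens


def person_unet(person_tokens, garment_features):
    # Hypothetical stand-in for one Person UNet block:
    # person tokens query the garment features, followed by a residual add.
    attended = cross_attention(person_tokens, garment_features, garment_features)
    return [[p + a for p, a in zip(p_row, a_row)]
            for p_row, a_row in zip(person_tokens, attended)]


# Made-up inputs: 3 person tokens and 2 garment tokens, each 2-dimensional.
person_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
garment_tokens = [[0.5, 0.5], [2.0, -1.0]]

output = person_unet(person_tokens, garment_unet(garment_tokens))
print(output)
```

The output keeps the shape of the person tokens: the garment branch only supplies conditioning, while the person branch produces the result.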
Key Components
- Cross-attention to fuse garment features into the Person UNet
- Self-attention with pose conditioning
- FiLM layers for feature modulation
- Attention pooling to reduce pose embeddings to a single vector
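Two of the components above, attention pooling and FiLM, can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions: the pooling uses the mean token as its query with no learned projections, the FiLM scale and shift come from a hypothetical `linear` helper with made-up weights, and all input values are invented for the example.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(tokens):
    """Reduce (num_tokens, dim) to (dim,) using the mean token as the query."""
    n, d = len(tokens), len(tokens[0])
    query = [sum(t[j] for t in tokens) / n for j in range(d)]
    scores = [sum(q * x for q, x in zip(query, t)) / math.sqrt(d) for t in tokens]
    weights = softmax(scores)
    return [sum(w * t[j] for w, t in zip(weights, tokens)) for j in range(d)]


def linear(vec, weight, bias):
    """Hypothetical helper: dense layer computing weight @ vec + bias."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weight, bias)]


def film(features, gamma, beta):
    """FiLM: scale and shift each channel, gamma[c] * x + beta[c].

    features has shape (channels, positions).
    """
    return [[g * x + b for x in channel]
            for channel, g, b in zip(features, gamma, beta)]


# Made-up pose embeddings: 3 keypoints, each a 2-dimensional vector.
pose_keypoints = [[0.1, 0.9], [0.4, 0.2], [0.8, 0.5]]
pose_embedding = attention_pool(pose_keypoints)

# Made-up projection weights that map the pooled embedding to per-channel scale and shift.
gamma = linear(pose_embedding, [[1.0, 0.0], [0.0, 1.0]], [1.0, 1.0])
beta = linear(pose_embedding, [[0.5, 0.5], [-0.5, 0.5]], [0.0, 0.0])

# A toy feature map with 2 channels and 3 spatial positions.
features = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
modulated = film(features, gamma, beta)
print(modulated)
```

Pooling first gives a fixed-size conditioning vector regardless of how many keypoints there are, which is what makes a per-channel scale and shift straightforward to compute.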
See the TryOnDiffusion README for complete architecture details.