Implements the DiffEdit paper using Stable Diffusion. The key insight: the region of an image that needs to change can be located by comparing the noise predictions made under two different text prompts, so no manual masking is required.

Key ideas covered:
- Mask generation: contrasting noise predictions from a source and target prompt to identify the edit region
- DDIM inversion: encoding the input image back into latent noise to preserve unedited regions
- Targeted denoising: applying the diffusion process only within the generated mask
- Stable Diffusion internals: CLIP text encoding, UNet denoising, VAE decoding
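
The mask generation and targeted denoising ideas above can be illustrated with a minimal NumPy sketch. It is not the notebook's actual code: the function names `diffedit_mask` and `masked_blend`, the min-max rescaling, and the default threshold of 0.5 are assumptions chosen for illustration, and the noise predictions are taken as plain arrays rather than real UNet outputs.

```python
import numpy as np


def diffedit_mask(noise_src, noise_tgt, threshold=0.5):
    """Estimate an edit mask from two stacks of noise predictions.

    noise_src, noise_tgt: arrays of shape (n_samples, channels, H, W),
    standing in for UNet noise predictions under the source and target
    prompts. Returns a boolean (H, W) mask.
    """
    # Absolute difference, averaged over samples and channels -> (H, W).
    diff = np.abs(noise_src - noise_tgt).mean(axis=(0, 1))

    # Rescale to [0, 1]; a constant map means nothing differs.
    span = diff.max() - diff.min()
    if span == 0:
        return np.zeros(diff.shape, dtype=bool)
    normalized = (diff - diff.min()) / span

    # Binarize (the threshold value is an assumption of this sketch).
    return normalized >= threshold


def masked_blend(edited, original, mask):
    """Keep edited latents inside the mask and original latents outside it."""
    return np.where(mask, edited, original)


# Toy usage: the two predictions differ only in a 3x3 patch.
rng = np.random.default_rng(0)
src = rng.normal(size=(4, 4, 8, 8))
tgt = src.copy()
tgt[:, :, 2:5, 2:5] += 3.0

mask = diffedit_mask(src, tgt)
blended = masked_blend(np.ones((4, 8, 8)), np.zeros((4, 8, 8)), mask)
```

In a full pipeline, `masked_blend` would run at every denoising step, with `original` taken from the DDIM-inverted latents at the same timestep, so that regions outside the mask decode back to the input image.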