Despite recent advances in inversion-based and instruction-based image editing, existing approaches primarily excel at editing single, prominent objects but struggle significantly on complex scenes containing multiple entities. To quantify this gap, we first introduce RefEdit-Bench, a rigorous real-world benchmark rooted in RefCOCO, on which even baselines trained on millions of samples perform poorly. To overcome this limitation, we propose RefEdit, an instruction-based editing model trained on data from our scalable synthetic data generation pipeline. Trained on only 20,000 editing triplets, RefEdit outperforms Flux/SD3-based baselines trained on millions of samples. Extensive evaluations across various benchmarks demonstrate that our model not only excels at referring-expression tasks but also improves performance on traditional benchmarks, achieving state-of-the-art results comparable to closed-source methods. We will release our code, data, and checkpoints.
@article{pathiraja2025refedit,
  title={RefEdit: A Benchmark and Method for Improving Instruction-based Image Editing Model for Referring Expression},
  author={Pathiraja, Bimsara and Patel, Maitreya and Singh, Shivam and Yang, Yezhou and Baral, Chitta},
  journal={arXiv preprint arXiv:2506.03448},
  year={2025}
}