Chen Jin
Chen Jin
Home
Featured Research
Research Outputs
Blog
Contact
Light
Dark
Automatic
Vision-Language
Diffusion Instruction Tuning
Created Lavender, an SFT method aligning VLM text-vision attention with Stable Diffusion, boosting Llama-3.2-11B and MiniCPM-v2.5 by up to 30% on 20 tasks.
Chen Jin
,
Ryutaro Tanno
,
Amrutha Saseendran
,
Tom Diethe
,
Philip Teare
Segment Anyword
Training-free prompt learning for language-grounded segmentation using token-level cross-attention from a frozen diffusion model to generate object masks.
Zhihua Liu
,
Amrutha Saseendran
,
Lei Tong
,
Xilin He
,
Fariba Yousefi
,
Nikolay Burlutskiy
,
Dino Oglic
,
Tom Diethe
,
Philip Teare
,
Huiyu Zhou
,
Chen Jin
Cite
×