With the growth of Adobe Firefly, I have been busy working on multiple related technologies. One key ingredient for text-to-image applications is controllability, and we have recently announced Structure Reference as another form of user control. Simply put, it estimates the depth of a reference image and then performs the text-to-image synthesis conditioned on this depth estimate.
I provided the underlying depth estimator that enables this technology. It is the outcome of a multi-year effort that started with our 3D Ken Burns paper, and I have been continuously improving it since then. On this note, depth estimation is tricky to get right, and pretty depth maps, like the ones that pop up on social media every once in a while, don't necessarily work well for downstream applications.