与GPT-3一样,DALL·E是一个转换器语言模型。它接收文本和图像作为包含多达1280个令牌的单个数据流,并使用最大似然进行训练,以一个接一个地生成所有令牌。 [^脚注1]

该训练程序允许DALL·E不仅从头开始生成图像,还可以以与文本提示一致的方式重新生成延伸到右下角的现有图像的任何矩形区域。


We recognize that work involving generative models has the potential for significant, broad societal impacts. In the future, we plan to analyze how models like DALL·E relate to societal issues like economic impact on certain work processes and professions, the potential for bias in the model outputs, and the longer term ethical challenges implied by this technology.