DALLE2 PyTorch

The modern transformer architecture has been hugely successful across a multitude of fields and is the backbone of many state-of-the-art models. However, achieving the results now seen all over the internet requires ungodly amounts of GPU time. The LAION organization, together with a team of independent open-source developers, has worked to counter the trend of only the most well-resourced having access to large models by replicating and training open-source versions using donated compute time. This project is part of that effort.

DALLE2-PyTorch was trained using the StabilityAI HPC for 150k steps.

My Contribution

DALLE2-PyTorch was primarily a collaboration among lucidrains (Phil Wang), nousr, and myself. I was responsible for the decoder training code, distributed training, and configuration management.
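To give a flavor of the configuration-management side: the idea is that a single config file fully describes a training run. Below is a minimal, hypothetical sketch of that pattern (the names `DecoderTrainConfig` and `load_config` are illustrative, not the project's actual schema):

```python
# Hypothetical sketch of config-driven training setup, NOT the actual
# DALLE2-pytorch API: a dataclass holds all run parameters, and a loader
# parses a JSON file so every run is reproducible from one config.
import json
from dataclasses import dataclass, fields


@dataclass
class DecoderTrainConfig:
    lr: float = 1e-4
    batch_size: int = 32
    epochs: int = 1
    use_ema: bool = True


def load_config(text: str) -> DecoderTrainConfig:
    """Parse a JSON config, ignoring unknown keys for forward compatibility."""
    raw = json.loads(text)
    known = {f.name for f in fields(DecoderTrainConfig)}
    return DecoderTrainConfig(**{k: v for k, v in raw.items() if k in known})


# Any field not present in the JSON falls back to its dataclass default.
cfg = load_config('{"lr": 3e-4, "batch_size": 64, "unknown_key": 1}')
```

Keeping defaults in the dataclass and tolerating unknown keys makes old configs usable after the schema grows.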

Work in progress - Please forgive the dust…