Stable Diffusion Version 2


Stability AI has released Stable Diffusion 2.0, so what's it like?

The new version comes with a lot of features; unfortunately, not all of them are very friendly towards people using an AMD Radeon card (like me).

There are models aimed at generating both 512×512 and 768×768 images, but no ONNX version has been released for the latter. I converted it myself, but at 768×768 it simply runs out of memory, which is what I expected. With v1.5 everything worked fine up to about 512×704 (or 704×512). First tests with the 512×512 v2-base model work up to 512×640, before performance falls off a cliff.
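For reference, this is roughly how I probe that resolution limit once a model is converted. It's a minimal sketch assuming diffusers' OnnxStableDiffusionPipeline; the local model path is just an example for my own conversion.

```python
from diffusers import OnnxStableDiffusionPipeline

# Example path to a locally converted v2-base model; adjust to your own conversion.
pipe = OnnxStableDiffusionPipeline.from_pretrained(
    "./stable-diffusion-2-base-onnx",
    provider="DmlExecutionProvider",
)

prompt = "a photo of an astronaut riding a horse on mars"

# Step one dimension up until generation runs out of memory or slows to a crawl.
for width in (512, 576, 640, 704):
    image = pipe(prompt, height=512, width=width, num_inference_steps=30).images[0]
    image.save(f"test_512x{width}.png")
```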

For me, performance on v2.0 is worse than on v1.5. I managed to get 2.5 it/s on v1.5 and am now down to 2 it/s on v2.0. I am using the ORT DirectML nightly, which roughly doubles performance for me compared to ORT DirectML 1.13.1. On v1.5 I also ran without the safety checker to gain performance; more on that below.
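To illustrate the setup: checking which onnxruntime build is actually active, plus the v1.5-style load without the safety checker. The model path is an example; passing safety_checker=None is how diffusers lets you drop the checker.

```python
import onnxruntime as ort
from diffusers import OnnxStableDiffusionPipeline

# Confirm which onnxruntime build is active and that DirectML is available.
print(ort.__version__)                # e.g. a nightly build vs. 1.13.1
print(ort.get_available_providers())  # should include "DmlExecutionProvider"

# v1.5-style setup: drop the safety checker entirely to claw back some speed.
# The model path is an example; point it at your own converted pipeline.
pipe = OnnxStableDiffusionPipeline.from_pretrained(
    "./stable-diffusion-v1-5-onnx",
    provider="DmlExecutionProvider",
    safety_checker=None,
)
```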

The Safety Checker seems to be a thing of the past. In v1.5 you could simply turn it off, and since it came with a significant performance hit, that was the best option. In 2.0 they have opted to do the filtering during training instead, which makes more sense. The Safety Checker would just give you a black image and leave you wondering what exactly it tripped over; they might as well aim to generate less unwanted content to begin with. Of course, the debate then turns to the definition of “unsafe”. Plenty of classic paintings have nudes. While supposedly the scoring also takes into account how “aesthetic” content is, backlash was quick to crop up. I’m sure it’ll take some time before we know whether v2.0 is really incapable of creating all of the wished-for content.

There’s a new inpainting model, again without ONNX, but likely based on the v2.0 base (512×512) model. It seems diffusers also got a pipeline update for inpainting. After conversion to ONNX, the inpainting model works, but performance is problematic: with 1.5, inpainting runs at the same speed as image generation (2.5 it/s), while in 2.0 it runs at a quarter of the speed of image generation. I’ll have to check my conversion workflow to see if it is causing the issue. Note that the speed comparison is done in the exact same environment, with an updated diffusers release. The output is fine though, as you can see below. For the pedants: it’s actually outpainting rather than inpainting, since I first generate a 512×512 image and then generate an additional 256×512 on the left and on the right.
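To make that outpainting step concrete, here is a simplified sketch of extending the image to the right, assuming diffusers' ONNX inpaint pipeline, an example model path, and a single 512×512 window that overlaps the right half of the original; the left side works the same way, mirrored.

```python
from PIL import Image
from diffusers import OnnxStableDiffusionInpaintPipeline

# Example path to a locally converted inpainting model, running on DirectML.
pipe = OnnxStableDiffusionInpaintPipeline.from_pretrained(
    "./stable-diffusion-2-inpainting-onnx",
    provider="DmlExecutionProvider",
)

base = Image.open("centre_512x512.png")  # the initial 512x512 generation

# Build a 512x512 window: the left half is the right half of the original,
# the right half is empty and will be filled in ("outpainted").
window = Image.new("RGB", (512, 512))
window.paste(base.crop((256, 0, 512, 512)), (0, 0))

# Mask: black = keep, white = regenerate.
mask = Image.new("L", (512, 512), 0)
mask.paste(255, (256, 0, 512, 512))

result = pipe(
    prompt="a wide landscape, matching the centre image",  # placeholder prompt
    image=window,
    mask_image=mask,
    num_inference_steps=30,
).images[0]

# Stitch the original 512 columns together with the newly generated 256 columns.
out = Image.new("RGB", (768, 512))
out.paste(base, (0, 0))
out.paste(result.crop((256, 0, 512, 512)), (512, 0))
out.save("extended_right.png")
```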

Version 2 comes with an AI upscaler capable of 4× upscaling. But again, no ONNX model is available. This would also require a new ONNX pipeline in diffusers, which isn’t there yet, so conversion alone is not going to cut it.
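I haven’t been able to run this myself, but for reference, the non-ONNX usage on a CUDA card would look roughly like the sketch below (model id from the official release; untested on my setup).

```python
import torch
from diffusers import StableDiffusionUpscalePipeline
from PIL import Image

# Non-ONNX pipeline; this needs a CUDA card, which is exactly what I lack.
pipe = StableDiffusionUpscalePipeline.from_pretrained(
    "stabilityai/stable-diffusion-x4-upscaler",
    torch_dtype=torch.float16,
).to("cuda")

low_res = Image.open("lighthouse_512x512.png")  # example input image
upscaled = pipe(prompt="a watercolor painting of a lighthouse", image=low_res).images[0]
upscaled.save("lighthouse_upscaled.png")
```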

Another extra model provides depth analysis so that you can do img2img while preserving depth. It’s a great concept, but again it’s not immediately clear how to get it going in ONNX: there’s no ONNX model, and likely no pipeline either.
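Again for reference only: a rough sketch of the regular (non-ONNX) usage, assuming diffusers’ StableDiffusionDepth2ImgPipeline and the model id Stability AI published; untested on my DirectML setup.

```python
import torch
from diffusers import StableDiffusionDepth2ImgPipeline
from PIL import Image

# Non-ONNX depth-conditioned img2img; again CUDA-only as far as my setup goes.
pipe = StableDiffusionDepth2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-depth",
    torch_dtype=torch.float16,
).to("cuda")

init = Image.open("portrait_512x512.png")  # example input image
image = pipe(
    prompt="a bronze statue, studio lighting",  # placeholder prompt
    image=init,
    strength=0.7,  # how far to move away from the original
).images[0]
image.save("portrait_depth2img.png")
```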

The 2.0 release also comes with changes to prompt handling, so writing prompts for 2.0 and 1.5 is quite different. This will be fairly annoying for anyone who was really well versed in writing prompts for 1.5. On the flip side, maybe once mastered, the new prompts can generate better results?

After a few days of testing and getting my environment right, 2.0 isn’t doing it for me yet, mostly because I’m still trying to get it running with ONNX DirectML. If anyone has a recent CUDA card with plenty of VRAM lying around, feel free to contact me 😉
