There’s now an “easy” way to use FP16 ONNX Models in Stable Diffusion: https://github.com/Amblyopius/AMD-Stable-Diffusion-ONNX-FP16
Following up on my previous post, I did some testing and noticed that converting just the UNet was not enough. The Text Encoder model is now in FP16 too, bringing generation down from 1.5 s/it to 1.2 s/it on my 6700XT (at 768×768).
I have had various people test it, and one of the nicer results is that FP16 also brings decent generation speeds to 6GB cards. On a 5600XT, generation is 4x faster than with FP32.
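To put those per-iteration numbers in context, here is a quick back-of-the-envelope sketch of what the s/it difference means for a full generation. The 50-step count is an assumption for illustration; the s/it figures are the ones quoted above for the 6700XT at 768×768.

```python
# Rough speedup arithmetic from the figures quoted above.
# NOTE: 50 steps is a hypothetical run length, not a measured setting.
steps = 50

fp32_s_per_it = 1.5  # FP32 ONNX, 6700XT, 768x768
fp16_s_per_it = 1.2  # FP16 UNet + Text Encoder, same card

fp32_total = steps * fp32_s_per_it   # total wall time at FP32
fp16_total = steps * fp16_s_per_it   # total wall time at FP16
speedup = fp32_s_per_it / fp16_s_per_it

print(f"FP32: {fp32_total:.0f} s, FP16: {fp16_total:.0f} s "
      f"({speedup:.2f}x faster on the 6700XT)")
```

So on the 6700XT the FP16 models are about 1.25x faster; the 5600XT sees a much larger 4x gap, presumably because the smaller card benefits more from the reduced memory footprint.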