How I Made My SIMD Code 1700x Faster Without Writing Any Intrinsics
Background
This is the story of a project to add an efficient CPU backend to a shader language designed to compile down to GPU programs. To set the scene, here are a few pieces of technology worth getting acquainted with first:
The Forge Framework
The Forge is