How I Made My SIMD Code 1700x Faster Without Writing Any Intrinsics
Background

This is the story of a project to add an efficient CPU backend for a shader language that is meant to compile down to GPU programs. Allow me to set the scene with a few prerequisite pieces of technology to become acquainted with:

The Forge Framework

The Forge is