2025 Devlog
muladd
2025-02-22
Quick performance tip. If you’re comfortable requiring support for FMA3 (Intel & AMD CPUs have had it for the last 13 years!) you should probably be using your language’s fused multiply-add intrinsic where relevant.
For example, in Zig you’d replace a * b + c with @mulAdd(f32, a, b, c). On hardware with FMA this is much faster, and it’s more precise too: the result is rounded once instead of twice.
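To make the precision claim concrete, here’s a minimal sketch you can run with `zig test`. The function names `unfused` and `fused` and the input values are mine, picked so that the single rounding is visible; they’re illustrative assumptions, not anything special.

```zig
const std = @import("std");

// Two roundings: the product is rounded to f32, then the sum is rounded again.
fn unfused(a: f32, b: f32, c: f32) f32 {
    return a * b + c;
}

// One rounding: a * b + c is computed exactly, then rounded to f32 once.
fn fused(a: f32, b: f32, c: f32) f32 {
    return @mulAdd(f32, a, b, c);
}

test "the fused form keeps a bit that the unfused form rounds away" {
    const eps: f32 = 1.0 / 4096.0; // 2^-12, exactly representable in f32
    const a: f32 = 1.0 + eps;
    const c: f32 = -(1.0 + 2.0 * eps);

    // Exactly, a * a = 1 + 2^-11 + 2^-24. The last term does not fit in an
    // f32 near 1.0, so the product rounds to 1 + 2^-11 and the sum cancels.
    try std.testing.expect(unfused(a, a, c) == 0.0);

    // The fused form sees the exact product, so the 2^-24 term survives.
    try std.testing.expect(fused(a, a, c) == eps * eps);
}
```

The inputs are contrived to hit a rounding tie, but the same effect shows up as accumulated error in dot products and polynomial evaluation.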
This is an easy optimization, but the compiler can’t do it for you because replacing your arithmetic with more precise arithmetic isn’t standards compliant.
(Unless you’re using -ffast-math, then all bets are off WRT precision and you can just let the compiler do its thing.)
The only catch is, make sure you’re actually setting your target correctly! The compiler can’t ignore your request for muladd since that would change the precision of your result, so if you’re targeting a baseline CPU that doesn’t have FMA, you’ll end up emulating it in software and you’ll go slower instead of faster.
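One way to catch a wrong target is to ask the compiler what it thinks it’s building for. This is a rough sketch, assuming an x86-64 build; the name `has_x86_fma` is hypothetical, and other architectures are reported as false only because the sketch doesn’t check them.

```zig
const std = @import("std");
const builtin = @import("builtin");

// True when the compile target is x86-64 with the FMA3 feature enabled.
// Evaluated at compile time from the target the compiler was given.
const has_x86_fma = builtin.cpu.arch == .x86_64 and
    std.Target.x86.featureSetHas(builtin.cpu.features, .fma);

pub fn main() void {
    if (has_x86_fma) {
        std.debug.print("target has FMA: @mulAdd can use the hardware instruction\n", .{});
    } else {
        std.debug.print("no x86-64 FMA in this target: @mulAdd may be emulated in software\n", .{});
    }
}
```

If this reports no FMA, the target needs changing, for example by passing -mcpu=native or -mcpu=x86_64_v3 to the Zig compiler. If you’d rather fail the build than print a message, the same constant can guard a @compileError.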