From bf16 to int8 subword parallelism
Great visualizations. Particularly loved the first, showing matrix multiplication.
Oh, this is wonderful. I had figured out the v4 approach myself in the past, but didn't understand the missing piece for how v5 obtained 2x throughput. The gate cost estimate is instructive.
I'm glad you liked it!