I benchmarked 4 local LLMs on my Mac Studio

Everyone posts clean benchmark tables. Here’s what happens when you actually try to reproduce them. I’ve been running a Mac Studio M4 Max (64GB) as a local LLM server for a couple of weeks now. Four models, two backends, and way more debugging than expected. This post documents what I found, including the parts that didn’t work.

The hardware

Mac Studio M4 Max with 64GB of unified memory. The unified memory architecture means the GPU and CPU share the same 64GB pool, so large models can actually fit without swapping. ...
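Before pulling weights, it’s worth a quick back-of-the-envelope fit check against that pool. Here’s a minimal sketch, assuming rough placeholder numbers for KV cache and OS headroom (both are assumptions, not measurements from my runs): it reads total unified memory via sysctl hw.memsize and compares it against an estimated model footprint.

```python
import subprocess

def unified_memory_bytes() -> int:
    """Total physical (unified) memory on macOS, via sysctl."""
    out = subprocess.run(
        ["sysctl", "-n", "hw.memsize"],
        capture_output=True, text=True, check=True,
    )
    return int(out.stdout.strip())

def fits_in_memory(model_gb: float, kv_cache_gb: float = 4.0,
                   headroom_gb: float = 8.0) -> bool:
    """Rough check: weights + KV cache + OS headroom vs. the shared pool.

    kv_cache_gb and headroom_gb are guesses, not measured values; tune
    them for your context length and what else the machine is running.
    """
    total_gb = unified_memory_bytes() / 1024**3
    return model_gb + kv_cache_gb + headroom_gb <= total_gb

if __name__ == "__main__":
    # Approximate on-disk weight sizes for common quantized models.
    for name, size_gb in [("8B Q8", 8.5), ("32B Q4", 19.0), ("70B Q4", 40.0)]:
        print(f"{name}: {'fits' if fits_in_memory(size_gb) else 'too big'}")
```

The headroom term exists because macOS won’t hand the GPU the entire pool: Apple Silicon keeps the GPU wired-memory limit below total RAM (tunable via the iogpu.wired_limit_mb sysctl), so a model that fits on paper can still fail to load.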
