A technical blog post about Phi Silica, Microsoft’s on-device language model.

The post covers post-training quantization (PTQ), grouped-query attention (GQA), KV caches, and strategies for increasing context length within limited NPU resources (other models besides Phi Silica also need to fit on the chip).
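To make the link between GQA, the KV cache, and context length concrete, here is a rough back-of-the-envelope sketch. All numbers (layer count, head counts, head dimension, memory budget) are hypothetical values picked for illustration, not Phi Silica's actual configuration, and the helper names are my own.

```python
def kv_bytes_per_token(num_layers, num_kv_heads, head_dim, bytes_per_elem):
    """Bytes of KV cache added by each token of context."""
    # Factor 2: each layer stores one key tensor and one value tensor.
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem


def max_context_len(budget_bytes, num_layers, num_kv_heads, head_dim, bytes_per_elem):
    """Longest context whose KV cache fits in the given memory budget."""
    per_token = kv_bytes_per_token(num_layers, num_kv_heads, head_dim, bytes_per_elem)
    return budget_bytes // per_token


# Hypothetical model shape and budget (assumptions, not real Phi Silica values).
LAYERS, QUERY_HEADS, HEAD_DIM = 32, 32, 96
FP16_BYTES = 2
BUDGET = 256 * 1024 * 1024  # 256 MiB reserved for the KV cache

# Multi-head attention: every query head has its own K/V head.
mha_ctx = max_context_len(BUDGET, LAYERS, QUERY_HEADS, HEAD_DIM, FP16_BYTES)

# Grouped-query attention: 8 K/V heads, each shared by 4 query heads.
gqa_ctx = max_context_len(BUDGET, LAYERS, 8, HEAD_DIM, FP16_BYTES)

print(f"MHA max context: {mha_ctx} tokens")
print(f"GQA max context: {gqa_ctx} tokens")
```

Under these assumptions, sharing K/V heads across query heads shrinks the per-token cache cost, so the same memory budget holds a proportionally longer context.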

I also managed to contribute, albeit a little: Windows Blogs.
