Discussion about this post

Outstanding deep dive. The Sardana et al. framing around inference-optimized scaling is super clarifying; it makes the Llama-3 training choices feel way less arbitrary. Also didn't realize Schaeffer's work basically debunked emergence as just a measurement artifact. That's kinda huge for predicting capabilities.
