Alternate account for @simple@lemmy.world

  • 18 Posts
  • 4 Comments
Joined 2 years ago
Cake day: July 3rd, 2023


  • in developing our reasoning models, we’ve optimized somewhat less for math and computer science competition problems, and instead shifted focus towards real-world tasks that better reflect how businesses actually use LLMs.

    I was just about to say how useless these benchmarks are. Plenty of LLMs claim to be better than Claude and GPT-4, but in real-world use those two have always been more reliable, Claude especially. Good to hear they’re not just chasing scores.