Anthropic, an artificial intelligence (AI) firm incorporated as a public benefit corporation, launched Claude 2 on July 11, marking another milestone in a year of seemingly nonstop progress from the burgeoning generative AI sector.
In the company's words: "Introducing Claude 2! Our latest model has improved performance in coding, math and reasoning. It can produce longer responses, and is available in a new public-facing beta website at https://t.co/uLbS2JNczH in the US and UK."
According to a company blog post, Claude 2 shows improvements across nearly every measurable category. Perhaps most noteworthy among the differences between it and its predecessor is how the researchers discuss their work.
There’s no mention of traditional machine learning benchmarks or computational scores against similar models in the blog post announcing Claude 2. Instead, Anthropic tested both Claude and Claude 2 head-to-head on numerous exams meant to represent real-world knowledge, skills and problem-solving ability.
Claude 2 beat its predecessor across the board on knowledge, coding and other exams and, according to Anthropic, even scored well against human averages.
It is worth noting that many experts consider comparisons between human and AI test takers misleading, both because of the nature of human cognitive reasoning and because a large language model’s training data set likely contains information about the tests themselves. In short, tests designed for humans may not actually “test” an AI’s ability to reason or demonstrate genuine knowledge or skill.
Along with the launch of Claude 2, Anthropic debuted a beta version of a web-based “Talk to Claude” interface providing general access to the chatbot for users in the United States and the United Kingdom.
Read more on cointelegraph.com