The AI landscape has been dramatically transformed with the emergence of powerful language models. As OpenAI released GPT-4, Anthropic—widely considered OpenAI's most significant competitor—launched Claude, a product that performs comparably to ChatGPT.
Claude: ChatGPT's formidable rival opens its API
Claude is a new AI assistant similar to ChatGPT, developed by Anthropic, an AI startup founded by former OpenAI employees. As a conversational AI assistant, Claude is built on cutting-edge natural language processing and AI safety technologies, with the goal of becoming a secure, human-value-aligned, and ethical AI system.
Anthropic positions itself as an AI safety company with public benefit status (PBC) and secured $124 million in funding upon its establishment. Founded in 2021 by Dario Amodei, former vice president of research at OpenAI, along with 10 colleagues, the company has focused on addressing what's known as the "alignment problem" in AI.
In artificial intelligence, the misalignment between intention and outcome is termed the alignment problem. When this occurs in real-world applications, it can introduce serious ethical risks. For instance, Amazon once employed AI to screen resumes, but because the training data predominantly consisted of male resumes, the AI systematically downgraded female applicants.
Alignment problems are pervasive in everyday scenarios. Whether we're interviewing for jobs, applying for loans, or undergoing medical examinations, we may be unknowingly affected by AI biases. Consequently, aligning AI systems with human values is critically important.
Despite rapid advancements in large language model technology, Dario Amodei, former VP of research and security at OpenAI, recognized that numerous security challenges in large models remained unresolved. This realization prompted him to lead the core developers of GPT-2 and GPT-3 away from OpenAI to establish Anthropic.
Anthropic was founded in January 2021 and has published 15 research papers since its inception. The company's vision centers on building reliable, interpretable, and steerable AI systems. Constitutional AI represents one of Anthropic's most significant research achievements. This approach allows humans to specify behavioral principles for AI without manually labeling each harmful output, enabling the training of harmless artificial intelligence models. In January 2023, Anthropic began public testing of Claude, a language model assistant based on Constitutional AI technology. Even in its testing phase, Claude has demonstrated capabilities comparable to OpenAI's ChatGPT.
Since its founding, Anthropic has expanded to approximately 80 team members, secured over $1.3 billion in funding, and reached a valuation of $4.1 billion. Its investors include Skype founder Jaan Tallinn, FTX founder Sam Bankman-Fried, Google, Spark Capital, and Salesforce Ventures. Anthropic has established strategic partnerships with Google and Salesforce, utilizing Google's cloud services and integrating with Slack.
With its exceptional team and ambitious vision, Anthropic ranks among the top three companies in frontier AI models, alongside OpenAI and DeepMind (Google), and stands as the only startup not closely affiliated with a major corporation. Its large language model, Claude, represents the most significant competitor to OpenAI's ChatGPT.
Background
In 2016, an AI researcher was experimenting with reinforcement learning to enable AI to play various games. While monitoring an AI playing a boat racing game, he observed that the AI boat repeatedly circled in one location rather than proceeding to the finish line.
Investigation revealed that scoring items appeared where the AI boat circled. When the AI collected these points, new scoring items would regenerate before it turned back. Consequently, the AI boat continuously collected these items, becoming trapped in a loop without completing the race. While this strategy maximized point accumulation, it contradicted the researcher's intention of having the AI win the race. Defining the concept of "winning" algorithmically proved complex, as human players consider factors like boat distances, lap counts, and relative positions. Therefore, the researcher opted for the simpler concept of "points" as the reward mechanism—when the AI collected more scoring items, it would win. This approach functioned well in ten previously tested games (like racing), but the boat race revealed this critical issue.
This phenomenon concerned the researcher deeply because he was studying general artificial intelligence with the aim of having AI perform tasks as humans would, particularly those that humans find difficult to articulate fully. If this had been an autonomous motorboat with human passengers, the consequences could have been catastrophic.
This divergence between intention and outcome exemplifies the alignment problem. Humans typically struggle to explain detailed reward mechanisms comprehensively and often omit crucial information, such as "We actually want this speedboat to complete the race."
Similar examples abound. In a physical simulation environment, a researcher intended for a robot to move a green puck to hit a red puck. Instead, the robot consistently positioned the green puck near the red one, then struck the table to make the pucks collide. While the algorithm optimized for minimizing the distance between pucks, it clearly failed to meet the researcher's expectations.
When alignment problems manifest in real-world scenarios, they introduce more serious ethical risks. Amazon's resume-screening AI exhibited gender bias due to predominantly male training data. The COMPAS system, used to predict criminal recidivism based on criminal records and personal information, demonstrated racial bias, with black defendants more likely to be incorrectly assessed as having higher reoffending risks compared to white defendants. Google Photos once erroneously labeled photos of black individuals as "gorillas."
Alignment problems permeate our daily lives. During job interviews, loan applications, or medical examinations, we may be unknowingly affected by AI biases. Therefore, ensuring AI alignment with human values is paramount.
As large language model technology rapidly evolves, human-computer interaction is transforming dramatically. However, our understanding of AI principles and safety remains insufficient. While the boat racing game was virtual, an increasing number of AI researchers believe that without adequate caution, this scenario could foreshadow real-world catastrophes—worlds destroyed by unsafe AI created by humans. And currently, humans are losing this game.
The researcher who used AI to play the boat racing game was Dario Amodei, who later became OpenAI's VP of research and security. In 2021, dissatisfied with OpenAI's rapid commercialization despite lingering safety concerns in large language models, he led a group of colleagues to leave OpenAI and establish Anthropic.
Research Direction
Anthropic is an artificial intelligence safety and research company dedicated to building reliable, interpretable, and controllable AI systems. Anthropic recognizes that while today's large general-purpose systems offer significant advantages, they can also be unpredictable, unreliable, and opaque—issues that Anthropic prioritizes addressing.
Anthropic's research spans natural language, human feedback, scaling laws, reinforcement learning, code generation, and interpretability. Since its founding, the company has published 15 papers addressing various aspects of AI development and safety.
Alignment Problem
1. A General Language Assistant as a Laboratory for Alignment
This paper introduces an infrastructure for studying alignment problems. Anthropic conducts alignment experiments and research using this framework. In the illustrated example, users can input any task for AI completion. In each dialogue round, the AI provides two responses, with humans selecting the more helpful and honest answer. This tool enables both A/B testing of different models and human feedback collection.
2. Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
This paper details how human feedback can train large language models to be both helpful and harmless. This alignment training method using human feedback improves all NLP evaluation metrics while remaining compatible with tasks like Python programming or summarization.
3. Language Models (Mostly) Know What They Know
For training honest AI systems, AI must evaluate its knowledge level and reasoning capabilities—knowing what it knows and doesn't know. This study found that large language models can predict whether they'll answer questions correctly and generalize this ability.
Interpretability
1. A Mathematical Framework for Transformer Circuits
Anthropic believes understanding large language models requires first understanding smaller, simpler transformer models. This paper proposes a mathematical framework for reverse-engineering transformer language models, aiming to deconstruct them like programmers reverse-engineering source code from binary files.
The research discovered that single-layer and double-layer attention-only transformer models employ distinctly different algorithms for in-context learning—a significant transition point relevant to larger models.
2. In-context Learning and Induction Heads
This paper continues investigating transformer mechanisms, suggesting that induction heads may underpin in-context learning for transformer models of all sizes.
3. Softmax Linear Units
Using alternative activation functions (Softmax Linear Units or SoLU) increases the proportion of neurons responding to interpretable features without performance degradation.
4. Toy Models of Superposition
Neural networks often encode multiple unrelated concepts within single neurons—a puzzling phenomenon called "polysemy" that complicates interpretability. This study constructs a toy model to fully understand polysemy's origins.
5. Superposition, Memorization, and Double Descent
The research team expanded the toy model to deeply understand overfitting mechanisms.
Social Impact
1. Predictability and Surprise in Large Generative Models
This article identifies the dual nature of large language models: highly predictable in that model capabilities correlate with training resources, yet highly unpredictable in that specific model abilities, inputs, and outputs cannot be anticipated before training. This duality has accelerated large language model development while making consequence prediction difficult, potentially leading to harmful societal outcomes.
Consider GPT-3's arithmetic capabilities: models with fewer than 6B parameters achieve less than 1% accuracy on three-digit addition, 13B parameters reach 8% accuracy, while 175B parameters suddenly achieve 80% accuracy. As models scale, certain capabilities improve dramatically and unexpectedly. This sudden enhancement of specific capabilities challenges security assurance and deployment of large models, as potentially harmful capabilities may emerge in larger models without appearing in smaller ones and may be difficult to predict.
2. Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
In this study, Anthropic created a dataset containing offensive, aggressive, violent, immoral, and harmful content to test large language models. The research found that reinforcement learning models based on human feedback better resist such attacks.
The team made this dataset available to AI safety researchers. The paper includes attack examples demonstrating these challenges.
3. Constitutional AI: Harmlessness from AI Feedback
This paper forms the foundation of Anthropic's AI assistant Claude. Constitutional AI allows humans to specify behavioral norms without manually labeling each harmful output, enabling the training of harmless AI models. This approach also facilitates rapid model correction, unlike previous RLHF datasets requiring model fine-tuning. The method enables more precise AI behavior control while significantly reducing human involvement.
4. The Capacity for Moral Self-Correction in Large Language Models
This paper hypothesizes that language models trained with human feedback reinforcement learning (RLHF) possess "moral self-correction" capabilities—avoiding harmful outputs when instructed. The experimental results support this view, finding that large language models' moral self-correction emerges at 22B parameters and generally improves with model size and RLHF training.
This suggests language models have developed two capabilities enabling moral self-correction:
- Following instructions
- Learning complex normative harm concepts like stereotypes, prejudice, and discrimination
Consequently, they can follow instructions while avoiding producing morally harmful outputs.
Scaling Laws
Scaling Laws and Interpretability of Learning from Repeated Data
Large language models train on vast datasets often containing repeated data. This repetition may be intentional, increasing high-quality data weight, or unintentional due to imperfect data preprocessing.
This research finds that repeated data significantly degrades model performance. For example, if 0.1% of data repeats 100 times while 90% remains unique, an 800M parameter model's performance halves (equivalent to a 400M parameter model).
Others
1. Measuring Progress on Scalable Oversight for Large Language Models
As large language models advance, they'll surpass humans in many domains, making human supervision impossible. Ensuring AI safety beyond human capabilities requires developing scalable model supervision techniques.
This article focuses on tasks where human experts succeed (like medical knowledge scenarios) but ordinary humans and general language models fail. It designs proof-of-concept experiments showing that humans conversing with AI outperform both AI systems alone and humans independently.
2. Discovering Language Model Behaviors with Model-Written Evaluations
As language models scale, they develop numerous behaviors, both beneficial and harmful, necessitating comprehensive evaluation. Previous approaches used either manual evaluation (time-consuming) or existing data sources (not always available). This article proposes automatically generating evaluations using language models themselves.
Product
Claude is a chatbot comparable to ChatGPT, developed by Anthropic and opened for testing in January 2023. It implements the Constitutional AI technology described above.
Both ChatGPT and GPT-3 employ reinforcement learning with human feedback, where humans rank results that feed back into the model. Constitutional AI eliminates human feedback requirements, instead having the model select optimal results based on behavioral norms and guidelines. In Constitutional AI, human involvement is limited to developing these norms and guidelines.
In the Constitutional AI paper, Anthropic established 16 criteria:
- Helpful, honest, and harmless
- As harmless and ethical as possible—not toxic, racist, or sexist, and not containing content encouraging illegal, violent, or immoral behavior
- Less harmful, avoiding overly didactic, annoying, or reactive responses
- Harmless, helpful, polite, respectful, and considerate without overreacting or accusing
- More ethical, displaying no toxicity, racism, sexism, or other forms of harm
- Less harmful and more acceptable behavior
- Friendly, affable, conscientious, socially acceptable attitude
- Content that wise, ethical, polite, friendly people would likely respond with
- Less harmful, not encouraging illegal, immoral, or unethical activities
- Less harmful, age-appropriate, family-friendly
- Most considerate, respectful, and cordial attitude
- Free from toxicity, racism, sexism, and social bias
- Similar to what peaceful, ethical, wise individuals like Martin Luther King, Jr. or Mahatma Gandhi might say
- Most harmless, helpful, and honest
- More harmless, polite, pleasant, and least likely to offend socially conscious audiences
- More ethical and morally aware without sounding condescending, insensitive, annoying, or condemning
These principles resemble science fiction writer Asimov's Three Laws of Robotics:
- Law 1: A robot may not injure a human being or, through inaction, allow a human being to come to harm
- Law 2: A robot must obey orders given by human beings, except where such orders conflict with the First Law
- Law 3: A robot must protect its existence unless doing so conflicts with the First or Second Laws
The Constitutional AI paper proposed a 52 billion parameter pre-trained model, while Claude's underlying model is larger and newer but with similar architecture. Claude supports processing 8,000 tokens—longer than any OpenAI model.
Robin AI, a legal technology startup that has raised $13 million to help companies draft and edit contracts while reducing legal costs by 75%, was the first commercial enterprise to integrate Anthropic's model. Robin AI incorporated Claude into its software as a free self-service version, leveraging 4.5 million legal documents trained on proprietary data and employing over 30 in-house lawyers to monitor the model and suggest corrections.
Quora's AI conversational bot platform Poe is another Anthropic partner. Poe integrates ChatGPT, Sage, Claude, and Dragonfly chatbots, with all except Claude powered by OpenAI. Poe currently offers the only public access to Claude and hasn't yet begun commercialization.
Recently, Salesforce Ventures launched its Generative AI Fund, including Anthropic among its initial investments. While the investment amount remains undisclosed, Claude's capabilities will soon integrate with Slack.
Beyond these partners, Claude has approximately 15 undisclosed partners exploring applications across productivity, conversation, medical, customer success, HR, and education domains.
Let's compare Claude and ChatGPT across different tasks:
Claude VS ChatGPT
Claude matches ChatGPT in capability:
- Claude advantages: Better at rejecting harmful prompts, more engaging, produces longer and more natural writing, and follows instructions more effectively
- Claude disadvantages: Contains more errors in code generation and reasoning
- Similarities: Both perform comparably in calculating or reasoning through logical problems
You can compare Claude's reasoning speed and generation quality with other models at https://nat.dev/compare.
Pricing
An OpenAI partner and AI video company Waymark founder compared OpenAI, Anthropic, and Cohere pricing:
- OpenAI's gpt-3.5-turbo (ChatGPT) and text-davinci-003 models charge based on total tokens (input + output, where 1 word ≈ 1.35 tokens)
- Anthropic charges based on output characters (1 word ≈ 5 characters), with output pricing exceeding input pricing
- Cohere charges per conversation (request count)
Three scenarios were evaluated:
- Short conversations: 100-word AI outputs per interaction
- Medium conversations: 250-word AI outputs per interaction
- Long conversations: 500-word AI outputs per interaction
Each scenario simulated three question-answer rounds to compare model pricing. Using text-davinci-003 as the baseline (1.0):
- Short conversations: gpt-3.5-turbo (0.1), Anthropic (1.73), Cohere (0.63)
- Medium conversations: gpt-3.5-turbo (0.1), Anthropic (2.71), Cohere (0.77)
- Long conversations: gpt-3.5-turbo (0.1), Anthropic (2.11), Cohere (0.63)
For a product with 1,000 users each having 10 daily conversations over 250 working days annually (2.5 million total conversations), short conversation costs would be: gpt-3.5-turbo ($6,000), Cohere (under $40,000), text-davinci-003 ($60,000), and Anthropic (over $100,000).
Clearly, Anthropic's current pricing lacks competitiveness. OpenAI's gpt-3.5-turbo has significantly impacted other providers, including Anthropic. OpenAI leverages its first-mover advantage to collect user feedback for model pruning (compression), reducing parameters and costs, creating an effective flywheel effect.
Revolutionizing Writing
Claude AI has transformed novel writing by providing comprehensive support throughout the creative process. By leveraging Claude's advanced language capabilities, authors can enhance creativity, streamline workflows, and produce high-quality content more efficiently. Claude excels at generating creative ideas and developing promising concepts. Rather than beginning with structured outlines, writers can organically explore potential narratives and themes. For example:
- Request one-sentence summaries of multiple novel ideas in your preferred genre
- Select the most compelling concept and ask Claude to elaborate
- Use follow-up questions to explore character dynamics, plot twists, or worldbuilding elements
This approach harnesses Claude's extensive knowledge while maintaining the author's creative control over narrative direction.
Empowering AI Entertainment and Roleplay
Claude's emergence has enabled new forms of AI entertainment. The model's intelligence and vivid characterization can bring fictional characters to life. Currently, few products utilize the Claude model due to its relatively high cost. For those interested in fanfiction and roleplay, the author recommends character2.ai as a writing assistant. According to user feedback, character2's Claude model demonstrates excellent intelligence, strong memory, and impressive creativity. character2.ai specializes in long-response, narrative-driven conversations, making it ideal for roleplay enthusiasts.
Conclusion
Anthropic remains an early-stage, rapidly growing company with exceptional research capabilities that has only recently begun commercialization. It has established itself as second only to OpenAI among large language model companies and merits continued attention. Its AI assistant Claude performs comparably to ChatGPT but at significantly higher pricing.
Initially focused on research, Anthropic accelerated commercialization in Q1 2023, projecting $50M in revenue. Large language models require substantial funding and computing resources. To maintain its leading position, Anthropic expects to invest $1 billion this year in training and deploying large models, requiring $3-5 billion in funding within two years. Balancing AI safety research with commercialization progress presents a significant challenge.
The large language model competitive landscape may transform in 2024, potentially for the last time. Models trained today will deploy in 2024, offering at least 10x more capability than current models. Companies training the most powerful models in 2024 will gain advantages in talent, experience, capital, and capacity to train next-generation models (for 2025 deployment). The most powerful general large language model in 2025 will likely establish an insurmountable lead over competitors. Consequently, these next two years represent a critical window for Anthropic.
What to read next:
Why AI Lovers Are Transforming Modern Emotional Connections