Orca: The Open Source Model That Beats GPT-3.5 and Matches GPT-4 in Reasoning Tests
In the world of natural language processing, the development of open source models has been a topic of much debate. Recently, a paper claimed that open source models can mimic the style but not the factuality of GPT (Generative Pre-trained Transformer) models. However, a new 51-page report on Orca, a comparatively small 13-billion-parameter model, challenges this assertion. Orca, developed by Microsoft, not only competes with GPT-3.5 but beats it on several well-established benchmarks, and it even matches GPT-4 on a couple of tests of reasoning.
Table of Contents
1. Introduction
2. Orca: The New Open Source Model
3. How Orca Learns
4. Orca's Performance in Benchmarks
5. Improving Orca's Performance
6. The Future of Open Source Models
7. Conclusion
8. Resources
9. FAQ
Introduction
In this article, we will explore the development of Orca, a new open source model that has been making waves in the natural language processing community. We will examine how Orca was developed, how it learns, and how it performs in various benchmarks. We will also discuss the future of open source models and their potential impact on the field of natural language processing.
Orca: The New Open Source Model
Orca is a 13-billion-parameter model developed by Microsoft that learns to imitate the reasoning process of larger models like GPT-4. According to the paper's abstract, models like LLaMA, Alpaca, and Vicuna lack rigorous evaluation, which leads to overestimating small models' capability, as they tend to learn to imitate the style but not the reasoning of LFMs (Large Foundation Models). To address these challenges, Microsoft developed Orca, which learns by observing GPT-4's step-by-step thought processes, with additional teacher assistance from ChatGPT (GPT-3.5).
How Orca Learns
Orca learns by leveraging system instructions that ask models like GPT-4 and ChatGPT to think step by step. This gives Orca access to detailed responses that explain the teacher's reasoning process as it generates each answer, making the parent models, GPT-3.5 and GPT-4, much better tutors for the young Orca. The teachers also gave far more examples to their student: 5 million from ChatGPT and 1 million from GPT-4. By comparison, models like Alpaca, WizardLM, and Vicuna were trained on tens of thousands, or at most a few hundred thousand, examples. The key difference is the explanations: the step-by-step thinking that the smaller Orca could then imitate.
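To make this concrete, here is a minimal sketch of how one such explanation-tuning example might be assembled, assuming a (system message, user query, teacher response) triple as the paper describes. The system message is a paraphrase of the kind of instruction the paper uses, and the field names and `to_training_text` helper are illustrative assumptions, not the authors' code.

```python
# Sketch: assembling one "explanation tuning" training example.
# Field names and the formatting template are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class ExplanationExample:
    system_message: str    # instructs the teacher to reason step by step
    user_query: str        # task drawn from a FLAN-style instruction set
    teacher_response: str  # step-by-step answer from ChatGPT or GPT-4

# Paraphrase of the style of system message described in the paper,
# not a verbatim quote.
SYSTEM_MESSAGE = (
    "You are an AI assistant. While answering, think step by step "
    "and justify each step of your reasoning."
)

def to_training_text(ex: ExplanationExample) -> str:
    """Flatten a triple into the text the student model is trained on."""
    return (
        f"### System:\n{ex.system_message}\n\n"
        f"### User:\n{ex.user_query}\n\n"
        f"### Assistant:\n{ex.teacher_response}"
    )

example = ExplanationExample(
    system_message=SYSTEM_MESSAGE,
    user_query="If a train travels 60 miles in 1.5 hours, what is its average speed?",
    teacher_response=(
        "Step 1: Average speed is distance divided by time.\n"
        "Step 2: 60 miles / 1.5 hours = 40 miles per hour.\n"
        "Answer: 40 mph."
    ),
)

print(to_training_text(example))
```

The student is trained to reproduce the whole step-by-step response, not just the final answer, which is what distinguishes this from the plain response imitation used by earlier models.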
Orca's Performance in Benchmarks
Orca outperforms conventional state-of-the-art models such as Vicuna by more than 100% on complex zero-shot reasoning benchmarks like BIG-Bench Hard. It reaches parity with ChatGPT on BIG-Bench Hard and shows competitive performance on professional and academic examinations like the SAT, LSAT, GRE, and GMAT. As assessed by GPT-4 on open-ended generation (not multiple choice), Orca retains 95% of ChatGPT's quality and 85% of GPT-4's quality. Orca massively outperforms the previous best open source model, Vicuna, even edging out ChatGPT on average on the BIG-Bench Hard benchmark.
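For clarity, "by more than 100%" means relative improvement over the baseline's score: the score more than doubles. The sketch below shows the arithmetic; the scores are placeholders, not figures from the paper.

```python
# Sketch: what "outperforms Vicuna by more than 100%" means as a
# relative improvement. The scores below are placeholders.

def relative_improvement(new_score: float, baseline_score: float) -> float:
    """Percentage improvement of new_score over baseline_score."""
    return (new_score - baseline_score) / baseline_score * 100.0

orca_bbh = 48.0    # placeholder aggregate accuracy (%)
vicuna_bbh = 23.0  # placeholder aggregate accuracy (%)

print(f"Relative improvement: {relative_improvement(orca_bbh, vicuna_bbh):.0f}%")
# -> Relative improvement: 109%  (i.e., more than 100%)
```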
Improving Orca's Performance
The authors argue that Orca's results suggest that learning from step-by-step explanations could significantly improve the quality of models regardless of their size. They hope these insights will inform the design of more robust evaluation methods than those used for Vicuna, for example, as well as the advancement of alignment and post-training techniques and the more effective use of powerful models like GPT-4 as teachers. Orca could also be improved through tool augmentation, and not just calculators, calendars, Bing, or AutoGPT: a recent paper demonstrated that larger models can create tools that smaller models can then use more efficiently.
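The sketch below illustrates that tool-maker/tool-user idea under stated assumptions: `make_tool_with_large_model` is a hypothetical stand-in for prompting a strong model to write code, and here it returns a canned function so the example runs on its own.

```python
# Sketch of the tool-maker / tool-user pattern: a large model writes a
# reusable Python tool once, and a smaller model then calls it cheaply.

def make_tool_with_large_model(task_description: str) -> str:
    # Hypothetical stand-in: in practice this would prompt a strong
    # model (e.g. GPT-4) to generate code for the task. Here we return
    # a canned tool so the sketch is self-contained.
    return (
        "def schedule_overlap(a, b):\n"
        '    """Return True if intervals a=(start, end) and b overlap."""\n'
        "    return a[0] < b[1] and b[0] < a[1]\n"
    )

tool_source = make_tool_with_large_model("check whether two meetings overlap")
namespace: dict = {}
exec(tool_source, namespace)  # register the generated tool
schedule_overlap = namespace["schedule_overlap"]

# A smaller model (or plain code) can now reuse the tool many times:
print(schedule_overlap((9, 11), (10, 12)))  # True
print(schedule_overlap((9, 10), (11, 12)))  # False
```

The appeal of the design is amortization: the expensive model writes the tool once, and the cheap model reuses it across many queries.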
The Future of Open Source Models
The development of Orca has challenged the assertion that open source models can only mimic the style but not the factuality of GPT models. The authors of the paper suggest that learning from step-by-step explanations could significantly improve the quality of models regardless of their size. However, there is still a gap between open source and proprietary models, and this gap may even widen over time. The amount of effort, engineering, and research it takes to produce such a neural network keeps increasing, so even where open source models exist, they will less and less often be produced by small groups of dedicated researchers and engineers.
Conclusion
Orca is a new open source model that has challenged the assertion that open source models can only mimic the style but not the factuality of GPT models. Orca not only competes with GPT-3.5 but also beats it on several well-established benchmarks and even matches GPT-4 on a couple of tests of reasoning. The development of Orca suggests that learning from step-by-step explanations could significantly improve the quality of models regardless of their size. However, there is still a gap between open source and proprietary models, and this gap may even increase over time.
Resources
- Orca Paper: https://arxiv.org/abs/2306.02707
- BIG-Bench Hard: https://github.com/suzgunmirac/BIG-Bench-Hard
- Let's Verify Step by Step: https://arxiv.org/abs/2305.20050
- Unfaithful Reasoning (Language Models Don't Always Say What They Think): https://arxiv.org/abs/2305.04388
- Flan Collection: https://github.com/google-research/FLAN
FAQ
1. What is Orca?
Orca is a new open source model developed by Microsoft that learns to imitate the reasoning process of larger models like GPT-4.
2. How does Orca learn?
Orca learns by leveraging system instructions that ask models like GPT-4 and ChatGPT to think step by step. This gives Orca access to detailed responses that explain the teacher's reasoning process as it generates each answer.
3. How does Orca perform in benchmarks?
Orca outperforms conventional state-of-the-art models such as Vicuna by more than 100% on complex zero-shot reasoning benchmarks like BIG-Bench Hard. It reaches parity with ChatGPT on BIG-Bench Hard and shows competitive performance on professional and academic examinations like the SAT, LSAT, GRE, and GMAT.
4. How can Orca be improved?
Orca could be improved through tool augmentation, and not just calculators, calendars, Bing, or AutoGPT. A recent paper demonstrated that larger models can create tools that smaller models can then use more efficiently.
5. What is the future of open source models?
The development of Orca has challenged the assertion that open source models can only mimic the style but not the factuality of GPT models. However, there is still a gap between open source and proprietary models, and this gap may even increase over time.