Let's Sample Step by Step:
Adaptive-Consistency for Efficient Reasoning with LLMs

1 Department of Computer Science and Engineering, Indian Institute of Technology Delhi
2 Language Technologies Institute, School of CS, Carnegie Mellon University

TLDR: a sampling method that matches Self-Consistency in performance with up to 7.9x fewer samples.

Abstract

A popular approach for improving the correctness of output from large language models (LLMs) is Self-Consistency -- poll the LLM multiple times and output the most frequent solution. Existing Self-Consistency techniques always draw a constant number of samples per question; a better approach is to distribute the available budget non-uniformly, based on the amount of agreement in the samples drawn so far.

In response, we introduce Adaptive-Consistency, a cost-efficient, model-agnostic technique that dynamically adjusts the number of samples per question using a lightweight stopping criterion. Our experiments over 13 datasets and two LLMs demonstrate that Adaptive-Consistency reduces sample budget by up to 7.9 times with an average accuracy drop of less than 0.1%.

Key-Highlights

  • 🚀 Dynamic Sampling: Our method smartly adjusts the number of samples per question using a lightweight stopping criterion.
  • 📊 Cost-Effective Performance: Adaptive-Consistency reduces the sample budget by up to 7.9 times with an average accuracy drop of less than 0.1%.
  • 🔌 Off-the-Shelf Solution: Improve accuracy or reduce cost right away with just 2-3 lines of code changes. No additional training required.
  • 🌐 Compatible with Pre-trained LLMs: Our method works seamlessly with popular large language models like GPT-3.

Results Summary

Adaptive-Consistency outperforms Self-Consistency on 13 benchmarks spanning varying domains, difficulty levels, and answer types.

Mathematical Reasoning

Logical Reasoning


Other Reasoning Datasets

Using Adaptive-Consistency in your code

Using Adaptive-Consistency in your code takes only 2-3 line changes.

1. Modifying Self-Consistency

from adaptive_consistency import AC

ac = AC(stop_criteria='beta')
outputs = []
# Draw samples for a single question, stopping early once the
# answers drawn so far agree strongly enough.
for _ in range(max_samples):  # e.g. the usual 40-sample Self-Consistency budget
    output = openai.Completion.create(prompt=prompt, **kwargs)
    outputs.append(output)
    if ac.should_stop(outputs):
        break
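The 'beta' stopping criterion above halts sampling once the model is confident the current majority answer will remain the majority. As a rough, self-contained sketch of the idea (not the library's implementation -- the 0.95 threshold and the Monte-Carlo estimate of the Beta posterior are illustrative assumptions):

```python
import random
from collections import Counter

def beta_should_stop(answers, threshold=0.95, draws=10_000):
    """Stop when the posterior probability that the current top answer
    beats the runner-up exceeds `threshold`.

    Models the top answer's share among the top two as p ~ Beta(n1+1, n2+1)
    (uniform prior) and estimates P(p > 0.5) by Monte-Carlo sampling.
    Illustrative only; a closed-form Beta CDF works equally well.
    """
    counts = Counter(answers).most_common(2)
    if len(counts) < 2:
        # Only one distinct answer seen so far: unanimous agreement.
        return len(answers) >= 2
    n1, n2 = counts[0][1], counts[1][1]
    wins = sum(random.betavariate(n1 + 1, n2 + 1) > 0.5 for _ in range(draws))
    return wins / draws >= threshold

# Strong agreement among samples triggers an early stop;
# an even split does not.
print(beta_should_stop(["42"] * 9 + ["41"]))
print(beta_should_stop(["a", "b", "a", "b"]))
```

Because each extra sample either reinforces or weakens the majority, questions with early consensus stop after a handful of samples, while harder, more ambiguous questions keep consuming budget up to the cap.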

2. Modifying Vanilla Prompting

from adaptive_consistency import AC

# Before:
# output = sampling_function(*args, **kwargs)

# After: let Adaptive-Consistency drive the sampling loop.
output = AC(stop_criteria='beta').eval_loop(sampling_function, *args, **kwargs)

BibTeX

@misc{aggarwal2023lets,
      title={Let's Sample Step by Step: Adaptive-Consistency for Efficient Reasoning with LLMs}, 
      author={Pranjal Aggarwal and Aman Madaan and Yiming Yang and Mausam},
      year={2023},
      eprint={2305.11860},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}