Evaluating the Evolution of AI: ChatGPT’s Accuracy in Supreme Court Queries
Mar 21, 2025 at 3:15 pm

In 2023, ChatGPT incorrectly stated that Justice Ruth Bader Ginsburg dissented in Obergefell v. Hodges; this error has since been corrected.
In March 2023, SCOTUSblog launched an evaluation of ChatGPT to assess its accuracy in responding to questions related to the Supreme Court. The findings revealed a mixed performance: the AI answered only 21 of 50 questions correctly. Two years later, advancements in AI technology prompted a reevaluation to check for improvements.
Refined Knowledge and Enhanced Responses
ChatGPT has demonstrated an increase in factual accuracy since its previous assessment. Among its successes, the AI accurately recognized that the Supreme Court originally comprised six justices and improved its explanation of complex legal terms like “relisted” petitions. Furthermore, it now attributes the concept of the counter-majoritarian difficulty to Professor Alexander Bickel with greater clarity.
Improvements Noted:
- Correct identification of President Donald Trump’s judicial appointments during his first term.
- Accurate recognition of Justice Joseph Story as the youngest appointed justice.
- Enhanced understanding of Youngstown Sheet & Tube Co. v. Sawyer.
- Increased detail in responses about non-justiciability and duties of junior justices.
Additionally, ChatGPT has improved in areas previously fraught with inaccuracies, such as the average number of oral arguments heard each year and cases dismissed as improvidently granted (DIGs).
Examining AI Models: Variability in Performance
During this study, three new AI models were evaluated: 4o, o3-mini, and o1. Each model exhibited a unique style and approach to answering questions, which affected their performance levels.
4o: The Overeager Communicator
The model 4o often went beyond the basic query, providing more detail than necessary, which sometimes led to inaccuracies. For example, it offered comprehensive discussions of Supreme Court reform proposals and historical legal cases, yet its verbosity sometimes led it to misstate significant events and figures.
o3-mini: Speed Over Completeness
This model delivered responses rapidly, prioritizing speed but often at the expense of completeness and accuracy. Its responses were marked by incorrect timelines and misinterpretations.
o1: The Balanced Performer
Achieving the highest accuracy of the three models, o1 managed to combine rapid response times with necessary detail, often offering contextual information that eluded the other models.
Current Trends: AI and Search Engines
The line separating traditional search engines and AI-driven results continues to blur as search queries increasingly incorporate AI functionalities. This integration may impact the way users engage with information retrieval in legal and other contexts.
Conclusion: Progress in AI Understanding of Legal Matters
ChatGPT and similar tools have shown substantial improvements since 2023. While the original test returned a 42% correct-response rate (21 of 50), the updated models revealed notable advancements: 4o achieved 58%, o3-mini 72%, and o1 an impressive 90% accuracy. The ongoing evolution of AI reflects its potential to contribute meaningfully to legal comprehension, though it is vital to pair its use with rigorous research and verification.
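As a quick sanity check on the figures above, the reported accuracy rates can be mapped back to raw correct-answer counts, assuming each model faced the same 50-question set used in the original 2023 evaluation (the per-model counts below are inferred from the percentages, not stated in the article):

```python
# Convert reported accuracy rates into raw correct-answer counts,
# assuming a fixed 50-question test (per the 2023 evaluation).
TOTAL_QUESTIONS = 50

reported_rates = {
    "ChatGPT (2023)": 0.42,
    "4o": 0.58,
    "o3-mini": 0.72,
    "o1": 0.90,
}

for model, rate in reported_rates.items():
    correct = round(rate * TOTAL_QUESTIONS)
    print(f"{model}: {correct}/{TOTAL_QUESTIONS} correct ({rate:.0%})")
```

Running this reproduces the 21-of-50 baseline from 2023 and implies 29, 36, and 45 correct answers for 4o, o3-mini, and o1, respectively.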
Future Questions and Scenarios
In an exploratory follow-up, ChatGPT handled queries about significant Supreme Court rulings and provided accurate responses. However, gaps remained in its analysis of certain recent cases, indicating room for continued development. Its responses to more abstract inquiries displayed not only legal knowledge but also a touch of humor, suggesting a more conversational engagement with users.
For more information, visit SCOTUSblog.