Comparing the “reasoning” capabilities of multiple LLMs
The Puzzle
Solve this puzzle: X and Y are two whole numbers greater than 1, and Y > X. Their sum is not greater than 100. Suresh and Prakash are two perfect logicians; Suresh knows the sum X + Y and Prakash knows the product X × Y. Both Suresh and Prakash know all the information in this paragraph.
In the following conversation, both participants are always telling the truth:
Suresh says “Prakash does not know X and Y.” Prakash says “Now I know X and Y.” Suresh says “Now I also know X and Y.” What are X and Y?
Reasoning Models
Curious about solving this puzzle? The buzz in the world of GenAI this week is about the o1 model.
o1, the first reasoning model from OpenAI, came out last week. The initial verdict is mixed: some people say it is nothing more than CoT (chain of thought, that is, making a model deliberately think through steps), while others claim it is a breakthrough.
Some of what we are doing with o1 involves private data, where we are evaluating the difference this approach to reasoning makes, for example in patient data summarization. We are going through the library of prompts and contexts from our past experience to see where reasoning is relevant. Meanwhile, I wanted to share some observations about the evolution of these models using publicly available data.
One way to do this is to put the models to the test using commonly known puzzles. This post is about the reasoning on a classic puzzle I remember from way back; I had read it in a book. I looked it up, and Wikipedia of course had it. I gave this puzzle to multiple models: o1 (the latest, released last week), GPT 4o (released a few months back), Claude 3.5 Sonnet, Claude 1 and GPT 3.5 turbo. You can clearly see the improvement over time. This post is just a raw presentation of the responses of the different models.
Response from o1 preview
To solve this puzzle, let’s carefully analyze the conversation and use logical deduction:
- Suresh says, “Prakash does not know X and Y.”
- Implication: Suresh knows the sum ( S = X + Y ).
- For Suresh to confidently say that Prakash doesn’t know ( X ) and ( Y ), all possible products corresponding to his sum ( S ) must be ambiguous, that is, each product can be formed by more than one pair of numbers greater than 1.
- Prakash says, “Now I know X and Y.”
- Implication: Prakash knows the product ( P = X \times Y ).
- Upon hearing Suresh’s statement, Prakash realizes that the sum ( S ) must be one where all corresponding products are ambiguous.
- Using this, and knowing the product ( P ), Prakash can narrow down the possibilities to a unique pair ( (X, Y) ).
- Suresh says, “Now I also know X and Y.”
- Implication: Suresh uses the fact that Prakash has determined the numbers to deduce the unique pair from his sum ( S ).
Logical Deduction:
- Let’s assume ( X = 4 ) and ( Y = 13 ). Then:
- Sum ( S = 4 + 13 = 17 ).
- Product ( P = 4 \times 13 = 52 ).
- For Suresh:
- Sum ( S = 17 ).
- All possible pairs adding up to 17 and their products:
- ( (4,13) \rightarrow 52 ) (ambiguous, as 52 can also be ( 2 \times 26 ), but 26 is not greater than 1 and less than or equal to 100 for both numbers)
- ( (5,12) \rightarrow 60 ) (ambiguous)
- ( (6,11) \rightarrow 66 ) (ambiguous)
- ( (7,10) \rightarrow 70 ) (ambiguous)
- ( (8,9) \rightarrow 72 ) (ambiguous)
- Since all products are ambiguous, Suresh knows Prakash cannot deduce ( X ) and ( Y ) from ( P ) alone.
- For Prakash:
- Product ( P = 52 ).
- Possible pairs for ( P = 52 ):
- ( (2,26) ), ( (4,13) )
- Since ( (2,26) ) doesn’t satisfy ( Y > X > 1 ) (as 26 is greater than 100), the only valid pair is ( (4,13) ).
- Knowing Suresh’s statement, Prakash deduces ( (4,13) ).
- For Suresh:
- Knowing Prakash has deduced the numbers, Suresh concludes that the only possible pair consistent with both their knowledge is ( (4,13) ).
Answer:
The numbers are X = 4 and Y = 13.
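The answer can be checked independently of the reasoning above with a brute-force search. Here is a minimal sketch in Python; the names (`LIMIT`, `pairs`, `after_1`, `after_2`, `solutions`) are my own, and the code assumes each statement can be modelled as a filter on the set of candidate pairs.

```python
from collections import Counter

LIMIT = 100  # upper bound on X + Y, taken from the puzzle statement

# Every pair the puzzle allows: 1 < X < Y and X + Y <= LIMIT.
pairs = [(x, y) for x in range(2, LIMIT) for y in range(x + 1, LIMIT - x + 1)]

# Statement 1: Suresh is sure Prakash cannot know yet.
# So no pair with Suresh's sum may have a product that only one valid pair produces.
product_count = Counter(x * y for x, y in pairs)
bad_sums = {x + y for x, y in pairs if product_count[x * y] == 1}
after_1 = [(x, y) for x, y in pairs if x + y not in bad_sums]

# Statement 2: Prakash now knows.
# So his product must be unique among the pairs that survived statement 1.
product_count_1 = Counter(x * y for x, y in after_1)
after_2 = [(x, y) for x, y in after_1 if product_count_1[x * y] == 1]

# Statement 3: Suresh now knows too.
# So his sum must be unique among the pairs that survived statement 2.
sum_count_2 = Counter(x + y for x, y in after_2)
solutions = [(x, y) for x, y in after_2 if sum_count_2[x + y] == 1]

print(solutions)  # prints [(4, 13)]
```

Each statement only removes candidates, so the three filters can be applied in order, and the single pair left at the end matches the answer o1 gave.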
Response from GPT 4o
Let’s break down the puzzle step by step, keeping in mind the key facts:
- Suresh knows the sum (X + Y) and Prakash knows the product (X × Y).
- Both of them are perfect logicians, and they communicate truthfully.
- The conversation happens as follows:
- Suresh: “Prakash does not know X and Y.”
- Prakash: “Now I know X and Y.”
- Suresh: “Now I also know X and Y.”
Step 1: Interpreting Suresh’s First Statement
When Suresh says, “Prakash does not know X and Y,” this tells us that the sum (X + Y) that Suresh knows cannot correspond to a product (X × Y) that has a unique factorization.
If the product were factorable into a unique pair of numbers, then Prakash would have known X and Y immediately. Since Suresh is confident that Prakash doesn’t know X and Y, it implies that for the sum Suresh has, there are multiple possible factorizations for X × Y, meaning the product isn’t uniquely determined by the sum.
Step 2: Interpreting Prakash’s Statement
Prakash then says, “Now I know X and Y.” This means that, after hearing Suresh’s statement, Prakash has enough information to determine X and Y. In other words, although the product had multiple factorizations initially, Suresh’s statement ruled out certain possibilities for the sum, allowing Prakash to narrow it down to a single pair (X, Y).
Step 3: Interpreting Suresh’s Second Statement
Finally, Suresh says, “Now I also know X and Y.” This indicates that, after hearing Prakash’s statement, Suresh is able to deduce the values of X and Y as well. This can only happen if, for the sum Suresh knows, there is now exactly one pair (X, Y) that matches Prakash’s product.
Step 4: Working Through the Logic
Let’s test some values for the sum ( S = X + Y ) and product ( P = X \times Y ).
- Suppose ( X + Y = 11 ). The possible pairs for ( (X, Y) ) are:
- (2, 9) with product ( 2 \times 9 = 18 )
- (3, 8) with product ( 3 \times 8 = 24 )
- (4, 7) with product ( 4 \times 7 = 28 )
- (5, 6) with product ( 5 \times 6 = 30 )
Now, Prakash knows the product, and each product has multiple factorizations (e.g., 18 can be factored as ( 2 \times 9 ) or ( 3 \times 6 )). But when Prakash hears Suresh’s statement, he rules out certain sums, and thus he can determine the correct pair.
For example, if Prakash’s product was 30, the only possible pair with a sum of 11 is ( (5, 6) ). Once Prakash knows this, Suresh can also deduce the pair because he knows the sum is 11 and the only remaining possibility is ( (5, 6) ).
Conclusion
The values of ( X ) and ( Y ) are 5 and 6, respectively.
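To see where this answer breaks down, one can test the sum 11 and the product 30 directly. The sketch below is my own check, not part of the model's response; the helper names `sum_passes_statement_1` and `candidates_for_product` are hypothetical.

```python
from collections import Counter

LIMIT = 100  # upper bound on X + Y, taken from the puzzle statement

# Every pair the puzzle allows: 1 < X < Y and X + Y <= LIMIT.
pairs = [(x, y) for x in range(2, LIMIT) for y in range(x + 1, LIMIT - x + 1)]
product_count = Counter(x * y for x, y in pairs)


def sum_passes_statement_1(s):
    """True if every valid pair with sum s has a product shared with another valid pair."""
    return all(product_count[x * (s - x)] > 1 for x in range(2, (s + 1) // 2))


def candidates_for_product(p):
    """Valid pairs with product p whose sum is consistent with Suresh's first statement."""
    return [(x, y) for x, y in pairs
            if x * y == p and sum_passes_statement_1(x + y)]


print(sum_passes_statement_1(11))  # prints True
print(candidates_for_product(30))  # prints [(2, 15), (5, 6)]
```

The sum 11 is consistent with Suresh's first statement, but a Prakash holding the product 30 would still be left with two candidates, (2, 15) and (5, 6), so he could not truthfully say “Now I know X and Y.”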
Response from Claude 3.5 Sonnet
Let’s approach this step-by-step:
1) First, for Suresh to say that Prakash doesn’t know X and Y, the product X × Y must not be unique. This means there must be at least two pairs of numbers that could give the same product.
2) When Prakash says he now knows X and Y, it means that Suresh’s statement has eliminated all but one possibility for the product.
3) When Suresh says he also now knows X and Y, it means that knowing Prakash now knows, combined with knowing the sum, is enough to determine X and Y uniquely.
4) Given that Y > X and their sum is not greater than 100, let’s consider products that have multiple factor pairs within these constraints.
5) The key insight is that the product must be a number that initially has multiple factor pairs, but only one pair that sums to the number Suresh knows.
6) After checking various possibilities, we find that 48 is the only number that satisfies all conditions: 48 = 4 × 12 (sum 16), 48 = 6 × 8 (sum 14)
7) Initially, Prakash wouldn’t know if it’s (4,12) or (6,8). But when Suresh says Prakash doesn’t know, Prakash can deduce it must be (6,8) because if the sum was 16, Suresh would know it’s (4,12).
8) When Prakash says he now knows, Suresh can confirm it’s (6,8) because that’s the only pair summing to 14 that Prakash could have figured out from Suresh’s first statement.
Therefore, X = 6 and Y = 8.
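A quick check of the two sums mentioned in this response, 14 and 16, shows why neither fits Suresh's first statement. This is a small sketch of my own, with `unique_product_pairs` as a hypothetical helper; it lists the pairs for a given sum whose product only one valid pair can produce.

```python
from collections import Counter

LIMIT = 100  # upper bound on X + Y, taken from the puzzle statement

# Every pair the puzzle allows: 1 < X < Y and X + Y <= LIMIT.
pairs = [(x, y) for x in range(2, LIMIT) for y in range(x + 1, LIMIT - x + 1)]
product_count = Counter(x * y for x, y in pairs)


def unique_product_pairs(s):
    """Pairs with sum s whose product has exactly one valid factorization."""
    return [(x, y) for x, y in pairs
            if x + y == s and product_count[x * y] == 1]


print(unique_product_pairs(14))  # prints [(3, 11)]
print(unique_product_pairs(16))  # prints [(3, 13), (5, 11)]
```

With a sum of 14, Suresh could not rule out (3, 11), and a Prakash holding 33 would have known the numbers immediately. So Suresh could not have made his first statement with either sum.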
Response from Claude 1 (ancient – 18 months back)
Okay, let's think through this step-by-step:
- X and Y are two whole numbers greater than 1, and Y > X
- Their sum is not greater than 100
- Suresh knows the sum X + Y
- Prakash knows the product X * Y
- Initially, Prakash does not know X and Y
- After Suresh says “Prakash does not know X and Y”, Prakash is able to deduce X and Y
- Once Prakash says “Now I know X and Y”, Suresh is also able to deduce X and Y
Some possibilities for X + Y that are less than 100:
- X = 5, Y = 10, X + Y = 15 (But then Prakash would know X * Y = 50 initially)
- X = 10, Y = 20, X + Y = 30 (But then Prakash would know X * Y = 200 initially)
- X = 15, Y = 25, X + Y = 40 (Prakash would know X * Y = 375 initially)
The only way Prakash would not initially know X and Y is if there are multiple pairs of numbers that have the same product. The smallest product that fits this is X * Y = 36. The pairs of numbers less than 100 that have a product of 36 are:
- X = 6, Y = 6 (But this doesn’t satisfy Y > X)
- X = 9, Y = 4 (Sum is 13 which is less than 100)
- X = 12, Y = 3 (Sum is 15 which is less than 100)
Therefore, the only valid options are X = 9, Y = 4 or X = 12, Y = 3. Based on the additional constraint that Y > X, the final answer is: X = 9, Y = 12
So in summary, X = 9 and Y = 12. Let me know if this makes sense!
Response from GPT 3.5 turbo-0125
X and Y are 5 and 8.
When Suresh says “Prakash does not know X and Y”, this means that the product X × Y cannot be factorized into two possible pairs of integers greater than 1, because if that were the case, Prakash could not determine the numbers.
Possible factorization pairs are:
1. 3 × 3 = 9
2. 2 × 6 = 12
3. 4 × 4 = 16
However, when Prakash says “Now I know X and Y”, this means that the product X × Y is not one of the factorization pairs listed above and hence, X and Y can be 5 and 8, as their product is 5 × 8 = 40.
Therefore, the final statement by Suresh, “Now I also know X and Y” confirms that the sum of X + Y equals 13 (5 + 8).