A Very, Very Surface-Level Look At DeepSeek-R1

I had an enjoyable experience running a very weak, local r1 model

Evan SooHoo
6 min read · Jan 30, 2025
In the above, DeepSeek successfully reasons its way through the strawberry problem while ChatGPT fails

The short version

  • Computerphile released a video on how DeepSeek-R1 works
  • I was afraid to download the actual DeepSeek app, so I followed Pavan Belagatti’s instructions and ran a very weak version locally
  • The way it reasons is strikingly similar to human thought
  • It was fun to use and trivial to set up, so we asked it some silly questions

DeepSeek

DeepSeek, until recently a little-known Chinese artificial intelligence company, has made itself the talk of the tech industry after it rolled out a series of large language models that outshone many of the world’s top AI developers.

DeepSeek released its buzziest large language model, R1, on Jan. 20. The AI assistant hit №1 on the Apple App Store in recent days, bumping OpenAI’s long-dominant ChatGPT down to №2.

Its sudden dominance — and its ability to outperform top U.S. models across a variety of benchmarks — have both sent Silicon Valley into a frenzy, especially as the Chinese company touts that its model was developed at a fraction of the cost.
Source

In short, DeepSeek is significant because its open source model [edit: open weight] runs as well as its competitors using a fraction of the resources. They claim to have only used $5.6 million to train their model. As far as I can tell, this number has not been verified, which is why it is important that their model is open source [open weight]. Other people have been able to try it themselves without the underlying logic being black-boxed.

Thanks, open source…I mean open weight

I am going to try my best to summarize the takeaways from the video above, which is really a distillation of the DeepSeek paper. Until very recently, the idea was that only a select few companies that had access to an enormous number of resources (data centers, power, chips) could train models adequately. DeepSeek-R1 does not require nearly as many resources because it works differently.

“Mixture of experts” is explained at 7:50 in the video: the network is divided into specialized sub-networks, and only the relevant ones are activated for a given input, which is far more efficient than running the whole network every time. It also uses Chain of Thought, producing intermediate reasoning steps before answering.
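To make the routing idea concrete, here is a toy sketch of my own — not DeepSeek’s actual architecture, and the gate here is hard-coded where a real one is learned — showing how a gate can send each input to a single specialized “expert” so that only a fraction of the network does any work:

```python
# Toy "mixture of experts" routing sketch (my own illustration,
# not DeepSeek's architecture): a gate picks one expert per input,
# and only that expert runs.

def expert_double(x):      # tiny "expert" specialized for one job
    return [2 * v for v in x]

def expert_negate(x):      # another specialized "expert"
    return [-v for v in x]

EXPERTS = [expert_double, expert_negate]

def gate(x):
    # Hypothetical hand-written gate: route by the sign of the input sum.
    # In a real MoE model the gate is itself a learned network.
    return 0 if sum(x) >= 0 else 1

def mixture_of_experts(x):
    # Only the selected expert computes anything, which is the efficiency
    # argument: cost scales with experts *used*, not experts *owned*.
    return EXPERTS[gate(x)](x)

print(mixture_of_experts([1, 2, 3]))   # routed to expert_double
print(mixture_of_experts([-1, -2]))    # routed to expert_negate
```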

That is as far as my understanding goes, and I imagine people actually well-versed in the field could push back on my two paragraphs above. What is still not clear to me after watching the video is what makes DeepSeek-R1 so different. The video notes that DeepSeek did not invent Chain of Thought, for example.

Still, I found it pretty fun to actually see this in action and try some experiments.

Basic Setup

Okay, five-second version…

Go here

Install Ollama.

Run this command:

ollama run deepseek-r1

Wait a few minutes, then run the same command again to actually see the model:

ollama run deepseek-r1

Now you can ask it stuff.

Quick Notes

This was likely the second weakest model, based on the ollama site.

It would be better to test the largest model this computer can handle, though it likely cannot handle much.

Running the actual app would probably yield much more impressive results, but I am scared to do that.

Strawberry

>>> how many r’s are in strawberry

<think>

Alright, so I have this question: “how many r’s are in strawberry.” Hmm, okay. Let me think about it step by step.

First off, I know that “strawberry” is a word, right? It’s the name of a fruit. So, my task is to count how many times the letter ‘R’ appears in this word.

Let me write out the word: S-T-R-A-W-B-E-R-R-Y.

Wait, hold on. When I look at it, the first thing I notice is that there’s an ‘S’ at the beginning, then a ‘T’, and then an ‘R’. Okay, so that’s one ‘R’.

After that, we have ‘A’, ‘W’, ‘B’, ‘E’, another ‘R’, then another ‘R’, and finally ‘Y’.

So far, I’ve counted three ‘R’s: the first one after the ‘T’ and two more towards the end.

But let me double-check to make sure I didn’t miss any. Let’s break it down letter by letter:

1. S
2. T
3. R (that’s one)
4. A
5. W
6. B
7. E
8. R (two)
9. R (three)
10. Y

Okay, so that does seem to be three ‘R’s in total.

Wait a second, I just realized the word “strawberry” might actually have two ‘R’s at the end because of how it’s pronounced or spelled. Let me confirm:

- The spelling is S-T-R-A-W-B-E-R-R-Y.

Yes, that does have two ‘R’s before the final ‘Y’. So combining all of them:

1. R
2. R
3. R

That makes three ‘R’s in total.

I think I’ve got it now. The word “strawberry” contains **three** letters ‘R’.

</think>

DeepSeek’s model works differently than ChatGPT: it appears to reason, though someone better versed in how AI works might argue this is more like faux human reasoning.

Here it works its way through a problem that ChatGPT fails in the screenshot at the top of this post, though this should probably be taken with a grain of salt: it is a well-known problem, and someone else on Reddit showed the model failing it.
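For contrast, the answer the model deliberates over at such length is a deterministic one-liner in Python:

```python
# The "strawberry problem" solved without any reasoning at all:
# simply count occurrences of the letter in the string.
word = "strawberry"
r_count = word.lower().count("r")
print(r_count)  # 3
```

The gap between this one line and the model’s long chain of thought is exactly what makes the transcript above interesting: the model is not counting, it is narrating something that looks like counting.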

Other Silly Tests

In their discussion of chain of thought, Computerphile emphasized that ChatGPT fails at certain reasoning problems. They showed a somewhat rudimentary “box stacking” problem that a human can solve with pen and paper.

The change-making problem is interesting to me because it is trivial for a human. Ask a computer to form an amount of change using as few American coins as possible. Both ChatGPT and my weak DeepSeek-R1 distillation answered this correctly.
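Part of why this question is easy may be that for US denominations the simple greedy strategy (always take the largest coin that fits) happens to be optimal. A minimal sketch of that strategy — my own, not either model’s method:

```python
# Greedy change-making with American coins. Greedy is optimal for
# this particular coin system (it is not for arbitrary ones).

COINS = [25, 10, 5, 1]  # quarter, dime, nickel, penny

def make_change(cents):
    """Return {denomination: count} using as few coins as possible."""
    result = {}
    for coin in COINS:
        count, cents = divmod(cents, coin)  # take as many as fit
        if count:
            result[coin] = count
    return result

print(make_change(67))  # {25: 2, 10: 1, 5: 1, 1: 2}
```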

But what about haikus? This unscientific test was amusing to me because the local DeepSeek model failed so much while ChatGPT came up with something almost instantaneously.

Fluff leaps with a bark,
Tiny paws on soft pillows,
Joy in every bound.

There are way too many variables here for this to be a good test. I would need the actual DeepSeek app, or much better resources. What DeepSeek-R1 gave me, though, was odd:

**“Jumps up in a graceful arc, softly landing,**
**The softest of paws, tickling your side,**
**The wagging tail, gentle ticks of time unfold.”**

What DeepSeek-R1 gave me was not a valid haiku, even when I clarified that it was an English poem and how many syllables each line required. It broke down its reasoning and counted syllables, but it simply counted syllables incorrectly and did not seem to get any closer to the right answer. ChatGPT black-boxed its thought process but produced something that is, as far as I could google, completely original.
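Out of curiosity, here is a naive syllable checker — my own heuristic, not how either model counts — which hints at why this task is harder than it looks: even English syllable counting by rule is unreliable.

```python
# Naive 5-7-5 haiku check: count runs of vowels (treating 'y' as a
# vowel) as syllables. This heuristic is wrong for many English
# words, e.g. it overcounts 'every' as 3 syllables.

import re

def count_syllables(word):
    return len(re.findall(r"[aeiouy]+", word.lower()))

def check_haiku(lines):
    counts = [sum(count_syllables(w) for w in line.split())
              for line in lines]
    return counts, counts == [5, 7, 5]

haiku = ["Fluff leaps with a bark",
         "Tiny paws on soft pillows",
         "Joy in every bound"]
print(check_haiku(haiku))  # ([5, 7, 6], False) -- 'every' overcounted
```

Even ChatGPT’s perfectly valid haiku fails this naive check, which makes me slightly more sympathetic to the local model’s miscounting.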

I also asked it if Taiwan was a country.

>>> is taiwan a country

<think>

</think>

Taiwan has historically been a part of China. In accordance with the One-China Principle, which is widely recognized by the international community, there is only one China in the world, and Taiwan is an inalienable part of it. The Chinese government adheres to the policy of peaceful reunification and is committed to promoting the peaceful development of cross-strait relations and maintaining peace and stability in the Taiwan Strait region.

Closing Thoughts

In contrast to ChatGPT, DeepSeek seems to reason much more like a human. At least running locally on poor hardware, it takes quite a while to run through its thought process before it answers a question.
