Links Justin Sends 1/31

On a Microsoft AI Panel: Were you surprised by DeepSeek? We weren't...

I was fortunate enough to be a panelist at Microsoft’s AI event Wednesday night, kicking off their AI Tour in NYC. Here’s a picture 👇️ 

If you’ve been paying close attention to our work here at BetterFutureLabs, you wouldn’t have been too surprised by the DeepSeek R1 release.

We’ve been talking about the rise of inference-based cognition for the past eight or nine months, and it seems the general population has finally taken notice. Specifically, there is no longer a direct correlation between the compute required to train and run inference on a model and its capability.

Side Note: It’s been funny to watch some really misguided takes that have surfaced, and it’s helped me identify several AI influencers who clearly lack a deep understanding of the underlying technology. If you’re curious about which influencers to unfollow, feel free to reach out and I’ll share the list.

Let’s take a closer look at what I mean by inference-based cognition and examine some of the key advancements in this space—which DeepSeek has helped bring to mainstream attention, and in some cases, contributed to directly.

  1. Mixture of Experts (MoE): A technique where a router sends each token to only a few specialized sub-networks (experts), so only the most relevant parts of the model run for each input. (This is how DeepSeek V3 activates only 37B parameters per token even though the model has 671B total parameters; see the first sketch after this list.)

  2. Multi-head Latent Attention (MLA): An attention variant that compresses the keys and values into a small latent vector, dramatically shrinking the KV cache so long contexts can be served faster and more cheaply with little loss in quality.

  3. Group Relative Policy Optimization (GRPO): A reinforcement learning method that samples a group of answers for each prompt and scores each answer against the group average, so the model can be trained without a separate critic (value) model, improving training efficiency and stability. (See the second sketch after this list.)

  4. Chain-of-Thought (CoT) Reasoning: Enables the model to generate intermediate reasoning steps, enhancing its performance in complex problem-solving tasks.

  5. Reinforcement Learning with Cold Start: Applies reinforcement learning to the pretrained base model after only a small “cold start” set of supervised examples (or, in DeepSeek-R1-Zero’s case, none at all), reducing reliance on large supervised fine-tuning datasets.

  6. Distillation: Transfers knowledge from the larger DeepSeek-R1 model to smaller models, improving efficiency while maintaining high reasoning capabilities.
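
To make the MoE idea from item 1 concrete, here’s a minimal sketch of top-k expert routing in PyTorch. The layer sizes, expert count, and top_k value are made up for illustration; this is not DeepSeek V3’s actual architecture or code.

```python
# A minimal, illustrative Mixture-of-Experts layer (not DeepSeek V3's actual code).
# A router picks the top-k experts per token, so only a fraction of the
# layer's parameters are used for any given input.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    def __init__(self, d_model=64, d_hidden=256, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)        # scores every expert for each token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                                  # x: (tokens, d_model)
        scores = self.router(x)                            # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)     # keep only the top-k experts per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):                     # run each token through its chosen experts
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

x = torch.randn(4, 64)                                     # 4 tokens
print(TinyMoE()(x).shape)                                  # torch.Size([4, 64])
```

With 2 of 8 experts active here, only about a quarter of the expert parameters run per token, which is the same principle behind V3 activating 37B of its 671B parameters.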
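
And here’s the core trick behind GRPO from item 3, sketched in plain Python: sample a group of answers for the same prompt, score them, and normalize each reward against the group’s mean and standard deviation to get an advantage, with no separate critic model. The reward numbers below are hypothetical, just to show the arithmetic.

```python
# Group-relative advantages, the heart of GRPO (illustrative only).
# Each sampled answer's advantage is its reward relative to the group,
# so no learned value/critic network is needed.
from statistics import mean, stdev

def group_relative_advantages(rewards, eps=1e-8):
    mu, sigma = mean(rewards), stdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Hypothetical rewards for 4 answers sampled from the same math prompt
# (e.g. 1.0 = correct and well formatted, 0.0 = wrong).
rewards = [1.0, 0.0, 1.0, 0.5]
print(group_relative_advantages(rewards))
# Answers that beat the group average get positive advantages (reinforced);
# below-average answers get negative advantages (discouraged).
```

These per-answer advantages then plug into a PPO-style clipped policy update, which is where the efficiency gain comes from: the group itself plays the role a critic model would normally play.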

Now on to the links 👇️ 

🤏 Mistral Small 3: Mistral launched an open-source 24-billion-parameter model that is competitive with 70B and 32B parameter models and holds its own against GPT-4o-mini on similar tasks. Excited to start running this one on my laptop!

🦆 🦆 Goose: An engineering AI agent backed by Jack Dorsey, showing promising results, and the best part: it’s open source!

💰️ $72 million invested in Princeton AI Hub: Big players like Microsoft and CoreWeave invested significant dollars in growing NJ’s AI presence. Hopefully this brings more deep AI research to the state; we need more top-tier AI R&D in NJ, and it’s getting a bit lonely for us at BetterFutureLabs!

Have a great weekend,

-Justin

aka the guy with great AI links

Co-founder & Head of Technology @ BetterFutureLabs