The Qwen AI Model: Constraints, Nuance, and Toppling Statues
Top Tech Stories. Overfitting. Censorship. 3D Printing. Pandas. Scrooged. Jimmy Carter.
I’m not sure this newsletter has an overarching theme outside of writing about what I find interesting. I try to thread the needle between writing (and everything that comes with long-form book creation), a certain baseball team in a rough spot (Cardinal fans, it was a tough season), and inspiration mixed with analytics (stories exist in data). Technology serves as an underlying thread since the industry moves at a brisk pace—quite an understatement.
This year, with articles of the year all over the place, I tried to pinpoint a single remarkable story. But the one ring to rule them all doesn’t exist. I suppose there are numerous honorable mentions: the tech gods bowing at Mar-a-Lago, drone impact in Ukraine, Syria, and New Jersey, rapid job displacement despite impressive earnings, the rise of Bitcoin, laptop gains (the new ARM chips are real), quantum, and the continued rise of open source (despite headaches).
Yet, the tagline of the JPLA newsletter is that our words do matter—no matter how they are created. So, for me, it’s the year of large language models (LLMs). Note, I didn’t say AI or GenAI. I’m describing the technology at face value—a statistical model that predicts answers based on certain inputs. That’s what it is. It’s not your friend or family or professor. One can pretend. But it’s just math that sometimes adds wrong and makes mistakes.
And I suspect this is not surprising to readers, at least those who have been with me for years. Looking back at previous issues, I built an application for detecting challenges in fictional text. I also tried to create a Python-based writing stand-in for myself. Both projects use various LLMs. They show promise but don’t quite reach the finish line.
Instead of simply saying LLMs are the story, I want to get a bit more granular because it’s been a ride. We’ve seen video models. And image models. And code-writing models. And the list goes on… So I found it fascinating that Sam Altman, co-founder and CEO of OpenAI, made the case that the path to AGI is “basically here,” especially considering we’re not done modeling yet. Or finding the next killer use case to change our lives.
Overfitting and Lessons from StoryMaster
So why is he saying this? Good old-fashioned PR? Yes and no; the current LLM paradigm may be plateauing. Two years ago, I tinkered with what I dubbed StoryMaster—my own model trained on my corpus. Unfortunately, my version didn’t achieve peak performance because I lacked data. Nor did I think about scraping the internet (or transcribing YouTube videos) and running it through a learning process. Still, I tried, iterating on dozens of versions with PyTorch. Sometimes I’d massage the data. Or use headers. Or change the order.
And my approach—screwing around—required weeks of training, given my hardware limitations.
If you’re curious, one doesn’t need racks of NVIDIA GPUs lying around. I ran my training on aging machinery, sometimes, dare I say it, on CPUs instead of GPUs. Yes, time wasn’t a friend—it took forever. But the feat can be done.
Here, there is a concept called overfitting. It’s fairly simple: the longer the machine trains, the closer it comes to a point where it regresses. The same happens with people. Sometimes you just need a walk to clear the head. But if you overtrain, the model performs worse on anything it hasn’t seen. Or, with people, you give up or tear the ligaments in your foot. If you follow OpenAI, Anthropic, or any competing service on Reddit, there are hundreds of comments claiming the AI is getting worse. There can be truth to that for any number of reasons, including efficiency trade-offs, the time of day you use it, or a genuine decline in the model’s performance.
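For the tinkerers, here’s a minimal, hypothetical sketch of what catching overfitting looks like: a deliberately oversized PyTorch model fit to synthetic data, with a validation check that halts training once the model starts memorizing instead of learning. This isn’t my StoryMaster code, just the core pattern, and it includes the CPU fallback I mentioned.

```python
# A toy illustration of overfitting and early stopping -- not StoryMaster itself.
import torch
import torch.nn as nn

torch.manual_seed(0)
device = "cuda" if torch.cuda.is_available() else "cpu"  # CPUs work; they're just slow

# Tiny synthetic dataset standing in for a text corpus: 200 noisy samples,
# split into training and validation sets.
X = torch.randn(200, 16)
y = X.sum(dim=1, keepdim=True) + 0.5 * torch.randn(200, 1)
X_train, y_train = X[:160].to(device), y[:160].to(device)
X_val, y_val = X[160:].to(device), y[160:].to(device)

# A model with far more capacity than the data warrants -- prime overfitting bait.
model = nn.Sequential(nn.Linear(16, 256), nn.ReLU(), nn.Linear(256, 1)).to(device)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

best_val, patience, strikes = float("inf"), 20, 0
for epoch in range(500):
    model.train()
    opt.zero_grad()
    loss = loss_fn(model(X_train), y_train)
    loss.backward()
    opt.step()

    model.eval()
    with torch.no_grad():
        val = loss_fn(model(X_val), y_val).item()

    # Training loss keeps dropping forever, but once validation loss climbs,
    # the model is memorizing, not learning. Stop early.
    if val < best_val:
        best_val, strikes = val, 0
    else:
        strikes += 1
        if strikes >= patience:
            print(f"early stop at epoch {epoch}: validation loss rising ({val:.3f})")
            break
```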
That doesn’t mean Sam Altman and others aren’t trying new approaches. They are, with agents, reasoning (“thinking”) models, and synthetic data.
While this is happening, the open-source world is catching up. Meta’s Llama models are stellar. Microsoft dropped a game changer earlier in the year with its small Phi models. Amazon recently launched Nova, which is altering the cost paradigm. A new story each week.
Qwen’s Emergence
My article of the year is none of these achievements. A co-founder of Hugging Face, the company that hosts many open-source models, recently highlighted that the most downloaded model didn’t come from Meta or Microsoft or Mistral or Stability. The model is Qwen. It was developed by Alibaba. And maybe, it wasn’t supposed to be this good.
Why?
For one, it’s a Chinese company.
These aren’t businesses—they’re instruments of the state, with certain feature sets. Western folks often think of these entities as businesses because they shimmer with a polish of innovation. But our thinking, our culture, creates a blind spot: upon closer inspection, their actions are heavily influenced by the Communist Party. US automakers are starting to realize this. In China, to build a factory, a partnership has to exist with a local entity. Or, cough, the government. It’s a technology trade.
Today, US automakers are struggling in China, the country has its own homegrown brands, and the market’s promise isn’t what it used to be. What happens now?
And two, the US government restricted Chinese access to NVIDIA’s top-tier chips.
Lacking industry-leading GPUs, Alibaba found a way, working around export controls and leveraging the chips available in country. Coupled with optimization techniques like quantization (shrinking the numbers so the computer doesn’t break a sweat), they crafted an efficient model.
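If quantization sounds abstract, here’s a toy sketch of the core idea (my own illustration, not Alibaba’s pipeline): squeeze 32-bit floats into 8-bit integers plus a single scale factor, trading a sliver of precision for a quarter of the memory footprint.

```python
# Symmetric int8 quantization of a weight tensor -- a toy example, not
# how any particular production model does it.
import torch

weights = torch.randn(1024, 1024)            # fp32: 4 bytes per value

scale = weights.abs().max() / 127.0          # map the largest weight onto int8's range
q = torch.clamp((weights / scale).round(), -127, 127).to(torch.int8)
dequant = q.to(torch.float32) * scale        # approximate reconstruction at inference

print(f"fp32: {weights.numel() * 4 / 1e6:.1f} MB -> int8: {q.numel() / 1e6:.1f} MB")
print(f"mean absolute error: {(weights - dequant).abs().mean():.5f}")
```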
Like my aging computers and cheap cloud storage, training Qwen might’ve taken longer, but they made it work without throwing billions upon billions at the latest GPUs.
Censorship Dilemma
Benchmarking shows it’s an impressive model. Qwen demonstrates competitive performance relative to proprietary models when it comes to language understanding, multilingual proficiency, coding, mathematics, and reasoning. It’s a Kentucky thoroughbred. And in my own experiments for this article, it runs well.
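For anyone who wants to kick the tires, here’s the sort of minimal setup I mean, a hedged sketch built on Hugging Face’s transformers library. The checkpoint name below is one of Alibaba’s published Qwen variants and an assumption on my part; a smaller variant works if memory is tight.

```python
# Minimal local chat with a Qwen checkpoint via transformers.
# Assumes the transformers, torch, and accelerate packages are installed.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-7B-Instruct"  # assumed checkpoint; smaller ones exist
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "Explain overfitting in two sentences."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
# Slice off the prompt tokens and decode only the model's reply.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```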
Yet, there are strings.
The model is like a student who excels at everything the curriculum asks of them—until you ask a question that veers off the syllabus. It’s fluent, confident, and articulate, but when the conversation touches on subjects like Tiananmen Square or the Hong Kong protests, it falters. Not because it lacks intelligence, but because it has learned that some doors are not worth opening.
China’s regulatory environment is pushing an agenda, forcing enterprises and even multinational companies to embrace Qwen. Want to use AI in your product or service? Then, use a sanctioned model. Or brace for regulatory headaches.
When Data Changes and Statues Topple
But scrubbing the historical record isn’t a new phenomenon. When Iraq fell, statues of Saddam toppled. The same happened in Syria, slowly and then all at once.
And let’s not think Western countries don’t have similar challenges. For now, US models respond with the following when asked about January 6:
On January 6, 2021, a violent attack occurred at the United States Capitol in Washington, D.C., as a joint session of Congress convened to certify the results of the 2020 presidential election, which declared Joe Biden the winner.
And I believe that’s the answer. Like many, I watched this cluster of tragic proportions unfold on television. Changing the record doesn’t do anyone favors—it only creates terrible cycles and patterns. We should know the facts, the record. We should know the story of Tank Man. We should even know that NASA landed on the moon.
And the list goes on. Note, that doesn’t mean we shouldn’t argue or debate current events.
As for China, the government’s regulations will eventually make using foreign models challenging. With Qwen being open-sourced, the push is global. It’s not just a Chinese play; it’s a subtle nudge to Western companies, saying, “Hey, use me.”
For now, much of the rapid adoption is country-specific. But it may not stay that way. We’ve seen this play out in sports entertainment, the tech industry, and manufacturing. I mean, who wants to build applications against multiple models? Supporting them all carries a real cost.
Ultimately, remember our choices matter—and the technologies we adopt shape the stories we tell.
Runner-Up Thoughts:
Pandas! I beat the NY Times to the punch, but they got there. Those darned bears sure are cute.
I thought social media was going to turn into the next evolution of open-source services built on social protocols. Meta lags. X has become less open. Bluesky adoption came out of nowhere (didn’t see that growth coming). Yes, this space will continue to evolve, but technical fragmentation is apparently the next wave.
The Google antitrust case. Wait and see.
3D printing: ghost guns and quick rebuilds.
Tracking tipping using Uber data.
Linux on the rise.
Personal productivity systems are having a moment. See Notion.
Other Notes:
On the censorship of Qwen, here is a solid write-up/review of its limitations. I’d post my own findings, but this is far better than anything I could do.
The headline picture of Shanghai was taken in 2002 and altered using Adobe AI. Note, it looks nothing like the true city skyline. Artist, er, computer creativity on the move.
Be Cool, Pass The JPLA On …
Cardinals Closing Out an Era (Enter Chaim Bloom):
Goldy signed a one-year deal with the Yankees. High-level, I’m expecting a rebound for him in a hitter-friendly park. What’s more fascinating is the direction of the St. Louis faithful. Obviously, they are rebuilding, Arenado staying or going notwithstanding. It’s hard to trade someone with a no-trade clause who only wants to be with one team.
Still, my hope is that this will be a quick rebuild; statistically, the Cardinals have a middling farm system (if you’re curious, the White Sox have the best system in baseball). What makes it middling is not their hitters—they are competitive there. It’s the pitching profile.
Drafting low-velocity collegiate pitchers is a gamble because throwing heat rarely improves after college, leaving little room for development gains in a league driven by power. The profile lacks strikeout potential needed to dominate modern lineups, forcing reliance on control and defense. Mathematically, this makes for a tough road against today’s high-octane offenses.
That being said, Quinn Mathews might be the exception to the rule. Since leaving Stanford, his velocity has increased about five mph. And why not like a guy who threw a complete game with 150-plus pitches? It’s so Jack Morris.
What I’m Watching, Scrooged (Murder Hornet Edition):
We often think chaos is just around the corner—it’s human nature. I rewatched Scrooged over the holidays, and the television commercial inside the movie hits the feelings of the times. Nuclear war. Mass shootings. The teaser of sorts also mentions acid rain. And for the life of me, I couldn’t remember what happened to it. Apparently, through science and prevention, it’s no longer a thing, except in certain parts of the world.
Well, we eradicated Murder Hornets in 2024. There is always good news; sometimes, you just have to look for it.
What I’m Reading (Ten Years Between Books):
Donna Tartt is known for creating immersive worlds where every detail matters. Her writing is deliberate and precise, drawing readers into lives and minds with a certain moral ambiguity. There is a reason ten years pass between her projects.
The Secret History is thirty years old (over 800 thousand Goodreads ratings), and the work had a resurgence in 2024. I’m only about halfway through, and I’m not seeing a happy ending here. It reads like a modern-day Gatsby, even if unintentional, but much longer.
When Words Don’t Fail:
“I’ve had a wonderful life, I’ve had thousands of friends, and I’ve had an exciting and adventurous and gratifying existence.” President Jimmy Carter
“I know with this music, it heals, it transforms the feelings that we’re going through into something better. Gotta keep on going.” Jonas Green, New Orleans resident and trombone player.