How AI is changing our relationship with documents

By Andy Phillips
June 12, 2025

I was seven, maybe eight, sitting cross-legged on the carpet of my father’s home office. The computer—a beige tower that seemed massive at the time—was making these grinding, whirring sounds. 

“What’s happening?” I asked.

“The computer’s thinking,” my dad explained.

Thinking? How could a box think? I tried to wrap my kid brain around that. I could hear the disks spinning, like hamster wheels inside the machine.

This was the late 80s. We had Prodigy dial-up, which made us practically pioneers. I’d watch my father load Windows 3.1, compress hard drives to squeeze 52 megabytes of usable storage out of 32 megabytes of actual space, or wait for the dot matrix printer to zip back and forth. Magic, all of it.

But that word—thinking—stuck with me.

Eventually, my father elaborated. He said something I still reference today when my mother-in-law calls for IT help with her iPhone: “Computers only do what you tell them to do.”

He was right then. He’s still right now.

Even as generative AI is disrupting every industry from medicine to music, computers still only do what we tell them. 

The breakthrough isn’t that they’ve learned to think. The breakthrough is that we’ve learned to tell them to do something extremely sophisticated: answer our questions.

Francis Bacon figured out 400 years ago that “A prudent question is one-half of wisdom.” My dad’s wisdom and Bacon’s insight are the same truth, separated by centuries and a few advancements in technology. The quality of what you get out depends entirely on what you put in.

Here in the world of alternative investment, Canoe is learning to ask better and better questions of documents.

Interestingly, a very similar transformation has already taken place in the realm of online search. And, at Canoe, we’re already witnessing the drastic implications for alternative data operations.

 

Yahoo’s billion-dollar mistake

You’re probably old like me if you remember a time when Yahoo was worth more than Google.

In the mid-90s, Yahoo had it all figured out. They were employing massive teams to manually categorize every website on the internet. Travel sites here, news there, shopping in yet another folder.

An army of humans was creating the perfect taxonomy for the World Wide Web.

It was working great right up until it wasn’t.

The internet exploded. Suddenly, there weren’t thousands of websites but millions. Then billions. Yahoo’s approach—having humans visit each site and file it in the right folder—was like emptying the ocean with a teaspoon.

Meanwhile, two PhD students at Stanford had a different idea. What if, instead of forcing websites into predetermined boxes, you could just… ask for what you’re looking for?

Those PhD students, who became Google’s founders, didn’t care about perfect categories. Their PageRank algorithm understood something revolutionary: context matters more than classification. A travel blog might link to a restaurant, which might link to a mapping service. The connections told the story.
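For readers who want to see how "the connections told the story" can work mechanically, here is a minimal sketch of the textbook PageRank idea, applied to the travel blog example above. It is a simplified illustration, not Google's production system: the page names, the damping value of 0.85, and the iteration count are all assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank computed by repeated redistribution of rank.

    `links` maps each page to the list of pages it links to.
    Returns a dict of page -> score; scores sum to 1.
    """
    # Collect every page that appears as a source or a target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal rank everywhere

    for _ in range(iterations):
        # Every page gets a small baseline share regardless of links.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank, split evenly, to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical three-page web, mirroring the example in the text.
links = {
    "travel_blog": ["restaurant", "mapping_service"],
    "restaurant": ["mapping_service"],
    "mapping_service": [],
}
ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.3f}")
```

Nobody filed any of these pages into a category. The mapping service ends up ranked highest simply because the other two pages point to it.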

For years, the entire alts data industry—Canoe included—has been pioneering a particular flavor of machine learning called “pattern recognition.” This is the art of training machines to extract data where we expect to find it. 

It works brilliantly for 95% of what matters.

But pattern-matching actually mirrors Yahoo’s philosophy: it assumes you know what you’re looking for before you start.

When you only search for what you expect, you’ll miss the unexpected every time—like that new field the GP added, or the clarification in footnote 47, or the connection between two locations that only makes sense in context.

Google’s revolution didn’t displace categorization entirely, and we aren’t displacing it either. Google simply admitted it couldn’t predict every possible way someone might need to find information.

The same shift is happening with alts data extraction: we’re moving from rigid to fluid. From “extract these 50 fields” to “help me understand this document.”

The Google Philosophy, if you will, is to let the document tell you what matters instead of demanding it fit your predetermined schema.

 

From interrogation to conversation

The first time it occurred to me that advancements in AI meant we could now “ask questions of documents,” the idea felt absurd.

But consider how we, as humans, learn: through inquiry. Watch any child navigate the world. They ask “why?” relentlessly, and each question helps them build deeper understanding. It’s the most natural way for humans to acquire knowledge.

Traditional (OCR-style) extraction is more akin to interrogation. You show up with your checklist, demand specific answers, and, if the document doesn’t comply with your predetermined format, too bad. On to the next one.

But what if we treated documents more like… patients?

Stick with me here. When a document lands on our desk, we need to diagnose it. What type is it? What’s it trying to tell us? Are the numbers correct? Is something missing? Has anything changed since the last time this fund sent us something?

To get that understanding, we need to ask real questions. Not questions like, “give me the value at this coordinate,” but ones like, “what’s the total distribution amount, wherever you might have put it?”

The mass AI adoption we’re seeing today is largely due to the consumer-friendly “chat” interfaces powered by large language models (LLMs). These AI models study how humans express ideas and can predict meaningful responses. This isn’t magic—it’s just pattern recognition at a massive scale. But the true breakthrough is that the accumulation of patterns means LLMs also understand context, not just keywords.

Say a GP decides to add a new fee disclosure. Rigid extraction will miss it because it’ll look for “Management Fee” in the usual spot. An LLM reading the document sees “inaugural technology infrastructure charge” in paragraph six and thinks, “Hmm, that sounds like a fee—I better flag this.”
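To make the contrast concrete, here is a toy sketch of the two approaches. It is not Canoe's method and it does not call an LLM. A simple keyword heuristic stands in for the model's judgment, purely to show the difference between looking for an expected label and flagging anything that reads like a fee. The sample notice text, the function names, and the hint list are all hypothetical.

```python
import re

# Hypothetical stand-in for a model's sense of "this sounds like a fee".
FEE_HINTS = re.compile(
    r"\b(fee|fees|charge|charges|expense|expenses|surcharge)\b",
    re.IGNORECASE,
)


def rigid_extract(paragraphs, label="Management Fee"):
    """Rigid approach: keep only paragraphs containing the exact expected label."""
    return [p for p in paragraphs if label in p]


def flag_possible_fees(paragraphs, known_label="Management Fee"):
    """Flexible approach: flag fee-like language that the expected label misses."""
    return [
        p for p in paragraphs
        if FEE_HINTS.search(p) and known_label not in p
    ]


# Made-up notice text for illustration only.
notice = [
    "Management Fee: 2.0% of committed capital.",
    "Beginning this quarter, an inaugural technology infrastructure charge will apply.",
    "Distributions will be wired on the 15th.",
]

print("Rigid extraction found:", rigid_extract(notice))
print("Flagged for review:", flag_possible_fees(notice))
```

The rigid pass returns only the line it was told to expect. The second pass surfaces the new charge for a human to review. A real LLM would go further and pick up on phrasing that no keyword list anticipates.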

Canoe has been a pioneer in applying machine learning to alternative investment documents for years. Patterns might catch 95% of what matters. But that last 5%? When documents get weird—and in alternatives, documents always get weird—you want something that can adapt. This is where LLMs—if done right—can deliver.

The future of alternative data management includes conversing, where we were previously interrogating. Same game, better glove.

 

What curiosity at scale will mean for alts

Forty years after I observed my father’s computer “thinking,” we’ve managed to train machines to display something like curiosity.

Curiosity, at scale, will be a paradigm shift for alternatives processing.

Ultimately, only the most scale-friendly systems will be able to keep up with the growing trend of reallocation toward alternatives. Scale becomes table stakes for an asset class where every fund structures information differently, critical details hide in footnotes, and formats shift quarterly. 

Right now, alts automation as an industry is still in the first inning in terms of what will be possible with AI. Investment offices are racing to understand what LLMs could mean for their operations. Plenty of people are still uploading a PDF to ChatGPT for the first time and watching in amazement.

Canoe’s focus has been light years beyond out-of-the-box summarization features. We’re thinking about AI at an industrial scale and building the precision within specialized models that the financial sector deserves.

We believe in creating models that know what’s important to you, not just what you asked about. Our systems notice when something’s different, unusual, or worth a second look. We build machines that can say, “I’ve seen a thousand distribution notices, and this one’s weird. Want to know why?”

This is a compounding effect that very few have woken up to yet. Every question can now lead to better understanding. Every document can teach the system something new. Every edge case can make the next one easier.

My dad was right. Computers only do what you tell them to do. Turns out, the most powerful instruction we can give them is: “Go explore.”

 

###

About Canoe Intelligence
Canoe Intelligence redefines alternative investment intelligence through AI-driven, cloud-based software that automates and streamlines alternative investment workflows. With over $8T in assets under automation and more than 400 clients, Canoe empowers institutions, LPs, and wealth managers to transform their alternatives data operations. Learn more: www.canoeintelligence.com

 

MEDIA CONTACT:
Betsy Miller Daitch
Canoe Intelligence
+1 443-690-6200
bdaitch@canoeintelligence.com
