Advantage Performance Group • We help organizations develop great people.

Corpus credibility: The key ingredient for artificial intelligence in learning & development

Less might be more when leveraging AI for learning support

Understanding what’s feeding the answers is critical in AI. Julie's own "ChatJWG" tool, the GrowBot, uses 10 years of articles, blog posts, infographics, marketing material, and course outlines.

Hallucinations. Context-free responses. Unknown sources. Disturbing dialogue.

Sounds like a scene from a bad movie about the ’60s… or a recent engagement with AI. Examples of strange exchanges and flat-out misinformation generated by ChatGPT and its various artificial intelligence cousins abound. At the same time, it’s becoming increasingly clear that this technology has implications for nearly every corner of the workplace, including learning and development.

As an L&D professional myself, I am excited about the possibilities associated with offering learners a massive universe of information that can be queried in the workflow for "just-in-time/just for me" insights. This technology enables so much of what we’ve all been working to deliver for years – pull-based knowledge, embedded instruction, micro, personalized, and self-directed learning. The development and performance implications are mouth-watering.

And yet, there’s reason for concern about the quality of the answers that AI tools will deliver to a learner’s questions. It boils down to a term that is new to many of us: corpus. Michael Grothaus defines corpus in a recent Fast Company article as “the material the AI reviews to become intelligent in whatever it was designed for.”

Here's the rub: the quality of the answers is dependent upon the quality of the material that’s been scraped. It’s sort of a "garbage in, garbage out" situation. Or perhaps more accurately "unknown in, untrusted out" in this case. How can we confidently have learners asking questions of a source whose own sources are opaque? It’s becoming clear that corpus credibility matters.

That’s why public large language model tools like ChatGPT, or those offered through Azure, will likely be less acceptable – and effective – learning tools than those that are more contained, curated, and transparent. Imagine learners being able to get targeted answers to their most pressing questions from a fully vetted and philosophically aligned thought leader or author. No need to imagine, because this capability exists and is being implemented by more content providers every day. (Note: One example is the GrowBot that I’ve incorporated into my own website at juliewinklegiulioni.com.)

As these custom, smaller language models become more available, L&D professionals will be able to recommend to learners a variety of websites belonging to experts who offer content that’s known and understood to be valuable. Because these custom sites will draw exclusively upon that expert’s content, we’ll be able to trust the answers that are delivered.

As this new marketplace emerges, here are questions we must ask:

  • What specifically has been scraped as source material for responses? Understanding what’s feeding the answers offers the credibility required to confidently recommend it to learners. In my case, 10 years of articles, blog posts, infographics, marketing material, and course outlines are the GrowBot’s corpus.
  • How frequently is data scraped? One of the drawbacks of ChatGPT is that it’s based upon what was pulled from sources available in 2021 and earlier. Today’s learners need and demand up-to-the-moment information, so the currency of the corpus is critical.
  • How are citations handled? This is particularly important when using AI for learning. Many public tools do not cite sources. This creates a credibility problem – but it also means that curious learners have no guidance for how to dig more deeply into the topic. Easy-to-follow references turn a static response into a dynamic learning journey.

Ensuring the highest levels of corpus quality is key to harnessing the full potential of AI technology and offering a vast yet curated and reliable universe of information for learning. Which just might make corpus credibility the next frontier – and focus – for forward-thinking L&D professionals.

Julie Winkle Giulioni