Better Recommendation

Validating AI-generated Subject Terms Through LOC Linked Data Service

Kwok-leong Tang

Harvard University

2025-03-11

In the Last Episode…

  • At the 2024 CEAL Annual Meeting, LCSH recommendations were highlighted as a highly desired GenAI tool.
  • The tool aims to speed up cataloging in libraries, but it will not replace human catalogers. In fact, human catalogers are more important than ever.
  • What we need is a better recommendation system.

Constraints

  • Controlled Vocabularies
    • Library of Congress Subject Headings (LCSH) is a controlled vocabulary.
    • Controlled vocabularies are “living” (being updated).
  • Large Language Models
    • Large language models have knowledge cutoff dates.
    • Large language models generate responses by next-token prediction, so the output may differ slightly every time.

Constraints

  • Keeping Up with SOTA Models and Features
    • New state-of-the-art (SOTA) models are released almost every month.
    • New engineering features are introduced frequently.
  • Cost of a Standalone GenAI Tool
    • Every query incurs a cost (API/token cost).
    • Providing functions like OCR is more expensive.

Possible Solutions to Increase Accuracy

  • Prompt Engineering
    • not accurate enough
  • Finetuning
    • knowledge cutoff dates
  • Retrieval Augmented Generation (RAG)
  • Function Calling
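To make the last option concrete, here is a minimal sketch of how a validation function might be described to a model that supports function calling. The tool name `validate_lcsh_terms`, its `terms` parameter, and the helper `parse_tool_call` are hypothetical and not taken from the slides; the schema follows the JSON Schema style that several function-calling APIs accept.

```python
import json

# Hypothetical tool definition (illustrative names, not the author's actual API).
# The model reads this description and decides when to call the function.
VALIDATE_TOOL = {
    "name": "validate_lcsh_terms",
    "description": (
        "Check candidate subject terms against the LCSH vocabulary "
        "and return close matches with their URLs."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "terms": {
                "type": "array",
                "items": {"type": "string"},
                "description": "Candidate subject headings proposed by the model.",
            }
        },
        "required": ["terms"],
    },
}


def parse_tool_call(arguments_json: str) -> list[str]:
    """Extract the list of terms from the JSON arguments of a tool call."""
    args = json.loads(arguments_json)
    terms = args.get("terms", [])
    if not isinstance(terms, list) or not all(isinstance(t, str) for t in terms):
        raise ValueError("'terms' must be a list of strings")
    return terms
```

The model proposes terms, the platform passes them to the function as JSON arguments, and the validated results are returned to the model before it answers.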

LC Linked Data Service

Creating an API for the LC Linked Data Service

  • The Library of Congress does not provide a direct API for LCSH (?)
  • I developed an API that accepts a list of terms.
  • This service utilizes the search functionality on the LC Linked Data Service website.
  • It returns at most 20 results, compares them against the input terms, and provides a similarity score and URL for each suggested LCSH term.
  • Any LLM with function calling capabilities can use this API to validate its recommendations.
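The comparison step described above could look roughly like the following sketch. The slides do not say which similarity measure the API uses, so `difflib.SequenceMatcher` is an assumption here, and `rank_candidates` and the result fields are hypothetical names. The search results are passed in as plain (label, URL) pairs so that the example does not depend on the actual search endpoint.

```python
from difflib import SequenceMatcher

MAX_RESULTS = 20  # the slides state that at most 20 results are returned


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (assumed measure)."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def rank_candidates(term, candidates, limit=MAX_RESULTS):
    """Score (label, url) search results against an input term, best match first."""
    scored = [
        {
            "term": term,
            "label": label,
            "url": url,
            "score": round(similarity(term, label), 3),
        }
        for label, url in candidates[:limit]
    ]
    return sorted(scored, key=lambda r: r["score"], reverse=True)


if __name__ == "__main__":
    # Placeholder results; real labels and URLs would come from the search service.
    results = [
        ("Tea trade", "https://example.org/placeholder-1"),
        ("Tea", "https://example.org/placeholder-2"),
    ]
    for row in rank_candidates("tea", results):
        print(row["score"], row["label"], row["url"])
```

A score of 1.0 means the suggested term matches an authorized label exactly (ignoring case); lower scores flag terms that a cataloger should check by following the URL.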

Adopt the API in a GenAI Tool

  • I used ChatGPT to create a GPT as a proof of concept.
  • Why use a GPT on ChatGPT instead of building a standalone tool?
    • No API costs. No hosting costs for frontend or backend.
    • No costs for processing files.
    • Even free-tier users can use it, with a limited number of requests.
  • The same approach can be applied to other GenAI tools, such as Copilot and Gemini.

Demonstration of the GPT

Eric Chow’s GPT on OCLC Subject Headings

Link to the GPT

Questions and Difficulties

  • Can we upload images?
  • Cannot access public GPTs through ChatGPT EDU
  • Cannot access ChatGPT for various reasons (workspace rules, geographical blockage, etc.)
  • Budget cuts: no extra funding for subscriptions

Solutions

  • Yes, you can upload images, audio, or even videos.
  • If you have ChatGPT EDU, you can create your own GPT. I will share the complete instructions and prompts.
  • For geographical blockage, you can look for other LLM platforms with function calling.
  • For the budget cut……

The LCSH Recommendation Tool

  • It is a browser extension that works with Google Chrome, Edge, Brave, and other Chromium-based browsers.
  • It is free to use.
  • It uses the Google Gemini 2.0 Flash model. Free-tier limits:
    • 15 requests per minute
    • 1500 requests per day
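A client can stay within these limits with a small throttle. The sketch below is an illustration only: the class `SlidingWindowLimiter` is hypothetical, it is not how the extension is implemented, and it assumes the quotas behave like sliding windows, which the slides do not state. Note that at a sustained 15 requests per minute, the daily quota of 1500 would be used up after 100 minutes.

```python
from collections import deque

SECONDS_PER_MINUTE = 60.0
SECONDS_PER_DAY = 86400.0


class SlidingWindowLimiter:
    """Allow a request only if both the per-minute and per-day caps have room."""

    def __init__(self, per_minute=15, per_day=1500):
        self.limits = [(SECONDS_PER_MINUTE, per_minute), (SECONDS_PER_DAY, per_day)]
        self.sent = deque()  # timestamps (in seconds) of accepted requests

    def allow(self, now: float) -> bool:
        # Forget requests that fall outside the longest window.
        while self.sent and now - self.sent[0] >= SECONDS_PER_DAY:
            self.sent.popleft()
        # Reject if any window is already full.
        for window, cap in self.limits:
            recent = sum(1 for t in self.sent if now - t < window)
            if recent >= cap:
                return False
        self.sent.append(now)
        return True
```

In real use, `now` would come from `time.monotonic()`; passing it in explicitly keeps the logic easy to test.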

Demonstration of the LCSH Recommendation Tool

Under Review

  • Once it is approved, it will be available on the Chrome Web Store.
  • If you want to try it now, register here.

What's Next?

Maintainers of Controlled Vocabularies

  • Provide validation APIs for better integration with AI platforms
  • Provide Model Context Protocol (MCP) servers and services

Community

  • Try the tools.
  • Engage in discussions.

Acknowledgements

  • Jessalyn Zoom, Lia Contursi, and the CTP team
  • Yi Jiang, Tang Li, and Haruko Nakamura
  • Fabiano Takshi Rocha, Hana Kim, and the East Asian Library team at UofT
  • Sachie Shishido, Charlotte Cotter

Thank you very much!