Better Recommendation
Validating AI-generated Subject Terms Through LOC Linked Data Service
Kwok-leong Tang
Harvard University
2025-03-11
In the Last Episode…
- At the 2024 CEAL Annual Meeting, LCSH recommendations were highlighted as a highly desired GenAI tool.
- The tool aims to accelerate the speed of cataloging in libraries, but it will not replace human catalogers. In fact, human catalogers are more important than ever.
- What we need is a better recommendation system.
Constraints
- Controlled Vocabularies
- Library of Congress Subject Headings (LCSH) is a controlled vocabulary.
- Controlled vocabularies are “living”: they are continually updated.
- Large Language Models
- Large language models have knowledge cutoff dates.
- Large language models generate responses by next-token prediction, so the output may differ slightly every time.
Constraints
- Keeping Up with SOTA Models and Features
- New state-of-the-art (SOTA) models are released almost every month.
- New engineering features are introduced frequently.
- Cost of a Standalone GenAI Tool
- Every query incurs a cost (API/token cost).
- Providing functions like OCR is more expensive.
Possible Solutions to Increase Accuracy
- Prompt Engineering
- Finetuning
- Retrieval Augmented Generation (RAG)
- Function Calling
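Of the options above, function calling is the one a validation service can plug into. The following is a minimal sketch of the pattern, not the setup used in this talk: the tool name `validate_subject_terms`, its fields, and the placeholder handler are all assumptions made for illustration. The model is shown a tool description, asks for the tool by name with JSON arguments, and the host program runs the matching handler and returns the result.

```python
import json

# Hypothetical tool description in the JSON Schema style that several
# function-calling platforms accept; the name and fields are assumptions.
VALIDATE_TOOL = {
    "name": "validate_subject_terms",
    "description": "Check candidate subject terms against a controlled vocabulary.",
    "parameters": {
        "type": "object",
        "properties": {
            "terms": {"type": "array", "items": {"type": "string"}},
        },
        "required": ["terms"],
    },
}


def validate_subject_terms(terms):
    # Placeholder handler: a real one would query a vocabulary service.
    return [{"term": t, "validated": False} for t in terms]


# Map tool names to the local functions that implement them.
HANDLERS = {"validate_subject_terms": validate_subject_terms}


def dispatch(tool_name, arguments_json):
    """Run the handler the model asked for and return a JSON string."""
    if tool_name not in HANDLERS:
        raise KeyError(f"unknown tool: {tool_name}")
    arguments = json.loads(arguments_json)
    return json.dumps(HANDLERS[tool_name](**arguments))
```

The model never runs code itself; it only proposes a call, so the host can swap the placeholder handler for a live lookup without changing the prompt.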
Created an API for the LC Linked Data
- The Library of Congress does not provide a direct API for LCSH (?)
- I developed an API that accepts a list of terms.
- This service utilizes the search functionality on the LC Linked Data Service website.
- It returns a maximum of 20 results, compares them against the input terms, and provides a similarity score and URL of each suggested LCSH term.
- Any LLM with function calling capabilities can use this API to validate its recommendations.
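The compare-and-score step described above can be sketched as follows. The talk does not say which similarity measure the API uses, so `difflib.SequenceMatcher` stands in here as an assumption. The function names, sample labels, and `example.org` URLs are placeholders rather than output from the real service, and no network request is made.

```python
from difflib import SequenceMatcher

MAX_RESULTS = 20  # the API is described as returning at most 20 results


def similarity(a, b):
    """Case-insensitive similarity in [0, 1] (difflib ratio, an assumption)."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def score_candidates(term, candidates):
    """Rank (label, url) candidates for one input term, best match first."""
    scored = [
        {
            "input": term,
            "label": label,
            "url": url,
            "score": round(similarity(term, label), 3),
        }
        for label, url in candidates[:MAX_RESULTS]
    ]
    return sorted(scored, key=lambda r: r["score"], reverse=True)


# Hypothetical search results; labels and URLs are placeholders.
sample = [
    ("Tea trade--History", "https://example.org/heading/2"),
    ("Tea trade", "https://example.org/heading/1"),
]
ranked = score_candidates("tea trade", sample)
print(ranked[0]["label"], ranked[0]["score"])  # prints "Tea trade 1.0"
```

Returning the score and URL together lets the LLM, or the cataloger reviewing its output, tell an exact authorized heading from a near miss and follow the link to confirm it.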
Questions and Difficulties
- Can we upload images?
- Cannot access public GPTs through ChatGPT EDU
- Cannot access ChatGPT for various reasons (workspace rules, geographic restrictions, etc.)
- Budget cut, no extra funding for subscription
Solutions
- You can upload images, audio, or even videos.
- If you have ChatGPT EDU, you can create your own GPT. I will share the complete instructions and prompts.
- For geographic restrictions, you can look for other LLM platforms that support function calling.
- For the budget cut……
Under Review
- Once it is approved, it will be available on the Chrome Web Store.
- If you want to try it now, register here.
Maintainers of Controlled Vocabularies
- Provide validation APIs for better integration with AI platforms
- Provide Model Context Protocol (MCP) servers and services
Acknowledgements
- Jessalyn Zoom, Lia Contursi, and the CTP team
- Yi Jiang, Tang Li, and Haruko Nakamura
- Fabiano Takshi Rocha, Hana Kim, and the East Asian Library team at UofT
- Sachie Shishido, Charlotte Cotter