7-002: Embracing the Future

Integrating Generative AI into Asian Studies Research

Kwok-leong Tang
Calvin Yeh
Wanchun Chiu

2025-03-14

Overview

  • Kwok-leong Tang (Harvard University)
    • Concepts
    • NotebookLM
  • Wanchun Chiu (Academia Sinica)
    • Workflow
    • n8n
  • Calvin Yeh
    • Advanced named entity recognition (NER)
    • Chrome extension

Doing East Asian Studies in the Age of AI.

AI supports research, not conclusions.

Probabilistic Predictions

Image source: Chip Huyen, AI Engineering (2024)

Question: Who has never used a chatbot of any kind?

Glass Box vs Black Box (Observability)

  • Glass Box (usually writing code): You can see how the code works and understand its logic.
    • Statistical tasks run in a Python runtime
    • Regular expressions for matching and extraction
  • Black Box (transforming without visible steps): The transformation happens in a way that you cannot easily inspect or verify step by step.
    • Translation, summarization
    • Named Entity Recognition (NER)
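For contrast with the black-box tasks, here is a minimal glass-box sketch using Python's standard `re` module. It finds literal terms in a passage and returns every match with its character position, so each hit can be checked by hand. The helper names and the search terms are illustrative choices, not part of any established workflow.

```python
import re
from collections import Counter

def find_terms(text: str, terms: list[str]) -> list[tuple[str, int]]:
    """Return (term, position) for every literal match, sorted by position."""
    hits = []
    for term in terms:
        # re.escape treats the term literally, even if it contains regex symbols
        for m in re.finditer(re.escape(term), text):
            hits.append((term, m.start()))
    return sorted(hits, key=lambda h: h[1])

def count_terms(text: str, terms: list[str]) -> Counter:
    """Tally the matches; every count can be traced back to find_terms."""
    return Counter(term for term, _ in find_terms(text, terms))

sample = "食人死膚,令人患惡瘡,多是此蟲。"
print(find_terms(sample, ["人", "瘡"]))  # → [('人', 1), ('人', 6), ('瘡', 9)]
print(count_terms(sample, ["人", "瘡"]))
```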

Retrieval-Augmented Generation

flowchart TB
    subgraph Direct["Direct LLM Query"]
        direction LR
        U1[User Query] --> L1[LLM] --> R1[Response]
        style L1 fill:#f9f,stroke:#333
    end

flowchart TB

    subgraph RAG["RAG-Enhanced Query"]
        direction LR
        U2[User Query] --> RC[Retrieval Component]
        RC --> |Search| DB[(Knowledge Base)]
        DB --> |Relevant Documents| RC
        RC --> CP[Context Processor]
        CP --> |Enhanced Prompt| L2[LLM]
        L2 --> R2[Response]
        style L2 fill:#f9f,stroke:#333
        style DB fill:#bef,stroke:#333
        style RC fill:#fbf,stroke:#333
        style CP fill:#fbf,stroke:#333
    end

    classDef default fill:#fff,stroke:#333,stroke-width:2px;
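The RAG diagram can be sketched in a few lines of Python. This is a toy under stated assumptions: the "knowledge base" is just the three sentences of the sample text used later in this deck, and retrieval scores documents by shared character bigrams, a simple stand-in that works on unsegmented Chinese. Production RAG systems typically use vector embeddings instead. The function names are hypothetical, and no LLM is called; the final prompt is what would be sent to one.

```python
def char_bigrams(text: str) -> set[str]:
    """Split text into overlapping two-character pieces."""
    return {text[i:i + 2] for i in range(len(text) - 1)}

def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Retrieval component: rank documents by bigram overlap with the query."""
    q = char_bigrams(query)
    scored = [(len(q & char_bigrams(doc)), doc) for doc in documents]
    scored = [item for item in scored if item[0] > 0]  # drop documents with no overlap
    scored.sort(key=lambda item: item[0], reverse=True)
    return [doc for _, doc in scored[:k]]

def build_prompt(query: str, passages: list[str]) -> str:
    """Context processor: combine the retrieved passages with the user query."""
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )

knowledge_base = [
    "食人死膚,令人患惡瘡,多是此蟲。",
    "食主之法,當以狸膏敷之,及食狸肉。",
    "凡正月食鼠殘,多為鼠瘻,小孔下血者是此病也。",
]
query = "鼠瘻有何症狀?"  # "What are the symptoms of rat scrofula?"
prompt = build_prompt(query, retrieve(query, knowledge_base))
print(prompt)  # this enhanced prompt would then be passed to an LLM
```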

Digital China Worldwide

dcw.fairbank.fas.harvard.edu

NotebookLM

notebooklm.google.com

Making Your Own Podcasts: Audio Overview

Text classification

Task Explanation

Does the text contain each component?

  • Five components:
    • “body and mind”, “emotions”, “life and death”, “fate”, “omen”
  • Pre-defined categories
  • We upload a Markdown file to NotebookLM as a reference, then ask NotebookLM to classify the text against it.
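In this workflow NotebookLM is used through its web interface, so the prompt is simply text typed into the chat. The sketch below is a hypothetical helper (`build_classification_prompt` is not part of NotebookLM) that assembles one possible wording. It lists the five components in a fixed order and asks for exactly one True/False per component, so the answer is easy to check.

```python
CATEGORIES = ["body and mind", "emotions", "life and death", "fate", "omen"]

def build_classification_prompt(text: str, categories: list[str] = CATEGORIES) -> str:
    """Ask for one True/False per category, in a fixed order, so the answer is machine-checkable."""
    numbered = "\n".join(f"{i}. {c}" for i, c in enumerate(categories, start=1))
    return (
        "Using the definitions in the uploaded reference file, decide whether the text "
        "below contains each component.\n"
        f"Components:\n{numbered}\n\n"
        f"Text:\n{text}\n\n"
        f"Answer with exactly {len(categories)} comma-separated values (True or False) "
        "in the order listed, and nothing else."
    )

print(build_classification_prompt("凡正月食鼠殘,多為鼠瘻,小孔下血者是此病也。"))
```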

The New Text

食人死膚,令人患惡瘡,多是此蟲。食主之法,當以狸膏敷之,及食狸肉。凡正月食鼠殘,多為鼠瘻,小孔下血者是此病也。

It feeds on people's dead skin and causes malignant sores; such sores are mostly caused by this insect. The remedy is to apply wildcat (狸) fat as an ointment and to eat wildcat meat. In general, eating food left over by rats in the first month of the lunar calendar often leads to ‘rat scrofula,’ a condition characterized by small openings from which blood flows.

Result (body and mind, emotions, life and death, fate, omen): True, False, False, True, False
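Because the model's answer is plain text, it is worth validating before recording it. The sketch below assumes the five values follow the order of the components listed in the task explanation; `parse_labels` is a hypothetical helper that maps them onto those components and rejects malformed answers instead of silently accepting them.

```python
CATEGORIES = ["body and mind", "emotions", "life and death", "fate", "omen"]

def parse_labels(answer: str, categories: list[str] = CATEGORIES) -> dict[str, bool]:
    """Map a comma-separated True/False answer onto the categories, rejecting malformed output."""
    values = [v.strip().lower() for v in answer.strip().split(",")]
    if len(values) != len(categories):
        raise ValueError(f"expected {len(categories)} values, got {len(values)}")
    if any(v not in ("true", "false") for v in values):
        raise ValueError(f"unexpected value in {answer!r}")
    return {c: v == "true" for c, v in zip(categories, values)}

print(parse_labels("True,False,False,True,False"))
# → {'body and mind': True, 'emotions': False, 'life and death': False, 'fate': True, 'omen': False}
```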

Does this result come from the LLM's intrinsic knowledge or from RAG?