
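feat(extract-service): set up infrastructure for the data extraction rule system

Phase 1 complete:
- Create the extract-service Maven module
- Startup class ExtractServiceApplication
- Configuration file application.properties (port 8086)
- Health check endpoint /api/v1/extract/health

Database table design (5 core tables):
- extract_projects: projects table
- extract_source_documents: source documents table
- extract_rules: extraction rules table
- extract_results: extraction results table
- extract_rule_templates: rule templates table

Integration configuration:
- Add the extract-service submodule to the parent pom.xml
- Add the extract-service dependency to lingyue-starter
- Add extract-related parameters to the monolith application configuration

Reference design document: docs/design/数据提取规则系统-实现阶段规划.md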
何文松, 1 month ago
Parent
Current commit
854ea0d485

+ 296 - 0
.cursor/plans/模板系统代码设计_119b0abe.plan.md

@@ -0,0 +1,296 @@
+---
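+name: Template System Code Design
+overview: Based on the existing code structure and the frontend prototype, design and implement a complete template system, including template CRUD, placeholder parsing, data source binding, template rendering, and copy/replace modes.
+todos:
+  - id: create-dtos
+    content: Create the template-related DTO classes (CreateTemplateRequest, UpdateTemplateRequest, CopyTemplateRequest, ReplaceDataSourceRequest, TemplateRenderRequest, TemplateRenderResult, PlaceholderInfo, PlaceholderMapping)
+    status: pending
+  - id: create-parser
+    content: Create the PlaceholderParser placeholder-parsing utility class
+    status: pending
+  - id: create-template-service
+    content: Create TemplateService to implement template CRUD, copying, placeholder management, data source replacement, and related business logic
+    status: pending
+  - id: create-render-service
+    content: Create TemplateRenderService to implement template rendering (HTML/Markdown/plain-text output)
+    status: pending
+  - id: create-controller
+    content: Create TemplateController to expose the RESTful API endpoints
+    status: pending
+  - id: update-repository
+    content: Extend TemplateRepository with the required query methods
+    status: pending
+  - id: integration-test
+    content: Write integration tests to verify the template system
+    status: pending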
+---
+
+# Template System Code Design
+
+## 1. System Architecture
+
+```mermaid
+flowchart TB
+    subgraph Frontend[Frontend]
+        TemplateList[Template List]
+        TemplateEditor[Template Editor]
+        TemplatePreview[Template Preview]
+    end
+    
+    subgraph Backend[Backend graph-service]
+        Controller[TemplateController]
+        Service[TemplateService]
+        Renderer[TemplateRenderService]
+        Parser[PlaceholderParser]
+        Repository[TemplateRepository]
+    end
+    
+    subgraph Dependencies[Dependent Services]
+        DataSourceService[DataSourceService]
+        DocumentService[DocumentService]
+    end
+    
+    TemplateList --> Controller
+    TemplateEditor --> Controller
+    TemplatePreview --> Controller
+    Controller --> Service
+    Controller --> Renderer
+    Service --> Repository
+    Renderer --> Parser
+    Renderer --> DataSourceService
+    Service --> DataSourceService
+```
+
+## 2. Core Data Model
+
+### 2.1 Placeholder Mapping Structure (PlaceholderMapping)
+
+```json
+{
+  "{{项目名称}}": {
+    "dataSourceId": "ds_001",
+    "type": "text",
+    "defaultValue": "未设置"
+  },
+  "{{投资金额}}": {
+    "dataSourceId": "ds_002", 
+    "type": "text",
+    "format": "currency"
+  },
+  "{{市场数据表}}": {
+    "dataSourceId": "ds_003",
+    "type": "table"
+  },
+  "{{项目图片}}": {
+    "dataSourceId": "ds_004",
+    "type": "image"
+  }
+}
+```
+
+### 2.2 Template Content Example
+
+```text
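+# {{项目名称}} Feasibility Study Report
+
+## 1. Project Overview
+The project has an estimated total investment of {{投资金额}} and is located in {{项目地点}}.
+
+## 2. Market Analysis
+{{市场数据表}}
+
+## 3. Project Results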
+{{项目图片}}
+```
+
+## 3. Code Structure Design
+
+### 3.1 New Files
+
+```
+graph-service/src/main/java/com/lingyue/graph/
+├── controller/
+│   └── TemplateController.java         # Template API endpoints
+├── service/
+│   ├── TemplateService.java            # Template business logic
+│   └── TemplateRenderService.java      # Template rendering service
+├── dto/
+│   ├── CreateTemplateRequest.java      # Create-template request
+│   ├── UpdateTemplateRequest.java      # Update-template request
+│   ├── CopyTemplateRequest.java        # Copy-template request
+│   ├── ReplaceDataSourceRequest.java   # Replace-data-source request
+│   ├── TemplateRenderRequest.java      # Render request
+│   ├── TemplateRenderResult.java       # Render result
+│   ├── PlaceholderInfo.java            # Placeholder information
+│   └── PlaceholderMapping.java         # Placeholder mapping
+└── util/
+    └── PlaceholderParser.java          # Placeholder parsing utility
+```
+
+## 4. Core Interface Design
+
+### 4.1 TemplateController API
+
+| Method | Path | Description |
+|-----|------|-----|
+| POST | `/api/templates` | Create a template |
+| GET | `/api/templates/{id}` | Get template details |
+| GET | `/api/templates` | List templates |
+| PUT | `/api/templates/{id}` | Update a template |
+| DELETE | `/api/templates/{id}` | Delete a template |
+| POST | `/api/templates/{id}/copy` | Copy a template |
+| PUT | `/api/templates/{id}/placeholders` | Update the placeholder mapping |
+| POST | `/api/templates/{id}/replace-datasources` | Replace data sources in bulk |
+| POST | `/api/templates/{id}/render` | Render a template |
+| GET | `/api/templates/{id}/placeholders` | Get the placeholder list |
+
+### 4.2 TemplateService Core Methods
+
+```java
+public interface TemplateService {
+    // CRUD
+    Template create(String userId, CreateTemplateRequest request);
+    Template getById(String id);
+    List<Template> getByUserId(String userId);
+    Template update(String id, UpdateTemplateRequest request);
+    void delete(String id);
+    
+    // Copy mode
+    Template copy(String sourceTemplateId, CopyTemplateRequest request);
+    
+    // Placeholder management
+    List<PlaceholderInfo> extractPlaceholders(String content);
+    Template updatePlaceholderMapping(String id, Map<String, PlaceholderMapping> mapping);
+    
+    // Bulk data source replacement
+    Template replaceDataSources(String id, ReplaceDataSourceRequest request);
+}
+```
+
+### 4.3 TemplateRenderService Core Methods
+
+```java
+public interface TemplateRenderService {
+    // Render a template
+    TemplateRenderResult render(String templateId);
+    TemplateRenderResult render(Template template);
+    
+    // Preview rendering (not saved)
+    TemplateRenderResult preview(String templateId);
+    
+    // Render a single placeholder
+    String renderPlaceholder(String placeholder, PlaceholderMapping mapping);
+}
+```
+
+## 5. Core Flows
+
+### 5.1 Template Rendering Flow
+
+```mermaid
+sequenceDiagram
+    participant Client
+    participant Controller as TemplateController
+    participant RenderService as TemplateRenderService
+    participant Parser as PlaceholderParser
+    participant DataSourceService
+    
+    Client->>Controller: POST /templates/{id}/render
+    Controller->>RenderService: render(templateId)
+    RenderService->>RenderService: getTemplate(id)
+    RenderService->>Parser: extractPlaceholders(content)
+    Parser-->>RenderService: List<PlaceholderInfo>
+    
+    loop For each placeholder
+        RenderService->>RenderService: getMapping(placeholder)
+        RenderService->>DataSourceService: getValue(dataSourceId)
+        DataSourceService-->>RenderService: DataSourceValue
+        RenderService->>RenderService: formatValue(value, type)
+    end
+    
+    RenderService->>RenderService: replacePlaceholders()
+    RenderService-->>Controller: TemplateRenderResult
+    Controller-->>Client: Rendered result
+```
+
+### 5.2 Template Copy Flow
+
+```mermaid
+sequenceDiagram
+    participant Client
+    participant Controller as TemplateController
+    participant Service as TemplateService
+    participant Repository as TemplateRepository
+    
+    Client->>Controller: POST /templates/{id}/copy
+    Controller->>Service: copy(sourceId, request)
+    Service->>Repository: selectById(sourceId)
+    Repository-->>Service: Source template
+    Service->>Service: Copy the template content and placeholder mapping
+    Service->>Service: Set sourceTemplateId
+    Service->>Repository: insert(newTemplate)
+    Repository-->>Service: New template
+    Service-->>Controller: Template
+    Controller-->>Client: New template info
+```
+
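+## 6. Key Implementation Details
+
+### 6.1 Placeholder Parsing (PlaceholderParser)
+
+- Supported format: `{{placeholder name}}`
+- Regular expression: `\{\{([^}]+)\}\}`
+- Extracted information: placeholder name, position (line and column number), context
+
+### 6.2 Data Source Replacement Modes
+
+Two replacement methods are supported:
+
+1. **ID mapping replacement**: `{"oldDataSourceId": "newDataSourceId"}`
+2. **Name-based replacement**: match automatically by data source name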
+
+### 6.3 Render Result Structure
+
+```java
+public class TemplateRenderResult {
+    private String templateId;
+    private String renderedContent;      // HTML format
+    private String renderedMarkdown;     // Markdown format
+    private String renderedPlainText;    // plain-text format
+    private List<RenderDetail> details;  // rendering details for each placeholder
+    private List<String> warnings;       // warnings (e.g. a missing data source)
+    private Map<String, Object> metadata;
+}
+```
+
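+### 6.4 Placeholder Type Handling
+
+| Type | Handling |
+|-----|---------|
+| text | Replaced directly with the text value |
+| image | Generates an `<img>` tag or Markdown image syntax |
+| table | Generates an HTML table or a Markdown table |
+| mixed | Handled automatically according to the actual data type |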
+
+## 7. Dependencies on Existing Code
+
+- [DataSourceService](backend/graph-service/src/main/java/com/lingyue/graph/service/DataSourceService.java): retrieves data source values
+- [DataSourceValue](backend/graph-service/src/main/java/com/lingyue/graph/dto/DataSourceValue.java): data source value structure
+- [Template](backend/graph-service/src/main/java/com/lingyue/graph/entity/Template.java): template entity
+- [TemplateRepository](backend/graph-service/src/main/java/com/lingyue/graph/repository/TemplateRepository.java): template data access
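
Note on section 6.1 of the plan above: the placeholder parsing it describes could look roughly like the following. This is only a minimal sketch, not the project's PlaceholderParser. The class name `PlaceholderParserSketch`, the `Placeholder` record, and its field names are hypothetical, and the "context" field mentioned in the plan is omitted. Only the regular expression is taken from the plan. Because the text is scanned line by line, a placeholder that spans a line break is not matched here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of the planned placeholder parser; names are assumptions. */
final class PlaceholderParserSketch {

    /** Minimal stand-in for the planned PlaceholderInfo DTO (fields are assumed). */
    record Placeholder(String name, int line, int column) {}

    // Pattern quoted in section 6.1: {{name}}, where name contains no '}'.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{\\{([^}]+)\\}\\}");

    /** Returns every placeholder together with its 1-based line and column. */
    static List<Placeholder> extractPlaceholders(String content) {
        List<Placeholder> found = new ArrayList<>();
        String[] lines = content.split("\n", -1);
        for (int i = 0; i < lines.length; i++) {
            Matcher m = PLACEHOLDER.matcher(lines[i]);
            while (m.find()) {
                // group(1) is the text between the braces; start() is a 0-based offset.
                found.add(new Placeholder(m.group(1), i + 1, m.start() + 1));
            }
        }
        return found;
    }
}
```

Keeping the line and column on each match lets a renderer report exactly where an unmapped placeholder sits, which fits the `warnings` list planned for `TemplateRenderResult`.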

+ 123 - 0
backend/extract-service/pom.xml

@@ -0,0 +1,123 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<project xmlns="http://maven.apache.org/POM/4.0.0"
+         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 
+         http://maven.apache.org/xsd/maven-4.0.0.xsd">
+    <modelVersion>4.0.0</modelVersion>
+
+    <parent>
+        <groupId>com.lingyue</groupId>
+        <artifactId>lingyue-zhibao</artifactId>
+        <version>2.0.0</version>
+    </parent>
+
+    <artifactId>extract-service</artifactId>
+    <packaging>jar</packaging>
+
+    <name>Extract Service</name>
+    <description>Data extraction rule service - project management, rule configuration, content extraction</description>
+
+    <dependencies>
+        <!-- Spring Boot Web -->
+        <dependency>
+            <groupId>org.springframework.boot</groupId>
+            <artifactId>spring-boot-starter-web</artifactId>
+        </dependency>
+        
+        <!-- Spring Boot Validation -->
+        <dependency>
+            <groupId>org.springframework.boot</groupId>
+            <artifactId>spring-boot-starter-validation</artifactId>
+        </dependency>
+        
+        <!-- MyBatis Plus -->
+        <dependency>
+            <groupId>com.baomidou</groupId>
+            <artifactId>mybatis-plus-boot-starter</artifactId>
+        </dependency>
+        
+        <!-- PostgreSQL Driver -->
+        <dependency>
+            <groupId>org.postgresql</groupId>
+            <artifactId>postgresql</artifactId>
+        </dependency>
+        
+        <!-- Spring Boot AMQP (RabbitMQ) - for asynchronous tasks -->
+        <dependency>
+            <groupId>org.springframework.boot</groupId>
+            <artifactId>spring-boot-starter-amqp</artifactId>
+        </dependency>
+        
+        <!-- Spring Boot Data Redis - for progress tracking -->
+        <dependency>
+            <groupId>org.springframework.boot</groupId>
+            <artifactId>spring-boot-starter-data-redis</artifactId>
+        </dependency>
+        
+        <!-- Common Module -->
+        <dependency>
+            <groupId>com.lingyue</groupId>
+            <artifactId>common</artifactId>
+        </dependency>
+        
+        <!-- Document Service (direct dependency in monolith mode) -->
+        <dependency>
+            <groupId>com.lingyue</groupId>
+            <artifactId>document-service</artifactId>
+            <version>${project.version}</version>
+            <optional>true</optional>
+        </dependency>
+        
+        <!-- Parse Service (direct dependency in monolith mode) -->
+        <dependency>
+            <groupId>com.lingyue</groupId>
+            <artifactId>parse-service</artifactId>
+            <version>${project.version}</version>
+            <optional>true</optional>
+        </dependency>
+        
+        <!-- AI Service (direct dependency in monolith mode) -->
+        <dependency>
+            <groupId>com.lingyue</groupId>
+            <artifactId>ai-service</artifactId>
+            <version>${project.version}</version>
+            <optional>true</optional>
+        </dependency>
+        
+        <!-- Graph Service (direct dependency in monolith mode, used for data source registration) -->
+        <dependency>
+            <groupId>com.lingyue</groupId>
+            <artifactId>graph-service</artifactId>
+            <version>${project.version}</version>
+            <optional>true</optional>
+        </dependency>
+        
+        <!-- Lombok -->
+        <dependency>
+            <groupId>org.projectlombok</groupId>
+            <artifactId>lombok</artifactId>
+            <optional>true</optional>
+        </dependency>
+        
+        <!-- SpringDoc OpenAPI -->
+        <dependency>
+            <groupId>org.springdoc</groupId>
+            <artifactId>springdoc-openapi-starter-webmvc-ui</artifactId>
+        </dependency>
+        
+        <!-- Hutool utility library -->
+        <dependency>
+            <groupId>cn.hutool</groupId>
+            <artifactId>hutool-all</artifactId>
+        </dependency>
+    </dependencies>
+
+    <build>
+        <plugins>
+            <plugin>
+                <groupId>org.springframework.boot</groupId>
+                <artifactId>spring-boot-maven-plugin</artifactId>
+            </plugin>
+        </plugins>
+    </build>
+</project>

+ 24 - 0
backend/extract-service/src/main/java/com/lingyue/extract/ExtractServiceApplication.java

@@ -0,0 +1,24 @@
+package com.lingyue.extract;
+
+import org.mybatis.spring.annotation.MapperScan;
+import org.springframework.boot.SpringApplication;
+import org.springframework.boot.autoconfigure.SpringBootApplication;
+import org.springframework.scheduling.annotation.EnableAsync;
+
+/**
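+ * Startup class for the data extraction rule service
+ * 
+ * Provides project management, rule configuration, content extraction, and related features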
+ * 
+ * @author lingyue
+ * @since 2026-01-22
+ */
+@SpringBootApplication(scanBasePackages = {"com.lingyue"})
+@MapperScan(basePackages = {"com.lingyue.extract.repository"})
+@EnableAsync
+public class ExtractServiceApplication {
+    
+    public static void main(String[] args) {
+        SpringApplication.run(ExtractServiceApplication.class, args);
+    }
+}

+ 64 - 0
backend/extract-service/src/main/java/com/lingyue/extract/controller/HealthController.java

@@ -0,0 +1,64 @@
+package com.lingyue.extract.controller;
+
+import com.lingyue.common.domain.AjaxResult;
+import io.swagger.v3.oas.annotations.Operation;
+import io.swagger.v3.oas.annotations.tags.Tag;
+import lombok.extern.slf4j.Slf4j;
+import org.springframework.web.bind.annotation.GetMapping;
+import org.springframework.web.bind.annotation.RequestMapping;
+import org.springframework.web.bind.annotation.RestController;
+
+import java.time.LocalDateTime;
+import java.util.HashMap;
+import java.util.Map;
+
+/**
+ * Health check controller
+ * 
+ * @author lingyue
+ * @since 2026-01-22
+ */
+@Slf4j
+@RestController
+@RequestMapping("/api/v1/extract")
+@Tag(name = "数据提取服务", description = "数据提取规则系统 API")
+public class HealthController {
+    
+    /**
+     * Health check
+     */
+    @GetMapping("/health")
+    @Operation(summary = "健康检查", description = "检查数据提取服务是否正常运行")
+    public AjaxResult<?> health() {
+        Map<String, Object> info = new HashMap<>();
+        info.put("service", "extract-service");
+        info.put("status", "UP");
+        info.put("timestamp", LocalDateTime.now().toString());
+        info.put("version", "1.0.0");
+        
+        return AjaxResult.success("服务正常", info);
+    }
+    
+    /**
+     * Service information
+     */
+    @GetMapping("/info")
+    @Operation(summary = "服务信息", description = "获取数据提取服务的基本信息")
+    public AjaxResult<?> info() {
+        Map<String, Object> info = new HashMap<>();
+        info.put("name", "Extract Service");
+        info.put("description", "数据提取规则服务 - 项目管理、规则配置、内容提取");
+        info.put("version", "1.0.0");
+        info.put("features", new String[]{
+            "项目管理 (Project)",
+            "来源文档管理 (SourceDocument)",
+            "提取规则配置 (ExtractRule)",
+            "内容定位 (ContentLocator)",
+            "AI 提取 (AIExtract)",
+            "结果管理 (ExtractResult)",
+            "规则模板 (RuleTemplate)"
+        });
+        
+        return AjaxResult.success(info);
+    }
+}

+ 46 - 0
backend/extract-service/src/main/resources/application.properties

@@ -0,0 +1,46 @@
+# ===================================
+# Data extraction rule service configuration
+# Extract Service Configuration
+# ===================================
+
+# Service configuration
+server.port=8086
+spring.application.name=extract-service
+
+# Database configuration (inherited from lingyue-starter or configured independently)
+spring.datasource.url=jdbc:postgresql://localhost:5432/lingyue_zhibao
+spring.datasource.username=lingyue
+spring.datasource.password=lingyue123
+spring.datasource.driver-class-name=org.postgresql.Driver
+
+# MyBatis Plus configuration
+mybatis-plus.configuration.log-impl=org.apache.ibatis.logging.stdout.StdOutImpl
+mybatis-plus.configuration.map-underscore-to-camel-case=true
+mybatis-plus.global-config.db-config.id-type=ASSIGN_UUID
+mybatis-plus.type-handlers-package=com.lingyue.common.mybatis
+
+# Redis configuration (for task progress tracking)
+spring.data.redis.host=localhost
+spring.data.redis.port=6379
+
+# RabbitMQ configuration (for asynchronous tasks)
+spring.rabbitmq.host=localhost
+spring.rabbitmq.port=5672
+spring.rabbitmq.username=guest
+spring.rabbitmq.password=guest
+
+# Extraction service configuration
+extract.task.timeout=600000
+extract.ai.max-retries=3
+extract.ai.retry-delay=1000
+extract.cache.enabled=true
+extract.cache.ttl=3600
+
+# SpringDoc configuration
+springdoc.api-docs.enabled=true
+springdoc.swagger-ui.enabled=true
+springdoc.swagger-ui.path=/swagger-ui.html
+
+# Logging configuration
+logging.level.com.lingyue.extract=DEBUG
+logging.level.org.mybatis=DEBUG
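
Note on the `extract.*` keys above: the commit does not yet show any code that reads them. In a Spring Boot service they would typically be bound with `@ConfigurationProperties` or `@Value`. The framework-free sketch below only illustrates typed parsing with fallbacks. The class name `ExtractSettingsSketch` and its field names are hypothetical, the defaults simply mirror the committed values, and the units of the timeout, delay, and TTL are not stated in the source.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical typed view of the extract.* keys; class and field names are assumptions. */
final class ExtractSettingsSketch {
    final long taskTimeout;     // extract.task.timeout (unit not stated in the source)
    final int aiMaxRetries;     // extract.ai.max-retries
    final long aiRetryDelay;    // extract.ai.retry-delay (unit not stated in the source)
    final boolean cacheEnabled; // extract.cache.enabled
    final long cacheTtl;        // extract.cache.ttl (unit not stated in the source)

    private ExtractSettingsSketch(Properties p) {
        // Fallbacks mirror the values committed in application.properties.
        taskTimeout = Long.parseLong(get(p, "extract.task.timeout", "600000"));
        aiMaxRetries = Integer.parseInt(get(p, "extract.ai.max-retries", "3"));
        aiRetryDelay = Long.parseLong(get(p, "extract.ai.retry-delay", "1000"));
        cacheEnabled = Boolean.parseBoolean(get(p, "extract.cache.enabled", "true"));
        cacheTtl = Long.parseLong(get(p, "extract.cache.ttl", "3600"));
    }

    // Properties keeps trailing whitespace in values, so trim before parsing numbers.
    private static String get(Properties p, String key, String fallback) {
        return p.getProperty(key, fallback).trim();
    }

    /** Parses properties-format text; keys that are absent fall back to the defaults. */
    static ExtractSettingsSketch parse(String propertiesText) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new ExtractSettingsSketch(p);
    }
}
```

Parsing once into typed fields surfaces a malformed value (for example a non-numeric timeout) at startup rather than in the middle of an extraction task.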

+ 6 - 0
backend/lingyue-starter/pom.xml

@@ -85,6 +85,12 @@
             <artifactId>notification-service</artifactId>
             <version>${project.version}</version>
         </dependency>
+        
+        <dependency>
+            <groupId>com.lingyue</groupId>
+            <artifactId>extract-service</artifactId>
+            <version>${project.version}</version>
+        </dependency>
 
         <!-- AOP -->
         <dependency>

+ 9 - 0
backend/lingyue-starter/src/main/resources/application.properties

@@ -205,6 +205,15 @@ spring.rabbitmq.port=${RABBITMQ_PORT:5672}
 spring.rabbitmq.username=${RABBITMQ_USERNAME:admin}
 spring.rabbitmq.password=${RABBITMQ_PASSWORD:admin123}
 
+# ============================================
+# Extract Service data extraction configuration
+# ============================================
+extract.task.timeout=600000
+extract.ai.max-retries=3
+extract.ai.retry-delay=1000
+extract.cache.enabled=true
+extract.cache.ttl=3600
+
 # ============================================
 # Neo4j graph database configuration
 # ============================================

+ 1 - 0
backend/pom.xml

@@ -22,6 +22,7 @@
         <module>ai-service</module>
         <module>graph-service</module>
         <module>notification-service</module>
+        <module>extract-service</module>
         <module>lingyue-starter</module>
     </modules>
 

+ 200 - 0
database/migrations/V2026_01_22_02__create_extract_tables.sql

@@ -0,0 +1,200 @@
+-- ============================================
+-- Core tables of the data extraction rule system
+-- extract-service module
+-- ============================================
+
+-- ============================================
+-- 1. Projects table (projects)
+-- ============================================
+CREATE TABLE IF NOT EXISTS extract_projects (
+    id VARCHAR(32) PRIMARY KEY,
+    user_id VARCHAR(32) NOT NULL,
+    name VARCHAR(255) NOT NULL,
+    description TEXT,
+    status VARCHAR(32) DEFAULT 'draft',  -- draft/extracting/completed/archived
+    config JSONB DEFAULT '{}',
+    create_by VARCHAR(36),
+    create_by_name VARCHAR(100),
+    create_time TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
+    update_by VARCHAR(36),
+    update_by_name VARCHAR(100),
+    update_time TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
+);
+
+CREATE INDEX IF NOT EXISTS idx_extract_projects_user_id ON extract_projects(user_id);
+CREATE INDEX IF NOT EXISTS idx_extract_projects_status ON extract_projects(status);
+
+COMMENT ON TABLE extract_projects IS 'Data extraction projects';
+COMMENT ON COLUMN extract_projects.status IS 'Status: draft, extracting (extraction in progress), completed, archived';
+COMMENT ON COLUMN extract_projects.config IS 'Project configuration: outputFormat, autoExtract, notifyOnComplete, aiModel';
+
+-- ============================================
+-- 2. Source documents table (source_documents)
+-- ============================================
+CREATE TABLE IF NOT EXISTS extract_source_documents (
+    id VARCHAR(32) PRIMARY KEY,
+    project_id VARCHAR(32) NOT NULL REFERENCES extract_projects(id) ON DELETE CASCADE,
+    document_id VARCHAR(36) NOT NULL,  -- references the documents table
+    alias VARCHAR(128) NOT NULL,       -- document alias, e.g. "可研批复" (feasibility study approval)
+    doc_type VARCHAR(32) NOT NULL,     -- pdf/docx/xlsx
+    display_order INT DEFAULT 0,
+    metadata JSONB DEFAULT '{}',
+    create_time TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
+    
+    CONSTRAINT uq_source_doc_alias UNIQUE (project_id, alias)
+);
+
+CREATE INDEX IF NOT EXISTS idx_source_docs_project_id ON extract_source_documents(project_id);
+CREATE INDEX IF NOT EXISTS idx_source_docs_document_id ON extract_source_documents(document_id);
+
+COMMENT ON TABLE extract_source_documents IS 'Project source documents';
+COMMENT ON COLUMN extract_source_documents.alias IS 'Document alias, e.g. "可研批复" (feasibility study approval) or "可行性研究报告" (feasibility study report)';
+COMMENT ON COLUMN extract_source_documents.metadata IS 'Metadata: fileName, fileSize, pageCount, parseStatus';
+
+-- ============================================
+-- 3. Extraction rules table (extract_rules)
+-- ============================================
+CREATE TABLE IF NOT EXISTS extract_rules (
+    id VARCHAR(32) PRIMARY KEY,
+    project_id VARCHAR(32) NOT NULL REFERENCES extract_projects(id) ON DELETE CASCADE,
+    source_doc_id VARCHAR(32),  -- nullable; NULL for the reference/fixed/manual source types
+    
+    -- Target field
+    target_field_key VARCHAR(128) NOT NULL,   -- field key (used by code)
+    target_field_name VARCHAR(255) NOT NULL,  -- field name (used for display)
+    target_field_group VARCHAR(128),          -- field group
+    rule_index INT NOT NULL DEFAULT 0,        -- rule order
+    
+    -- Source configuration
+    source_type VARCHAR(32) NOT NULL,  -- document/self_reference/fixed/manual
+    source_config JSONB NOT NULL,      -- source configuration details
+    
+    -- Extraction configuration
+    extract_type VARCHAR(32) NOT NULL, -- direct/ai_extract/ai_summarize/ocr
+    extract_config JSONB,              -- extraction configuration details
+    
+    -- Result
+    status VARCHAR(32) DEFAULT 'pending',  -- pending/extracting/extracted/confirmed/error
+    extracted_value TEXT,
+    value_type VARCHAR(32) DEFAULT 'text', -- text/table/image/list
+    error_message TEXT,
+    
+    -- Metadata
+    metadata JSONB DEFAULT '{}',
+    create_by VARCHAR(36),
+    create_by_name VARCHAR(100),
+    create_time TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
+    update_by VARCHAR(36),
+    update_by_name VARCHAR(100),
+    update_time TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
+    
+    CONSTRAINT uq_rule_field_key UNIQUE (project_id, target_field_key)
+);
+
+CREATE INDEX IF NOT EXISTS idx_extract_rules_project_id ON extract_rules(project_id);
+CREATE INDEX IF NOT EXISTS idx_extract_rules_source_doc_id ON extract_rules(source_doc_id);
+CREATE INDEX IF NOT EXISTS idx_extract_rules_status ON extract_rules(status);
+CREATE INDEX IF NOT EXISTS idx_extract_rules_rule_index ON extract_rules(project_id, rule_index);
+
+COMMENT ON TABLE extract_rules IS 'Data extraction rules';
+COMMENT ON COLUMN extract_rules.source_type IS 'Source type: document (from a document), self_reference (references already-extracted fields), fixed (fixed value), manual (manual input)';
+COMMENT ON COLUMN extract_rules.extract_type IS 'Extraction type: direct (direct extraction), ai_extract (AI field extraction), ai_summarize (AI summary), ocr (OCR recognition)';
+COMMENT ON COLUMN extract_rules.source_config IS 'Source configuration: location, paragraphKeyword, referenceFieldKeys, etc.';
+COMMENT ON COLUMN extract_rules.extract_config IS 'Extraction configuration: targetDescription, expectedFormat, summarizePrompt, etc.';
+
+-- ============================================
+-- 4. Extraction results table (extract_results)
+-- ============================================
+CREATE TABLE IF NOT EXISTS extract_results (
+    id VARCHAR(32) PRIMARY KEY,
+    rule_id VARCHAR(32) NOT NULL REFERENCES extract_rules(id) ON DELETE CASCADE,
+    project_id VARCHAR(32) NOT NULL REFERENCES extract_projects(id) ON DELETE CASCADE,
+    
+    -- Extraction result
+    extracted_value TEXT NOT NULL,
+    value_type VARCHAR(32) DEFAULT 'text',
+    
+    -- Source tracing
+    source_content TEXT,        -- original source text
+    source_location JSONB,      -- source location information
+    
+    -- Quality assessment
+    confidence DECIMAL(5,4),    -- AI extraction confidence, 0-1
+    
+    -- Status
+    status VARCHAR(32) DEFAULT 'extracted',  -- extracted/confirmed/rejected/modified
+    
+    -- Manual handling
+    modified_value TEXT,        -- value after manual correction
+    confirmed_at TIMESTAMP,
+    confirmed_by VARCHAR(32),
+    reject_reason TEXT,         -- rejection reason
+    
+    -- Metadata
+    metadata JSONB DEFAULT '{}',
+    create_time TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
+);
+
+CREATE INDEX IF NOT EXISTS idx_extract_results_rule_id ON extract_results(rule_id);
+CREATE INDEX IF NOT EXISTS idx_extract_results_project_id ON extract_results(project_id);
+CREATE INDEX IF NOT EXISTS idx_extract_results_status ON extract_results(status);
+
+COMMENT ON TABLE extract_results IS 'Extraction result history';
+COMMENT ON COLUMN extract_results.source_location IS 'Source location: documentId, documentAlias, locationType, pageStart, pageEnd, elementIds, chapterPath';
+COMMENT ON COLUMN extract_results.status IS 'Status: extracted, confirmed, rejected, modified (manually corrected)';
+
+-- ============================================
+-- 5. Rule templates table (rule_templates)
+-- ============================================
+CREATE TABLE IF NOT EXISTS extract_rule_templates (
+    id VARCHAR(32) PRIMARY KEY,
+    user_id VARCHAR(32) NOT NULL,
+    name VARCHAR(255) NOT NULL,
+    description TEXT,
+    
+    -- Template content
+    rules_snapshot JSONB NOT NULL,     -- snapshot of the rule configuration
+    doc_type_pattern JSONB,            -- applicable document type pattern
+    
+    -- Visibility
+    is_public BOOLEAN DEFAULT FALSE,
+    
+    -- Statistics
+    use_count INT DEFAULT 0,
+    
+    -- Metadata
+    metadata JSONB DEFAULT '{}',
+    create_by VARCHAR(36),
+    create_by_name VARCHAR(100),
+    create_time TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
+    update_by VARCHAR(36),
+    update_by_name VARCHAR(100),
+    update_time TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
+);
+
+CREATE INDEX IF NOT EXISTS idx_rule_templates_user_id ON extract_rule_templates(user_id);
+CREATE INDEX IF NOT EXISTS idx_rule_templates_is_public ON extract_rule_templates(is_public);
+
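+COMMENT ON TABLE extract_rule_templates IS 'Rule templates';
+COMMENT ON COLUMN extract_rule_templates.rules_snapshot IS 'Snapshot of the rule configuration (without project-specific information)';
+COMMENT ON COLUMN extract_rule_templates.doc_type_pattern IS 'Applicable document type pattern, used for automatic matching';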
+
+-- ============================================
+-- Update-time triggers
+-- ============================================
+DROP TRIGGER IF EXISTS update_extract_projects_update_time ON extract_projects;
+CREATE TRIGGER update_extract_projects_update_time BEFORE UPDATE ON extract_projects
+    FOR EACH ROW EXECUTE FUNCTION update_update_time_column();
+
+DROP TRIGGER IF EXISTS update_extract_rules_update_time ON extract_rules;
+CREATE TRIGGER update_extract_rules_update_time BEFORE UPDATE ON extract_rules
+    FOR EACH ROW EXECUTE FUNCTION update_update_time_column();
+
+DROP TRIGGER IF EXISTS update_rule_templates_update_time ON extract_rule_templates;
+CREATE TRIGGER update_rule_templates_update_time BEFORE UPDATE ON extract_rule_templates
+    FOR EACH ROW EXECUTE FUNCTION update_update_time_column();
+
+-- ============================================
+-- Show the creation result
+-- ============================================
+SELECT 'Extract Service 数据表创建成功' AS result;
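
Note on the `status` columns above: the migration stores status as a plain `VARCHAR(32)` with no `CHECK` constraint, so the allowed values are documented only in comments and would have to be enforced in application code. One way to do that is sketched below for `extract_rules.status`. The enum name `RuleStatusSketch` and the `fromDbValue` helper are hypothetical. Only the five string values come from the migration.

```java
import java.util.Arrays;

/** Hypothetical Java mirror of extract_rules.status; names are assumptions. */
enum RuleStatusSketch {
    PENDING("pending"),
    EXTRACTING("extracting"),
    EXTRACTED("extracted"),
    CONFIRMED("confirmed"),
    ERROR("error");

    private final String dbValue;

    RuleStatusSketch(String dbValue) {
        this.dbValue = dbValue;
    }

    /** The exact string stored in the status column. */
    String dbValue() {
        return dbValue;
    }

    /** Maps a column value to the enum; rejects null and anything not listed in the migration. */
    static RuleStatusSketch fromDbValue(String value) {
        return Arrays.stream(values())
                .filter(s -> s.dbValue.equals(value))
                .findFirst()
                .orElseThrow(() -> new IllegalArgumentException("Unknown rule status: " + value));
    }
}
```

The same pattern would apply to `extract_projects.status` and `extract_results.status`, each with its own value set as listed in the column comments.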

+ 20 - 0
进度报告.md

@@ -21,6 +21,26 @@
 - 📦 **Data source management** → ✅ Done (CRUD + value retrieval + aggregation)
 - ⏱️ **Task center** → ✅ Done (multi-stage progress tracking)
 
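+### New features (2026-01-22 evening) ✅ Data extraction rule system - infrastructure
+
+- ✅ **extract-service module created**
+  - Maven module skeleton in place
+  - Startup class `ExtractServiceApplication.java`
+  - Configuration file `application.properties` (port 8086)
+  - Health check endpoint `/api/v1/extract/health`
+  
+- ✅ **Database table design** (5 core tables)
+  - `extract_projects` - projects table
+  - `extract_source_documents` - source documents table
+  - `extract_rules` - extraction rules table
+  - `extract_results` - extraction results table
+  - `extract_rule_templates` - rule templates table
+  
+- ✅ **Integration configuration**
+  - Submodule added to the parent pom.xml
+  - Dependency added to lingyue-starter
+  - Monolith application configuration updated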
+
 ### New features (2026-01-22 afternoon) ✅ Missing endpoints added
 
 - ✅ **Auth service endpoints completed**