Building a Reactive Virtual List on Top of a Large Browser-Side KV Store

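When a browser application has to handle more than 100,000 structured log records while keeping scrolling and search smooth, loading everything into an in-memory JavaScript array and rendering it straight into the DOM is not viable. Memory usage spikes, the UI thread blocks for long stretches, and the page may eventually crash. That forces a rethink of how the client stores data and renders the UI.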

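The problem has two parts: how to persist and query a large volume of data efficiently in the browser, and how to present that data to the user without sacrificing responsiveness.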

Technology Choices

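For storage, IndexedDB is the standard answer, but its API is verbose and its transaction model adds cognitive overhead. We want a key-value store with a simpler API and more predictable performance. LevelDB is Google's open-source, high-performance KV database. The level package brings the same interface to the browser through level.js (published today as browser-level), which uses IndexedDB as its backend rather than LevelDB itself, and exposes a more intuitive put/get/iterator interface. That suits logging workloads, which rely heavily on range queries.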

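For rendering, state management and virtualization are the key pieces. MobX's fine-grained reactivity ensures that only the components that actually depend on changed state re-render, which avoids wasted work, and it pairs naturally with React. To render a very large dataset, a virtual list is the standard technique: it renders only the DOM nodes inside the visible area. Chakra UI serves as the component library, providing high-quality, accessible primitives so that we can focus on the core logic rather than styling details.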

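The overall architecture is as follows:

LevelDB serves as the underlying data engine, responsible for persistent storage and efficient retrieval.
A Data Access Layer wraps the LevelDB operations and exposes a concise asynchronous interface to the layers above.
A Reactive View Model uses MobX to manage UI state such as the currently loaded data window, the total record count, and the loading flag. This model is the one that talks to the data access layer.
The React component layer uses a virtual list, reacts to scrolling, and asks the view model to load the corresponding data window. The components themselves subscribe to MobX state, so they update automatically and efficiently.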

graph TD
    subgraph Browser UI
        A[User scrolls] --> B(Virtualized React Component);
        B -- requests data (offset, limit) --> C{MobX Reactive ViewModel};
        C -- updates observable state --> B;
    end

    subgraph Data Layer
        C -- calls `getEntriesRange` with offset and limit --> D[Data Access Layer];
        D -- runs `db.iterator()` --> E(LevelDB via level.js);
        E -- returns data stream --> D;
        D -- returns data promise --> C;
    end

    E --> F[Browser IndexedDB];

    style B fill:#b9f,stroke:#333,stroke-width:2px;
    style C fill:#f9f,stroke:#333,stroke-width:2px;
    style E fill:#ccf,stroke:#333,stroke-width:2px;

Step-by-Step Implementation: From Data to View

1. A Robust Data Access Layer

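We start with a reliable wrapper around LevelDB. The class manages opening the database and provides typed operations with error handling. In real projects, log records are usually ordered by time, so we build the key from the timestamp plus a suffix (for example, Date.now() followed by a sequence number) to keep keys in chronological order.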
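String keys are compared character by character, so a timestamp-based key sorts chronologically only if every timestamp has the same number of digits. The sketch below shows one way to guarantee that with zero-padding. The helper name makeLogKey and both width constants are assumptions made for illustration; they are not part of the LogStore class that follows.

```typescript
// Hypothetical helper: builds keys whose string order matches chronological order.
const TIMESTAMP_WIDTH = 15; // assumption: enough digits for millisecond timestamps
const SEQUENCE_WIDTH = 4;   // assumption: fewer than 10,000 entries per millisecond

function makeLogKey(timestamp: number, sequence: number): string {
  const ts = String(timestamp).padStart(TIMESTAMP_WIDTH, '0');
  const seq = String(sequence).padStart(SEQUENCE_WIDTH, '0');
  return `log-${ts}-${seq}`;
}

const demoLogKeys = [
  makeLogKey(1672531200000, 1),
  makeLogKey(999, 0),
  makeLogKey(1672531200000, 0),
];
demoLogKeys.sort(); // plain string sort; for ASCII keys this matches the store's key order
console.log(demoLogKeys);
```

Without the padding, a key for timestamp 999 would sort after one for 1672531200000, because '9' is greater than '1'.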

// src/services/LogStore.ts

import { Level } from 'level';

export interface LogEntry {
  id: string; // Key in LevelDB, e.g., 'log-1672531200000-0001'
  timestamp: number;
  level: 'info' | 'warn' | 'error';
  message: string;
  payload?: Record<string, any>;
}

export class LogStore {
  private db: Level<string, LogEntry>;
  private static instance: LogStore;

  // Singleton: the whole app shares a single database connection
  private constructor(dbName: string = 'app-log-db') {
    this.db = new Level<string, LogEntry>(dbName, { valueEncoding: 'json' });
    this.logInitialization();
  }

  public static getInstance(): LogStore {
    if (!LogStore.instance) {
      LogStore.instance = new LogStore();
    }
    return LogStore.instance;
  }
  
  private logInitialization() {
    // open() returns a promise, so log the outcome without blocking the constructor
    this.db
      .open()
      .then(() => console.log('LevelDB opened successfully.'))
      .catch(err => console.error('Failed to open LevelDB:', err));
  }

  public async addEntry(entry: Omit<LogEntry, 'id'>): Promise<string> {
    const id = `log-${entry.timestamp}-${String(Math.random()).slice(2, 6)}`;
    const logEntry: LogEntry = { ...entry, id };
    try {
      await this.db.put(id, logEntry);
      return id;
    } catch (error) {
      console.error(`Failed to add log entry ${id}:`, error);
      throw new Error('Database write operation failed.');
    }
  }

  public async getEntry(id: string): Promise<LogEntry | null> {
    try {
      return await this.db.get(id);
    } catch (error: any) {
      // 'LEVEL_NOT_FOUND' is the expected error code when the key does not exist
      if (error.code === 'LEVEL_NOT_FOUND') {
        return null;
      }
      console.error(`Failed to get log entry ${id}:`, error);
      throw new Error('Database read operation failed.');
    }
  }

  public async getTotalCount(): Promise<number> {
    let count = 0;
    try {
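      // Pitfall: there is no direct .count() API.
      // Iterating over every key is slow for large datasets.
      // In production we would keep the total under a dedicated key, e.g. '_meta:count',
      // and update it in the same atomic batch as each write or delete.
      // For simplicity, this demo still iterates.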
      for await (const _ of this.db.keys()) {
        count++;
      }
      return count;
    } catch (error) {
      console.error('Failed to count entries:', error);
      return 0;
    }
  }

  public async getEntriesRange(options: {
    offset: number;
    limit: number;
  }): Promise<LogEntry[]> {
    const { offset, limit } = options;
    const results: LogEntry[] = [];
    
    // The iterator is where LevelDB's range-read performance comes from.
    // It has no native offset, so we read offset + limit entries and skip the first `offset`.
    // gte/lt restrict the scan to keys with the 'log-' prefix ('.' sorts right after '-').
    const iterator = this.db.iterator({ gte: 'log-', lt: 'log.', limit: offset + limit });

    try {
      let currentIndex = 0;
      for await (const [, value] of iterator) {
        if (currentIndex >= offset) {
          results.push(value);
        }
        currentIndex++;
      }
      return results;
    } catch (error) {
      console.error('Failed during range iteration:', error);
      throw new Error('Database range read operation failed.');
    } finally {
      // Release the iterator's resources whether or not the read succeeded
      await iterator
        .close()
        .catch(closeErr => console.error('Failed to close iterator:', closeErr));
    }
  }
  
  // Seeds the database with test data
  public async seed(count: number): Promise<void> {
    console.log(`Seeding ${count} log entries...`);
    const batch = this.db.batch();
    const now = Date.now();
    for (let i = 0; i < count; i++) {
      const timestamp = now - i * 1000;
      const id = `log-${timestamp}-${String(i).padStart(5, '0')}`;
      const level = ['info', 'warn', 'error'][i % 3] as 'info' | 'warn' | 'error';
      batch.put(id, {
        id,
        timestamp,
        level,
        message: `This is a seeded log message number ${i + 1}.`,
        payload: { userId: `user-${i % 100}` }
      });
    }
    try {
      await batch.write();
      console.log('Seeding complete.');
    } catch (error) {
      console.error('Failed to seed database:', error);
    }
  }
}

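The key part of this code is getEntriesRange. Instead of reading every key and then slicing, we use the iterator directly. The iterator is lazy: it pulls data from the underlying store only as needed, which keeps memory usage low. The iterator's limit option bounds how many entries are read. One caveat: the iterator has no native offset, so skipping the first offset entries still costs time proportional to offset, and reads deep into the list get progressively slower.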
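One possible way to soften the cost of large offsets is a sparse index: remember every Nth key in memory and start the iterator at the nearest remembered key instead of at the beginning. The sketch below only computes where to seek; buildAnchors, planSeek, and the stride value are hypothetical names and numbers, and keeping the anchors up to date as logs are written is left out.

```typescript
// Hypothetical sketch: a sparse in-memory index that records every `stride`-th key.
interface SeekPlan {
  gte: string | undefined; // key to start the iterator at (undefined = start of the range)
  skip: number;            // entries still to skip after seeking
}

function buildAnchors(sortedKeys: string[], stride: number): string[] {
  const result: string[] = [];
  for (let i = 0; i < sortedKeys.length; i += stride) {
    result.push(sortedKeys[i]);
  }
  return result;
}

function planSeek(anchorKeys: string[], stride: number, offset: number): SeekPlan {
  // Pick the last anchor at or before the requested offset
  const anchorIndex = Math.min(Math.floor(offset / stride), anchorKeys.length - 1);
  if (anchorIndex < 0) return { gte: undefined, skip: offset };
  return { gte: anchorKeys[anchorIndex], skip: offset - anchorIndex * stride };
}

// Demo with 10 keys and an anchor every 4 entries
const demoSortedKeys = Array.from({ length: 10 }, (_, i) => `log-${String(i).padStart(3, '0')}`);
const demoAnchors = buildAnchors(demoSortedKeys, 4);
const demoPlan = planSeek(demoAnchors, 4, 9);
console.log(demoAnchors, demoPlan);
```

The resulting plan would feed an iterator call along the lines of db.iterator({ gte: plan.gte, limit: plan.skip + limit }), so at most stride - 1 entries are skipped regardless of how deep the offset is. This works most simply for append-only logs, where existing anchors never shift.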

2. A Reactive View Model with MobX

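The view model is the bridge between the UI and the data layer. It knows nothing about the DOM and only manages state. When that state changes, MobX notifies the UI automatically.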

// src/viewmodels/LogViewerViewModel.ts
import { makeAutoObservable, runInAction } from 'mobx';
import { LogStore, LogEntry } from '../services/LogStore';

const PAGE_SIZE = 50; // Number of entries loaded from the database per request

export class LogViewerViewModel {
  // --- Observable State ---
  public logs: LogEntry[] = [];
  public totalCount: number = 0;
  public isLoading: boolean = true;
  public error: string | null = null;
  
  // --- Private State ---
  private logStore: LogStore;
  private isInitialized = false;

  constructor() {
    this.logStore = LogStore.getInstance();
    makeAutoObservable(this);
    this.initialize();
  }

  private async initialize() {
    if (this.isInitialized) return;

    runInAction(() => {
      this.isLoading = true;
      this.error = null;
    });

    try {
      const count = await this.logStore.getTotalCount();
      // In a real project, an empty database could instead start a flow that guides the user to generate data
      if (count === 0) {
        await this.logStore.seed(100000); // Seed 100,000 entries
      }
      const initialCount = await this.logStore.getTotalCount();
      const initialLogs = await this.fetchLogs(0, PAGE_SIZE);

      runInAction(() => {
        this.totalCount = initialCount;
        this.logs = initialLogs;
        this.isInitialized = true;
      });
    } catch (e: any) {
      runInAction(() => {
        this.error = e.message || 'Failed to initialize log viewer.';
      });
    } finally {
      runInAction(() => {
        this.isLoading = false;
      });
    }
  }

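  // Core method, triggered when the virtual list reports a new visible range
  public async ensureData(startIndex: number, stopIndex: number) {
    // Check whether the requested range is already in memory.
    // This check is simplified; a production implementation must handle holes and overlaps.
    if (startIndex < this.logs.length && this.logs[startIndex]) {
      // Assume the data is contiguous: if the first item exists, treat the range as loaded
      return;
    }
    
    // Avoid duplicate loads
    if (this.isLoading) return;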

    runInAction(() => {
      this.isLoading = true;
    });

    try {
      // Work out how many entries to load
      const limit = stopIndex - startIndex + 1;
      const fetchedLogs = await this.fetchLogs(startIndex, limit);

      runInAction(() => {
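        // Merge the data. The merge logic matters: each entry must land at its absolute index.
        // Reassigning a new array swaps the observable in one step and gives itemData a new reference.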
        const newLogs = [...this.logs];
        fetchedLogs.forEach((log, index) => {
          newLogs[startIndex + index] = log;
        });
        this.logs = newLogs;
      });
    } catch (e: any) {
      runInAction(() => {
        this.error = e.message || 'Failed to fetch more logs.';
        // Retry logic could be added here
      });
    } finally {
      runInAction(() => {
        this.isLoading = false;
      });
    }
  }
  
  private async fetchLogs(offset: number, limit: number): Promise<LogEntry[]> {
    console.log(`Fetching logs from offset: ${offset}, limit: ${limit}`);
    return this.logStore.getEntriesRange({ offset, limit });
  }
}

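The ensureData method is the heart of LogViewerViewModel. As the user scrolls, the virtual list tells us that it needs to render the items from startIndex to stopIndex. ensureData checks whether that data is already in the this.logs array; if not, it requests it from LogStore and then updates this.logs. Every state change made after an await is wrapped in runInAction, so that MobX applies it inside an action and batches the resulting updates.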
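The code comments above note that a production version has to cope with holes in the cache rather than checking only the first index. As a rough illustration of what that could look like, the sketch below scans a requested range and returns the sub-ranges that are still missing. The function name findMissingRanges and the IndexRange type are hypothetical and do not appear in the view model.

```typescript
// Hypothetical helper: finds the unloaded index ranges inside [startIndex, stopIndex].
type IndexRange = { start: number; stop: number }; // both ends inclusive

function findMissingRanges<T>(
  items: ReadonlyArray<T | undefined>,
  startIndex: number,
  stopIndex: number
): IndexRange[] {
  const missing: IndexRange[] = [];
  let runStart = -1; // -1 means "not currently inside a run of missing items"
  for (let i = startIndex; i <= stopIndex; i++) {
    const loaded = items[i] !== undefined;
    if (!loaded && runStart === -1) runStart = i;
    if (loaded && runStart !== -1) {
      missing.push({ start: runStart, stop: i - 1 });
      runStart = -1;
    }
  }
  // Close a run that extends to the end of the requested range
  if (runStart !== -1) missing.push({ start: runStart, stop: stopIndex });
  return missing;
}

const demoCache: Array<string | undefined> = [];
demoCache[0] = 'a';
demoCache[1] = 'b';
demoCache[4] = 'e';
console.log(findMissingRanges(demoCache, 0, 6));
```

With a helper like this, ensureData could issue one fetch per missing range instead of assuming that a loaded first item means the whole range is loaded.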

3. Integrating the Virtual List and UI Components

Now we put the pieces together, using react-window, a lightweight virtualization library.

// src/components/LogViewer.tsx
import React, { FC, useMemo } from 'react';
import { observer } from 'mobx-react-lite';
import { FixedSizeList as List } from 'react-window';
import AutoSizer from 'react-virtualized-auto-sizer';
import {
  Box,
  Spinner,
  Text,
  Center,
  Code,
  Alert,
  AlertIcon,
  VStack,
  HStack,
} from '@chakra-ui/react';
import { LogViewerViewModel } from '../viewmodels/LogViewerViewModel';
import { LogEntry } from '../services/LogStore';

// Row component that renders a single log entry
const LogRow: FC<{ index: number; style: React.CSSProperties; data: { logs: LogEntry[], isItemLoaded: (index: number) => boolean } }> = observer(
  ({ index, style, data }) => {
    const { logs, isItemLoaded } = data;

    if (!isItemLoaded(index)) {
      return (
        <HStack style={style} p={2} spacing={4} alignItems="center">
          <Spinner size="sm" />
          <Text fontSize="sm" color="gray.500">Loading...</Text>
        </HStack>
      );
    }
    
    const log = logs[index];
    if (!log) return null; // Safety check

    const getLevelColor = (level: string) => {
      if (level === 'error') return 'red.500';
      if (level === 'warn') return 'yellow.500';
      return 'blue.500';
    };

    return (
      <HStack style={style} p={2} spacing={4} alignItems="flex-start" borderBottom="1px solid" borderColor="gray.200">
        <Code colorScheme="gray" fontSize="xs" minW="150px">
          {new Date(log.timestamp).toISOString()}
        </Code>
        <Code colorScheme={getLevelColor(log.level).split('.')[0]} fontWeight="bold">
          [{log.level.toUpperCase()}]
        </Code>
        <Text fontSize="sm" fontFamily="monospace" whiteSpace="pre-wrap" flex="1">
          {log.message}
        </Text>
      </HStack>
    );
  }
);

export const LogViewer: FC = observer(() => {
  // Create the ViewModel inside the component so that its lifetime is tied to the component
  const viewModel = useMemo(() => new LogViewerViewModel(), []);

  if (viewModel.error) {
    return (
      <Center height="100%">
        <Alert status="error">
          <AlertIcon />
          {viewModel.error}
        </Alert>
      </Center>
    );
  }
  
  const isItemLoaded = (index: number): boolean => !!viewModel.logs[index];

  const loadMoreItems = (startIndex: number, stopIndex: number) => {
    // Called from react-window's onItemsRendered; triggers data loading
    return viewModel.ensureData(startIndex, stopIndex);
  };
  
  return (
    <VStack height="100vh" width="100%" spacing={0} align="stretch">
      <Box p={4} borderBottom="1px solid" borderColor="gray.300">
        <Text fontSize="lg" fontWeight="bold">Real-time Log Viewer</Text>
        <Text fontSize="sm" color="gray.600">Total Entries: {viewModel.totalCount.toLocaleString()}</Text>
      </Box>
      <Box flex="1" position="relative">
        <AutoSizer>
          {({ height, width }) => (
            <List
              height={height}
              width={width}
              itemCount={viewModel.totalCount}
              itemSize={45} // Row height in pixels; must be measured or estimated accurately
              itemData={{ logs: viewModel.logs, isItemLoaded }}
              onItemsRendered={({ visibleStartIndex, visibleStopIndex }) => {
                 // onItemsRendered fires only when the set of rendered items changes, so it is cheaper than onScroll
                 loadMoreItems(visibleStartIndex, visibleStopIndex);
              }}
            >
              {LogRow}
            </List>
          )}
        </AutoSizer>
        {viewModel.isLoading && (
            <Center position="absolute" bottom="20px" left="50%" transform="translateX(-50%)">
                <Spinner />
            </Center>
        )}
      </Box>
    </VStack>
  );
});

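LogViewer is wrapped in observer, so a change to any MobX observable it reads triggers a re-render.
AutoSizer measures the parent container and passes the dimensions to react-window's List so that the list fills the available space.
List is the core component; itemCount is set to the total number of logs in the database and itemSize to the fixed row height.
The onItemsRendered callback is where the virtual list meets our data model. As the user scrolls, the callback fires and we call viewModel.ensureData to make sure the data for the visible range is loaded into MobX state.
LogRow renders a single log line. It is also wrapped in observer, so a row re-renders only when the data it reads changes.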

Limitations and Future Optimizations

This architecture handles client-side rendering of a large dataset, but there is still room for improvement before production use.

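First, getTotalCount is inefficient. A better approach is to keep a dedicated metadata key in LevelDB (for example, _meta:count) and update it in the same batch as every insert or delete, so that the counter changes atomically with the data. Reading the total then becomes a single key lookup instead of a full scan.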
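A minimal sketch of that idea follows. It only builds the list of batch operations, in the array form that db.batch() accepts; buildInsertBatch, the BatchOp type, and the way the current count is passed in are assumptions for illustration. Because the count has to be read before the batch is built, writes would need to be serialized (for example through a single write queue) to avoid lost updates.

```typescript
// Hypothetical sketch: build one batch that writes new entries and the updated counter together.
const COUNT_KEY = '_meta:count'; // '_' sorts before 'l', so scans starting at 'log-' never see it

type BatchOp =
  | { type: 'put'; key: string; value: unknown }
  | { type: 'del'; key: string };

function buildInsertBatch(
  entries: Array<{ id: string }>,
  currentCount: number
): BatchOp[] {
  const ops: BatchOp[] = entries.map(entry => ({
    type: 'put' as const,
    key: entry.id,
    value: entry,
  }));
  // The counter update travels in the same batch, so it is applied together with the entries
  ops.push({ type: 'put', key: COUNT_KEY, value: currentCount + entries.length });
  return ops;
}

const demoOps = buildInsertBatch([{ id: 'log-1-0000' }, { id: 'log-2-0000' }], 40);
console.log(demoOps[demoOps.length - 1]);
```

Since the LogStore database is typed as Level<string, LogEntry>, storing a numeric counter next to the entries would also require loosening the value type or keeping metadata in a separate sublevel.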

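Second, the current implementation supports neither search nor filtering. Efficient search over a KV store is a classic hard problem. The simple option is a full scan, which is too slow to run over 100,000 records on every query. A more advanced option is to build an inverted index inside LevelDB: when a log is written, extract its keywords and store them in the form idx:keyword -> [log_id1, log_id2]. A search then becomes a fast index lookup.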
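To make the idx:keyword -> [log ids] layout concrete, here is a small sketch that keeps the index in a Map instead of LevelDB. The tokenizer is a deliberately naive assumption (lowercase ASCII words only), and indexEntry and searchIndex are hypothetical names; a persistent version would write the same keys and values through the store.

```typescript
// Hypothetical sketch of the idx:keyword -> [log ids] layout, kept in a Map for illustration.
type InvertedIndex = Map<string, string[]>;

function tokenize(message: string): string[] {
  // Lowercase, split on non-alphanumeric characters, drop empties and duplicates
  const words = message.toLowerCase().split(/[^a-z0-9]+/).filter(w => w.length > 0);
  return Array.from(new Set(words));
}

function indexEntry(index: InvertedIndex, id: string, message: string): void {
  for (const word of tokenize(message)) {
    const key = `idx:${word}`;
    const ids = index.get(key) ?? [];
    ids.push(id);
    index.set(key, ids);
  }
}

function searchIndex(index: InvertedIndex, keyword: string): string[] {
  return index.get(`idx:${keyword.toLowerCase()}`) ?? [];
}

const demoIndex: InvertedIndex = new Map();
indexEntry(demoIndex, 'log-1', 'Disk quota exceeded for user-42');
indexEntry(demoIndex, 'log-2', 'User login failed');
console.log(searchIndex(demoIndex, 'user'));
```

Storing all ids for a keyword under one key keeps lookups to a single read, but frequent keywords produce large values that must be rewritten on every insert; one key per keyword and log id pair is a common alternative.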

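Finally, the data loading logic can be improved. The current ensureData is fairly simple and could use a more sophisticated caching strategy and prefetching; for example, loading the next screen of data ahead of time as the user scrolls gives a more seamless experience. On the write side, merging consecutive writes with db.batch() can significantly improve write throughput. With these optimizations, the system should stay robust and efficient under more complex requirements and larger data volumes.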
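As a small illustration of the prefetching idea, the range handed to ensureData can simply be widened beyond what is visible. The helper below is a hypothetical sketch; the name prefetchRange and the overscan parameter are assumptions, and a fuller version might bias the extra rows toward the scroll direction.

```typescript
// Hypothetical helper: widen the visible range by a number of extra rows on each side.
function prefetchRange(
  visibleStart: number,
  visibleStop: number,
  totalCount: number,
  overscan: number
): { start: number; stop: number } {
  const start = Math.max(0, visibleStart - overscan);            // never go below the first row
  const stop = Math.min(totalCount - 1, visibleStop + overscan); // never go past the last row
  return { start, stop };
}

console.log(prefetchRange(100, 119, 100000, 20));
console.log(prefetchRange(0, 19, 30, 20));
```

Inside onItemsRendered, the widened start and stop would replace visibleStartIndex and visibleStopIndex in the call to ensureData.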
