
Class Segment

Creates the segmenter interface.

Hierarchy

  • Segment

Index

Type aliases

Static IDICT_BLACKLIST

IDICT_BLACKLIST: IDICT<boolean>

Static IDICT_STOPWORD

IDICT_STOPWORD: IDICT<boolean>

Static IDICT_SYNONYM

IDICT_SYNONYM: IDICT<string>

Static IOptionsSegment

IOptionsSegment: IOptionsTableDict & object

Static ISPLIT

ISPLIT: RegExp | string | object

Static ISPLIT_FILTER

ISPLIT_FILTER: RegExp | object

Constructors

constructor

Properties

DICT

DICT: object

Dictionary tables.

type

{{}}

Type declaration

POSTAG

POSTAG: POSTAG = POSTAG

Part-of-speech tags.

type

{POSTAG}

SPLIT

SPLIT: ISPLIT = /([\r\n]+|^[ \s+]+|[ \s]+$|[ \s]{2,})/gm as ISPLIT

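Paragraph splitting.

Because segment tokenizes by analyzing the context before and after each piece of content, the way paragraphs are cut affects the result.

A RegExp, or an object that has a .[Symbol.split](input: string, limit?: number) => string[] method.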

type

{Segment.ISPLIT}
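
As a rough illustration of the two kinds of value that ISPLIT accepts, the sketch below applies the default pattern (copied as printed above) through the standard `String.prototype.split`, and then does the same with a hand-written object exposing `[Symbol.split]`. The name `lineSplitter` and the sample text are hypothetical, and the sketch does not show how Segment itself calls SPLIT internally.

```typescript
// Default SPLIT pattern, copied as printed in this page.
const SPLIT_DEFAULT = /([\r\n]+|^[ \s+]+|[ \s]+$|[ \s]{2,})/gm;

// Hypothetical custom splitter: any object with a [Symbol.split] method
// can be passed to String.prototype.split in place of a RegExp.
const lineSplitter = {
  [Symbol.split](input: string, limit?: number): string[] {
    const parts = input.split("\n");
    return limit === undefined ? parts : parts.slice(0, limit);
  },
};

// Because the pattern has a capture group, the matched separator is kept
// in the output as its own chunk.
const chunksByRegExp = "第一段\n第二段".split(SPLIT_DEFAULT);

// The custom splitter drops the line feed instead of keeping it.
const chunksByObject = "第一段\n第二段".split(lineSplitter);

console.log(chunksByRegExp, chunksByObject);
```

With the RegExp form the separator survives as a chunk, so a later filtering step can still see it. The object form leaves that decision to whoever writes the splitter.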

SPLIT_FILTER

SPLIT_FILTER: ISPLIT_FILTER = /^([\r\n]+)$/g as ISPLIT_FILTER

After splitting, any chunk that matches the following condition is skipped for analysis. A RegExp, or an object that has a .test(input: string) => boolean method.

type

{Segment.ISPLIT_FILTER}
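
The following sketch shows one way a filter of this shape could be applied to the chunks produced by splitting. The helper `markChunks` is hypothetical and is not part of Segment. It resets `lastIndex` before each call because a RegExp with the `g` flag keeps state between `.test()` calls. Whether the library does the same is not stated here.

```typescript
// Default SPLIT_FILTER pattern, copied as printed in this page.
const SPLIT_FILTER_DEFAULT = /^([\r\n]+)$/g;

// Anything with a test() method qualifies, RegExp or plain object.
type ChunkFilter = { test(input: string): boolean };

// Hypothetical helper: tags each chunk with whether it should be analyzed.
function markChunks(
  chunks: string[],
  filter: ChunkFilter,
): Array<{ text: string; analyze: boolean }> {
  return chunks.map((text) => {
    // A global RegExp remembers lastIndex after a successful test(),
    // so reset it to keep every call independent.
    if (filter instanceof RegExp) {
      filter.lastIndex = 0;
    }
    return { text, analyze: !filter.test(text) };
  });
}

const marked = markChunks(["第一段", "\n", "第二段", "\n\n"], SPLIT_FILTER_DEFAULT);
console.log(marked.map((chunk) => chunk.analyze));
```

Chunks that consist only of line breaks are marked as not needing analysis, while the text chunks remain eligible.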

db

db: object

Type declaration

Optional inited

inited: boolean

modules

modules: object = { /** tokenizer modules */ tokenizer: [], /** optimizer modules */ optimizer: [] } as { tokenizer: ISubTokenizer[], optimizer: ISubOptimizer[] }

Type declaration

optimizer

optimizer: Optimizer

options

tokenizer

tokenizer: Tokenizer

Static defaultOptionsDoSegment

defaultOptionsDoSegment: IOptionsDoSegment

Methods

Protected _get_text

  • _get_text(text: string | Buffer): string

Protected _loadBlacklistDict

_resolveDictFilename

  • _resolveDictFilename(name: string, pathPlus?: string[], extPlus?: string[]): string | string[]
  • Parameters

    • name: string
    • Default value pathPlus: string[] = []
    • Default value extPlus: string[] = []

    Returns string | string[]

addBlacklist

  • addBlacklist(word: string, remove?: boolean): this
  • Parameters

    • word: string
    • Optional remove: boolean

    Returns this

autoInit

  • autoInit(options?: object): this
  • This function only needs to run once, and under normal circumstances it does not need to be called manually.

    Parameters

    • Optional options: object
      • Optional all_mod?: boolean

    Returns this

convertSynonym

doBlacklist

  • doBlacklist(): this

doSegment

  • Starts segmentation.

    Parameters

    • text: string | Buffer

      The text to segment.

    • options: IOptionsDoSegment & object

      Options.

      • {Boolean} simple Whether to return only the word content
      • {Boolean} stripPunctuation Strip punctuation
      • {Boolean} convertSynonym Convert synonyms
      • {Boolean} stripStopword Strip stopwords

    Returns string[]

  • Parameters

    Returns IWord[]
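
The two return types listed above (string[] and IWord[]) suggest an overloaded signature selected by the options. The sketch below illustrates that pattern with a toy stand-in. `doSegmentSketch`, its whitespace tokenizer, and the `{ w, p }` shape assumed for IWord are all hypothetical and are not taken from this page. Only the option names come from the list above.

```typescript
// Assumed word shape: w = word text, p = numeric part-of-speech tag.
interface IWord {
  w: string;
  p?: number;
}

// Option names as documented above; everything else here is illustrative.
interface IOptionsDoSegment {
  simple?: boolean;
  stripPunctuation?: boolean;
  convertSynonym?: boolean;
  stripStopword?: boolean;
}

// Overloads: simple: true narrows the result to plain strings.
function doSegmentSketch(text: string, options: IOptionsDoSegment & { simple: true }): string[];
function doSegmentSketch(text: string, options?: IOptionsDoSegment): IWord[];
function doSegmentSketch(text: string, options: IOptionsDoSegment = {}): IWord[] | string[] {
  // Toy tokenizer (NOT the real algorithm): split on whitespace.
  let words: IWord[] = text
    .split(/\s+/)
    .filter((w) => w.length > 0)
    .map((w) => ({ w }));
  if (options.stripPunctuation) {
    words = words.filter((word) => !/^[.,!?;:]+$/.test(word.w));
  }
  return options.simple ? words.map((word) => word.w) : words;
}

const simpleResult = doSegmentSketch("hello , world", { simple: true, stripPunctuation: true });
const richResult = doSegmentSketch("hello , world");
console.log(simpleResult, richResult);
```

The point of the overloads is that callers passing `simple: true` get `string[]` at the type level without a cast.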

getDict

getDictDatabase

  • getDictDatabase<R>(type: SYNONYM, autocreate?: boolean, libTableDict?: object): R
  • getDictDatabase<R>(type: TABLE, autocreate?: boolean, libTableDict?: object): R
  • getDictDatabase<R>(type: STOPWORD, autocreate?: boolean, libTableDict?: object): R
  • getDictDatabase<R>(type: BLACKLIST, autocreate?: boolean, libTableDict?: object): R
  • getDictDatabase<R>(type: BLACKLIST_FOR_OPTIMIZER, autocreate?: boolean, libTableDict?: object): R
  • getDictDatabase<R>(type: string | EnumDictDatabase, autocreate?: boolean, libTableDict?: object): R

getOptionsDoSegment

  • getOptionsDoSegment<T>(options?: T): T

indexOf

  • indexOf(words: IWord[], s: string | number, cur?: number): number
  • Finds the position of a given word or part of speech in a word array.

    Parameters

    • words: IWord[]

      Array of words.

    • s: string | number

      The word or part of speech to look for.

    • Optional cur: number

      Start position.

    Returns number

    Returns -1 if not found.
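
The lookup described above can be sketched as follows. This is a standalone approximation, not the library's code. It assumes that a string argument is compared against the word text and a number against the part-of-speech tag, and that IWord has the shape `{ w, p }`. The tag values in the sample data are made up.

```typescript
// Assumed word shape: w = word text, p = numeric part-of-speech tag.
interface IWord {
  w: string;
  p?: number;
}

// Hypothetical stand-in for indexOf(words, s, cur).
function indexOfWord(words: IWord[], s: string | number, cur = 0): number {
  return words.findIndex(
    (word, i) => i >= cur && (typeof s === "string" ? word.w === s : word.p === s),
  );
}

// Sample data; the tag numbers 1 and 2 are placeholders, not real POSTAG values.
const sampleWords: IWord[] = [
  { w: "我", p: 1 },
  { w: "爱", p: 2 },
  { w: "我", p: 1 },
];

console.log(indexOfWord(sampleWords, "我")); // first match from the start
console.log(indexOfWord(sampleWords, "我", 1)); // search begins at index 1
console.log(indexOfWord(sampleWords, 2)); // match by part-of-speech tag
console.log(indexOfWord(sampleWords, "你")); // no match
```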

listModules

  • Parameters

    Returns object

    • disable: object
      • optimizer: any[]
      • tokenizer: any[]
    • enable: object
      • optimizer: any[]
      • tokenizer: any[]

loadBlacklistDict

  • loadBlacklistDict(name: string): this

loadBlacklistOptimizerDict

  • loadBlacklistOptimizerDict(name: string): this

loadDict

  • loadDict(name: string, type?: string, convert_to_lower?: boolean, skipExists?: boolean): this
  • Loads a dictionary file.

    Parameters

    • name: string

      Dictionary file name.

    • Optional type: string

      Type.

    • Optional convert_to_lower: boolean

      Whether to convert everything to lowercase.

    • Optional skipExists: boolean

    Returns this

loadStopwordDict

  • loadStopwordDict(name: string): this
  • Loads a stopword dictionary.

    Parameters

    • name: string

      Dictionary file name.

    Returns this

loadSynonymDict

  • loadSynonymDict(name: string, skipExists?: boolean): this
  • Loads a synonym dictionary.

    Parameters

    • name: string

      Dictionary file name.

    • Optional skipExists: boolean

    Returns this

split

  • split(words: IWord[], s: string | number): IWord[]
  • Splits a word array on a given word or part of speech.

    Parameters

    • words: IWord[]

      Array of words.

    • s: string | number

      The word or part of speech to split on.

    Returns IWord[]
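
To make the idea of cutting a word array at a separator concrete, here is a minimal sketch. It is an assumption-laden illustration, not the library's implementation. The signature above declares IWord[] as the return type, whereas this sketch returns groups (IWord[][]) purely so the cut points are visible. The `{ w, p }` word shape and the choice to keep each separator as its own group are likewise assumed.

```typescript
// Assumed word shape: w = word text, p = numeric part-of-speech tag.
interface IWord {
  w: string;
  p?: number;
}

// Hypothetical stand-in: cut the array wherever a word (string) or
// part-of-speech tag (number) equals s, keeping the separator as a group.
function splitWords(words: IWord[], s: string | number): IWord[][] {
  const groups: IWord[][] = [];
  let current: IWord[] = [];
  for (const word of words) {
    const isSeparator = typeof s === "string" ? word.w === s : word.p === s;
    if (isSeparator) {
      if (current.length > 0) groups.push(current);
      groups.push([word]);
      current = [];
    } else {
      current.push(word);
    }
  }
  if (current.length > 0) groups.push(current);
  return groups;
}

const sentence: IWord[] = [{ w: "今天" }, { w: "下雨" }, { w: "," }, { w: "明天" }, { w: "放晴" }];
const groupsOfWords = splitWords(sentence, ",");
console.log(groupsOfWords.map((group) => group.map((word) => word.w)));
```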

stringify

  • stringify(words: Array<IWord | string>, ...argv: any[]): string
  • Joins a word array into a string.

    Parameters

    • words: Array<IWord | string>

      Array of words.

    • Rest ...argv: any[]

    Returns string
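
Joining a mixed array of words and strings can be sketched as below. `stringifyWords` is a hypothetical stand-in that assumes each IWord carries its text in a `w` field and that items are concatenated without a separator. The rest arguments (`...argv`) of the real method are not modeled, since their meaning is not documented here.

```typescript
// Assumed word shape: w = word text, p = numeric part-of-speech tag.
interface IWord {
  w: string;
  p?: number;
}

// Hypothetical stand-in: plain strings pass through, IWord items
// contribute their w field, and everything is concatenated.
function stringifyWords(words: Array<IWord | string>): string {
  return words.map((item) => (typeof item === "string" ? item : item.w)).join("");
}

const joined = stringifyWords([{ w: "我" }, "爱", { w: "你" }]);
console.log(joined);
```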

use

useDefault

  • useDefault(...argv: any[]): this
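  • Uses the default recognition modules and dictionary files. When the default settings are in use, there is no need to call this function explicitly.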

    Parameters

    • Rest ...argv: any[]

    Returns this

Static stringify

  • stringify(words: Array<IWord | string>, ...argv: any[]): string
