MMIR Framework API 7.0.0-beta1

new mmir.grammar.GrammarConverter()

The GrammarConverter object initializes the grammar for processing natural language text, e.g. from the voice recognition.

Example

var GrammarConverter = mmir.require('mmirf/grammarConverter');
var gc = new GrammarConverter();

Requires

module:util/loadFile
module:util/isArray
module:positionUtils

Methods

addProc(proc, isPrepend) semantic/grammarConverter.js, line 633

Adds a pre-/post-processing step that runs before/after executeGrammar.

Name	Type	Description
`proc`	ProcessingStep	the processing step (see the structure below the table)
`isPrepend`	Boolean \| Number	optional if omitted (or FALSY): append `proc` to the processing steps; if a number: insert `proc` at this index into the list of processing steps; if TRUE: prepend `proc` to the processing steps

The processing step `proc` is an object with

				{
					//the name of the processing step
					name: string,
					//OPTIONAL pre-processing function: pre(input: string | Positions, isCalcPos: boolean)
					pre: Function,
					//OPTIONAL post-processing function: post(result: any, pos: Positions)
					post: Function
				}

Example

//positionUtils:
var posUtil = mmir.require('mmirf/positionUtils');
//stemming function
var stemFunc = ...;
//add stemming function for pre-processing as first step
grammarConverter.addProc({
 name: 'stem',
 pre: posUtil.createWordPosPreProc(stemFunc, this)
}, true);
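
The three cases of `isPrepend` can be modeled on a plain array. The following is only a sketch of the documented rules, not the mmir implementation: `addStep` and the step names are hypothetical, and treating the number 0 as an index (rather than as FALSY) is an assumption.

```javascript
// Hypothetical stand-in for the converter's list of processing steps;
// it only models the documented isPrepend rules.
function addStep(procList, proc, isPrepend) {
  if (typeof isPrepend === 'number') {
    // number: insert at the given index (assumption: 0 counts as an index)
    procList.splice(isPrepend, 0, proc);
  } else if (isPrepend) {
    // TRUE: prepend
    procList.unshift(proc);
  } else {
    // omitted or FALSY: append
    procList.push(proc);
  }
  return procList;
}

var steps = [{ name: 'escape' }, { name: 'stopwords' }];
addStep(steps, { name: 'stem' }, true);    // prepend
addStep(steps, { name: 'normalize' });     // append
addStep(steps, { name: 'lowercase' }, 1);  // insert at index 1

var stepNames = steps.map(function (s) { return s.name; });
console.log(stepNames.join(', '));
// prints "stem, lowercase, escape, stopwords, normalize"
```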

executeGrammar(text, options, callback){Object} semantic/grammarConverter.js, line 756

Execute the grammar. NOTE: do not use this function directly; use mmir.SemanticInterpreter#interpret instead, since that function applies some pre- and post-processing to the text (stopword removal, en-/decoding of special characters etc.).

Name	Type	Description
`text`	String	the text String that should be parsed.
`options`	Object	optional additional parsing options (some grammar engines may support further options): options.debug: BOOLEAN enables printing debug information; options.trace: BOOLEAN \| FUNCTION enables printing verbose/tracing information (may not be supported by the grammar engine)
`callback`	function	optional if #isAsyncExec is TRUE, then executeGrammar will have no return value; instead, the result of the grammar execution is delivered via the `callback`: function callback(result){ ... } (see also the description of the return value below)

Returns:

Type	Description
Object	the result of the grammar execution:

{phrase: STRING, phrases: OBJECT[], semantic: OBJECT}

The property phrase contains the text which was matched (with stopwords removed). The property phrases contains the matched TOKENS and UTTERANCES from the JSON definition of the grammar as properties, each holding an array (e.g. for 1 matched TOKEN "token": {token: ["the matched text"]}). The returned property semantic depends on the JSON definition of the grammar.

NOTE: if #isAsyncExec is TRUE, there is no return value; instead, the callback is invoked with the result.
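
Since the result is delivered either as return value or via the callback, calling code may want to handle both modes in one place. The following is a minimal sketch under assumptions: `runGrammar`, the two stub converters, and `sampleResult` are hypothetical, and the sketch accepts `isAsyncExec` as either a method or a plain flag because its exact form is not stated here.

```javascript
// Hypothetical helper (not part of mmir): funnels both delivery modes
// (return value vs. callback) into a single callback.
function runGrammar(converter, text, options, onResult) {
  var isAsync = typeof converter.isAsyncExec === 'function'
    ? converter.isAsyncExec()
    : !!converter.isAsyncExec;
  if (isAsync) {
    // async mode: no return value, the result arrives via the callback
    converter.executeGrammar(text, options, onResult);
  } else {
    // sync mode: the result is the return value
    onResult(converter.executeGrammar(text, options));
  }
}

// Made-up result in the documented shape, and two stub converters
var sampleResult = { phrase: 'turn on light', phrases: { token: ['light'] }, semantic: {} };
var syncStub = {
  isAsyncExec: function () { return false; },
  executeGrammar: function (text, options) { return sampleResult; }
};
var asyncStub = {
  isAsyncExec: function () { return true; },
  // the stub invokes the callback immediately to keep the sketch short
  executeGrammar: function (text, options, callback) { callback(sampleResult); }
};

var receivedPhrases = [];
var collect = function (result) { receivedPhrases.push(result.phrase); };
runGrammar(syncStub, 'turn on the light', {}, collect);
runGrammar(asyncStub, 'turn on the light', {}, collect);
```

In both cases `collect` receives the same result object, so the code that consumes the result does not need to know which mode the grammar engine uses.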

getEncodedStopwords(){Array.<String>} semantic/grammarConverter.js, line 138

HELPER creates a copy of the stopword list and encodes all non-ASCII chars to their Unicode representation (e.g. for safe storage of the stringified stopword list, even if the file encoding does not support non-ASCII letters).

Returns:

Type	Description
Array.<String>	a copy of the stopword list, from the current JSON grammar (or empty list, if no grammar is present)

getGrammarDef(){String} semantic/grammarConverter.js, line 283

Get grammar definition text. This is the "source code" input for the grammar compiler (i.e. syntax for jison, PEG.js or JS/CC). The grammar definition text is generated from the JSON grammar.

Returns:

Type	Description
String	the grammar definition in compiler-specific syntax

getGrammarSource(){String} semantic/grammarConverter.js, line 316

Get the compiled JavaScript grammar source code. This is the output of the grammar compiler (with additional JavaScript "framing" in mmir.SemanticInterpreter#createGrammar). This needs to be eval'ed before it can be executed (eval() will add the corresponding executable grammar to SemanticInterpreter).

Returns:

Type	Description
String	the compiled, JavaScript grammar source code

getProcIndex(proc, startIndex){Number} semantic/grammarConverter.js, line 684

Get the index of a processing step (within procList) by its name. NOTE: if multiple processing steps with the same name exist, the index of the first one (at or after startIndex) is returned.

Name	Type	Description
`proc`	String	the name of the processing step
`startIndex`	Number	optional start index for searching (DEFAULT: 0)

See:

addProc
removeProc
procList

Returns:

Type	Description
Number	the index of the processing step, or -1, if there is no such processing step
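
The lookup by name with a start index can be pictured as a linear search. This is a sketch on a plain array, assuming that the first match at or after the start index is reported; `findStepIndex` and the step list are hypothetical and not the mmir source.

```javascript
// Hypothetical model of a name lookup in a list of processing steps
function findStepIndex(procList, name, startIndex) {
  for (var i = startIndex || 0; i < procList.length; i++) {
    if (procList[i].name === name) {
      return i;
    }
  }
  return -1; // no such processing step
}

var procSteps = [{ name: 'stem' }, { name: 'escape' }, { name: 'stem' }];
var firstStem = findStepIndex(procSteps, 'stem');     // 0
var nextStem = findStepIndex(procSteps, 'stem', 1);   // 2
var noSuchStep = findStepIndex(procSteps, 'unknown'); // -1
```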

getStopWordsEncRegExpr() semantic/grammarConverter.js, line 266

FIX for stopwords that start or end with encoded chars (i.e. non-ASCII chars). This RegExp may be NULL/undefined if no stopwords exist that begin/end with encoded chars, i.e. you need to check for NULL before trying to use this RegExp. Usage:

Example

//remove normal stopwords:
 var removedStopwordsStr  = someStr.replace( gc.getStopWordsRegExpr(), '');


 var removedStopwordsStr2 = removedStopwordsStr;
 if(gc.getStopWordsEncRegExpr()){
 	//NOTE replace stopwords with spaces (not with empty String as above, i.e. with "normal" stopwords)
 	removedStopwordsStr2 = removedStopwordsStr.replace( gc.getStopWordsEncRegExpr(), ' ');
 }

maskAsUnicode(str, computePositions){String|Object} semantic/grammarConverter.js, line 965

HELPER uses #maskString for encoding non-ASCII chars to their Unicode representation, i.e. \uXXXX where XXXX is the Unicode HEX number. SHORTCUT for calling maskString(str, '\\u', '').

Name	Type	Description
`str`	String	the string for unicode masking
`computePositions`	Boolean	optional, DEFAULT: false

Returns:

Type	Description
String \| Object	the unicode-masked string, or if `computePositions` was `true` a result object with

				{
					text: STRING, // the masked string
					pos: [POSITION] // array of masking-positions: {i: NUMBER, len: NUMBER, mlen: NUMBER}
				}

where POSITION is an object with

				{
					i: NUMBER, // the index within the modified string
					len: NUMBER, // the length before the modification (i.e. of the sub-string that is to be masked)
					mlen: NUMBER // the length after the modification (i.e. of the sub-string that was masked)
				}

Example

//for Japanese "下さい" ("please")
maskAsUnicode("下さい") // -> "\u4E0B\u3055\u3044"

//... and using default masking:
maskString("下さい") // -> "~~4E0B~~~~3055~~~~3044~~"

maskString(str, computePositions, prefix, postfix){String|Object} semantic/grammarConverter.js, line 813

Masks Unicode (i.e. non-ASCII) characters in a string. These characters are masked by replacing them with ~~XXXX~~ where XXXX is the four-digit Unicode HEX number.

NOTE that this function is stable with regard to multiple executions: If the function is invoked on the returned String again, the returned String will be the same / unchanged, i.e. maskings (i.e. "~~XXXX~~") will not be masked again.

NOTE: currently, the masking pattern cannot be escaped, i.e. if the original String contains a substring that matches the masking pattern, it cannot be escaped, so that the unmask-function will leave it untouched.

Name	Type	Description
`str`	String	the String to process
`computePositions`	Boolean	optional, DEFAULT: false
`prefix`	String	optional an alternative prefix used for masking, i.e. instead of `~~` (ignored if the argument has a type other than `string`)
`postfix`	String	optional an alternative postfix used for masking, i.e. instead of `~~` (ignored if the argument has a type other than `string`)

Returns:

Type	Description
String \| Object	the masked string, or if `computePositions` was `true` a result object with

				{
					text: STRING, // the masked string
					pos: [POSITION] // array of masking-positions: {i: NUMBER, len: NUMBER, mlen: NUMBER}
				}

where POSITION is an object with

				{
					i: NUMBER, // the index within the modified string
					len: NUMBER, // the length before the modification (i.e. of the sub-string that is to be masked)
					mlen: NUMBER // the length after the modification (i.e. of the sub-string that was masked)
				}
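
To make the masking pattern and the position objects more concrete, here is an illustrative re-implementation. It is not the mmir source: `maskNonAscii` is a hypothetical name, "non-ASCII" is taken to mean a char code above 127, and `i` is interpreted as the index in the masked output string.

```javascript
// Illustrative sketch of the masking scheme described above
function maskNonAscii(str, computePositions, prefix, postfix) {
  var pre = typeof prefix === 'string' ? prefix : '~~';
  var post = typeof postfix === 'string' ? postfix : '~~';
  var text = '';
  var pos = [];
  for (var i = 0; i < str.length; i++) {
    var code = str.charCodeAt(i);
    if (code > 127) {
      // four-digit, upper-case HEX number of the char code
      var hex = ('0000' + code.toString(16).toUpperCase()).slice(-4);
      var masked = pre + hex + post;
      pos.push({ i: text.length, len: 1, mlen: masked.length });
      text += masked;
    } else {
      text += str.charAt(i);
    }
  }
  return computePositions ? { text: text, pos: pos } : text;
}

var please = '\u4E0B\u3055\u3044'; // Japanese "下さい" ("please")
var maskedDefault = maskNonAscii(please);             // "~~4E0B~~~~3055~~~~3044~~"
var maskedAgain = maskNonAscii(maskedDefault);        // unchanged: masks are plain ASCII
var withPos = maskNonAscii('a\u4E0Bb', true);         // text "a~~4E0B~~b", pos [{i: 1, len: 1, mlen: 8}]
var asEscape = maskNonAscii('\u4E0B', false, '\\u', ''); // backslash, then "u4E0B"
```

Because the mask itself consists only of ASCII characters, running the sketch on its own output changes nothing, which mirrors the stability note above.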

postproc(procResult, pos, processingSteps) semantic/grammarConverter.js, line 587

Post-processes the result from the applied grammar:

* un-masks non-ASCII characters

addProc can be used to add additional pre-/post-processing steps.

Name	Type	Description
`procResult`	SemanticResult
`pos`	Positions	the position information (i.e. modifications) of the pre-processing steps
`processingSteps`	Array.<ProcessingStep>	optional if given, use `processingSteps` instead of the (field) `procList`. NOTE: positional argument (i.e. `pos` must be specified, too)

preproc(thePhrase, pos, processingSteps){String} semantic/grammarConverter.js, line 538

Apply pre-processing to the string, before applying the grammar:

* escape (i.e. "mask") non-ASCII characters
* remove stopwords

addProc can be used to add additional pre-/post-processing steps.

Name	Type	Description
`thePhrase`	String
`pos`	PlainObject	optional in/out argument: if given, the pre-processor will add fields with information on how the input string `thePhrase` was modified. By default, the position information for escaped characters and removed stopwords will be added to `pos.escape` (see `maskString` for more details) and `pos.stopwords` (see `removeStopwords` for more details). The field `pos._order` will contain the ordered list of pre-processing steps that were applied, i.e. the entries correspond to the field names, e.g. by default the list would contain `['escape', 'stopwords']`
`processingSteps`	Array.<ProcessingStep>	optional if given, use `processingSteps` instead of the (field) `procList`. NOTE: positional argument (i.e. `pos` must be specified, too)

Returns:

Type	Description
String	the pre-processed string
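
How the steps are chained and how `pos` gets filled can be sketched with a small runner. Everything here is hypothetical and simplified: `runPreSteps` and the two toy steps are not the mmir implementation, the toy steps leave their position arrays empty, and the input is always passed as a plain string.

```javascript
// Hypothetical runner for pre-processing steps with in/out argument pos
function runPreSteps(phrase, pos, steps) {
  var text = phrase;
  if (pos) {
    pos._order = [];
  }
  steps.forEach(function (step) {
    if (typeof step.pre !== 'function') {
      return; // the pre-function of a step is OPTIONAL
    }
    var res = step.pre(text, !!pos);
    if (typeof res === 'string') {
      text = res;
    } else {
      text = res.text;
      if (pos) {
        pos[step.name] = res.pos;    // position information of this step
        pos._order.push(step.name);  // order in which the steps were applied
      }
    }
  });
  return text;
}

var toySteps = [
  { name: 'escape', pre: function (input, isCalcPos) {
    // toy masking: only handles the character U+00E9
    var text = input.replace(/\u00E9/g, '~~00E9~~');
    return isCalcPos ? { text: text, pos: [] } : text;
  } },
  { name: 'stopwords', pre: function (input, isCalcPos) {
    // toy stopword removal: only removes "the"
    var text = input.replace(/\bthe\s+/g, '');
    return isCalcPos ? { text: text, pos: [] } : text;
  } }
];

var prePos = {};
var preResult = runPreSteps('play the caf\u00E9 song', prePos, toySteps);
// preResult is "play caf~~00E9~~ song", prePos._order is ['escape', 'stopwords']
```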

recodeJSON(json, recodeFunc, isMaskValues, isMaskNames){Object} semantic/grammarConverter.js, line 1116

Recodes Strings of a JSON-like object.

Name	Type	Description
`json`	Object	the JSON-like object (i.e. PlainObject)
`recodeFunc`	function	the "recoding" function for modifying String values: must accept a String argument and return a String: `String recodeFunc(String)`. The `recodeFunc` function is invoked in the context of the GrammarConverter object. Example: this.maskString(). See `maskString`.
`isMaskValues`	Boolean	optional if true, the object's property String values will be processed. NOTE: in case this parameter is specified, `recodeFunc` must also be specified! DEFAULT: uses property `maskValues`
`isMaskNames`	Boolean	optional if true, the property names will be processed. NOTE: in case this parameter is specified, `recodeFunc` and `isMaskValues` must also be specified! DEFAULT: uses property `maskNames`

Returns:

Type	Description
Object	the recoded JSON object
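
The effect of the two flags can be illustrated with a recursive walk over a plain object. This sketch is an assumption-laden stand-in: `recodeStrings` is a hypothetical name, it returns a copy (whether the library recodes in place is not stated here), and the upper-casing function merely takes the place of a real recoding function such as maskString.

```javascript
// Hypothetical sketch: recode String values and/or property names of a JSON-like object
function recodeStrings(json, recodeFunc, isMaskValues, isMaskNames) {
  if (typeof json === 'string') {
    return isMaskValues ? recodeFunc(json) : json;
  }
  if (Array.isArray(json)) {
    return json.map(function (entry) {
      return recodeStrings(entry, recodeFunc, isMaskValues, isMaskNames);
    });
  }
  if (json && typeof json === 'object') {
    var out = {};
    Object.keys(json).forEach(function (key) {
      var name = isMaskNames ? recodeFunc(key) : key;
      out[name] = recodeStrings(json[key], recodeFunc, isMaskValues, isMaskNames);
    });
    return out;
  }
  return json; // numbers, booleans, null are left untouched
}

var toUpper = function (s) { return s.toUpperCase(); };
var sampleGrammar = { stopwords: ['der', 'die'], tokens: { light: ['licht'] } };
var recodedValues = recodeStrings(sampleGrammar, toUpper, true, false);
// { stopwords: ['DER', 'DIE'], tokens: { light: ['LICHT'] } }
var recodedNames = recodeStrings({ light: 'licht' }, toUpper, false, true);
// { LIGHT: 'licht' }
```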

removeProc(proc){ProcessingStep} semantic/grammarConverter.js, line 659

Removes a processing step by its index (within procList) or its name. NOTE: if multiple processing steps with the same name exist, the last one is removed.

Name	Type	Description
`proc`	Number \| String	the name or index of the processing step that should be removed

Returns:

Type	Description
ProcessingStep	the removed processing step, or undefined, if there was no matching processing step

removeStopwords(thePhrase, computePositions){String|Object} semantic/grammarConverter.js, line 380

Name	Type	Description
`thePhrase`	String	the string from which stopwords are removed (the string is also trim()'ed)
`computePositions`	Boolean	optional, DEFAULT: false

Returns:

Type	Description
String \| Object	the string where stopwords were removed, or if `computePositions` was `true` a result object where the positions at which stopwords were removed will be available as an array:

				{
					text: STRING, // the string with removed stopwords
					pos: [POSITION] // array of positions for removed stopwords: {i: NUMBER, len: NUMBER, mlen: NUMBER}
				}

where POSITION is an object with

				{
					i: NUMBER, // the index within the modified string
					len: NUMBER, // the length before the modification (i.e. of the sub-string that is to be masked)
					mlen: NUMBER // the length after the modification (i.e. of the sub-string that was masked)
				}
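
As a rough picture of what stopword removal does to a phrase, the following sketch works on whitespace-separated words. It is not how the library does it (which, according to the other methods on this page, is RegExp-based): `stripStopwords`, the case-insensitive matching, and the sample stopword list are assumptions for illustration.

```javascript
// Hypothetical, word-based sketch of stopword removal (including trimming)
function stripStopwords(phrase, stopwords) {
  var lookup = Object.create(null); // no inherited keys like "constructor"
  stopwords.forEach(function (word) {
    lookup[word.toLowerCase()] = true;
  });
  return phrase.trim().split(/\s+/).filter(function (word) {
    return !lookup[word.toLowerCase()];
  }).join(' ');
}

var strippedPhrase = stripStopwords('  please turn on the light ', ['please', 'the']);
// strippedPhrase is "turn on light"
```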

protected setGrammarDef(rawGrammarSyntax) semantic/grammarConverter.js, line 301

Sets the grammar definition text. This function should only be used during compilation of the JSON grammar to the executable grammar. NOTE: Setting this "manually" will have no effect on the executable grammar.

Name	Type	Description
`rawGrammarSyntax`	String	the grammar definition in compiler-specific syntax

See:

getGrammarDef

setGrammarFunction(func, isAsnc) semantic/grammarConverter.js, line 343

Set the executable grammar function: func(inputText, resultCallback)

The grammar function takes

* a String argument: the text that should be parsed
* a Function argument: the callback for the result, where the callback itself takes 1 argument for the result: callback(result)

The returned result depends on the JSON definition of the grammar.

Name	Type	Description
`func`	function	the executable grammar function: `func(string, object, function(object)) : object`
`isAsnc`	Boolean	optional set to TRUE, if the execution is done asynchronously. DEFAULT: FALSE

See:

executeGrammar

unmaskString(str, computePositions, detector){String|Object} semantic/grammarConverter.js, line 1012

Unmasks masked Unicode characters in a string. Masked Unicode characters are assumed to have the pattern ~~XXXX~~ where XXXX is the four-digit Unicode HEX number.

NOTE that this function is stable with regard to multiple executions, IF the original String str did not contain a sub-string that conforms to the encoding pattern (see remark for maskString): If the function is invoked on the returned String again, the returned String will be the same, i.e. unchanged.

Name	Type	Description
`str`	String
`computePositions`	Boolean	optional, DEFAULT: false
`detector`	RegExp	optional an alternative detector-RegExp: the RegExp must contain at least one grouping which detects a Unicode number (HEX), e.g. the default detector is `~~([0-9\|A-F\|a-f]{4})~~` (note the grouping for detecting a 4-digit HEX number within the brackets).

Returns:

Type	Description
String \| Object	the unmasked string, or if `computePositions` was `true` a result object with

				{
					text: STRING, // the unmasked string
					pos: [POSITION] // array of unmasking-positions: {i: NUMBER, len: NUMBER, mlen: NUMBER}
				}

where POSITION is an object with

				{
					i: NUMBER, // the index within the modified string
					len: NUMBER, // the length before the modification (i.e. of the sub-string that is to be unmasked)
					mlen: NUMBER // the length after the modification (i.e. of the sub-string that was unmasked)
				}
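
The role of the detector's grouping can be shown with a short sketch: the first group captures the HEX number, which is then converted back to a character. This is an illustration, not the mmir source; `unmaskText` is a hypothetical name and its default detector is a simplified form of the pattern quoted above.

```javascript
// Illustrative sketch of unmasking via a detector RegExp with one HEX grouping
function unmaskText(str, detector) {
  var source = (detector || /~~([0-9A-Fa-f]{4})~~/).source;
  var re = new RegExp(source, 'g'); // replace all occurrences
  return str.replace(re, function (match, hex) {
    return String.fromCharCode(parseInt(hex, 16));
  });
}

var unmasked = unmaskText('~~4E0B~~~~3055~~~~3044~~'); // "下さい"
var unmaskedAgain = unmaskText(unmasked);              // unchanged: nothing left to unmask
// alternative detector for the backslash-u style produced by maskAsUnicode:
var fromEscapes = unmaskText('caf\\u00E9', /\\u([0-9A-Fa-f]{4})/); // "café"
```

As noted for maskString, an input that already contains text matching the detector pattern would be converted as well, since the pattern cannot be escaped.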
