Class: GrammarConverter

GrammarConverter

new GrammarConverter()

The GrammarConverter object initializes the grammar for processing natural language text, e.g. text from voice recognition.

Requires

  • module:util/loadFile
  • module:util/isArray

Methods

executeGrammar(text, callback){Object}

Execute the grammar. NOTE: do not call this directly; use mmir.SemanticInterpreter.interpret instead, since that function applies pre- and post-processing to the text (stopword removal, encoding/decoding of special characters, etc.).
Name Type Description
text String the text String that should be parsed.
callback function optional if #isAsyncExec is TRUE, executeGrammar has no return value; instead, the result of the grammar execution is delivered via the callback:
function callback(result){ ... }
(see also description of return value below)
Returns:
result of the grammar execution: {phrase: STRING, phrases: OBJECT, semantic: OBJECT}. The property phrase contains the text that was matched (with stopwords removed). The property phrases contains the matched TOKENS and UTTERANCES from the JSON definition of the grammar as array-valued properties (e.g. for 1 matched TOKEN "token": {token: ["the matched text"]}). The property semantic depends on the JSON definition of the grammar. NOTE: if #isAsyncExec is TRUE, there is no return value; instead, the callback is invoked with the result.
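To illustrate the documented result shape, here is a toy stand-in for a compiled grammar. The token table TOKENS, the function toyGrammar and its semantic fields are hypothetical; a real grammar derives all of them from its JSON definition, and in practice you would go through mmir.SemanticInterpreter.interpret rather than calling a grammar function directly.

```javascript
// Hypothetical token table standing in for the TOKENS of a JSON grammar.
const TOKENS = { greeting: ["hello", "hi"], name: ["anna", "bob"] };

// Toy stand-in for a compiled grammar: returns an object with the documented
// shape {phrase, phrases, semantic}, where each matched token name maps to
// an array of the matched text.
function toyGrammar(text) {
  const phrases = {};
  for (const word of text.split(/\s+/)) {
    for (const [token, words] of Object.entries(TOKENS)) {
      if (words.includes(word)) {
        (phrases[token] = phrases[token] || []).push(word);
      }
    }
  }
  // the semantic part is grammar-specific; these fields are made up
  const semantic = {
    greet: Boolean(phrases.greeting),
    who: phrases.name ? phrases.name[0] : null
  };
  return { phrase: text, phrases, semantic };
}

const result = toyGrammar("hello anna");
```

Because every token maps to an array, a token that matches several times in one phrase simply accumulates entries.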

getCodeWrapPrefix(fileFormatVersion, execMode){String}

Get code-prefix for wrapping generated, executable grammars.
Name Type Description
fileFormatVersion Number the file format version (see mmir.SemanticInterpreter#getFileVersion)
execMode String the execution mode for the generated grammar: 'sync' | 'async'
See:
  • mmir.parser#STORAGE_CODE_WRAP_PREFIX
Returns:
prefix code for generated grammars (i.e. prepend to generated grammar code)

getCodeWrapSuffix(encodedStopwords, grammarFuncName, grammarId){String}

Get code-suffix for wrapping generated, executable grammars.
Name Type Description
encodedStopwords Array.<string> the list of encoded stopwords (see getEncodedStopwords)
grammarFuncName String the (variable's) name of the grammar function that was generated (and will be used in executeGrammar)
grammarId String the ID for the grammar (e.g. language code) under which the grammar will be registered with SemanticInterpreter (see mmir.SemanticInterpreter#addGrammar)
See:
  • mmir.parser#STORAGE_CODE_WRAP_SUFFIX
Returns:
suffix code for generated grammars (i.e. append to generated grammar code)
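A minimal sketch of how a prefix and a suffix could frame generated grammar code so that evaluating the result registers the grammar. The real strings returned by getCodeWrapPrefix and getCodeWrapSuffix are not reproduced here: codeWrapPrefix, codeWrapSuffix, registry and myGrammar are hypothetical, and registry.addGrammar merely stands in for mmir.SemanticInterpreter#addGrammar.

```javascript
// Hypothetical registry standing in for SemanticInterpreter#addGrammar.
const registry = {
  grammars: {},
  addGrammar(id, func, options) {
    this.grammars[id] = { func: func, stopwords: options.stopwords };
  }
};

// Hypothetical prefix: opens a function scope that receives the registry.
function codeWrapPrefix() {
  return "(function(registry){\n";
}

// Hypothetical suffix: registers the generated grammar function under
// grammarId together with the encoded stopword list, then closes the scope.
function codeWrapSuffix(encodedStopwords, grammarFuncName, grammarId) {
  return "\nregistry.addGrammar(" + JSON.stringify(grammarId) + ", " +
    grammarFuncName + ", {stopwords: " + JSON.stringify(encodedStopwords) +
    "});\n})";
}

// Stand-in for the grammar compiler's output (normally jison/PEG.js/JS/CC code).
const generated = "var myGrammar = function(text){ return {phrase: text}; };";

const source = codeWrapPrefix() + generated +
  codeWrapSuffix(["the", "a"], "myGrammar", "en");

// Evaluating the wrapped source yields the wrapper function; calling it
// performs the registration. new Function keeps the local scope untouched.
const wrapper = new Function("return " + source)();
wrapper(registry);
```

The point of the wrapping is that the compiled grammar code stays unchanged: only the surrounding prefix/suffix decides how and under which ID the grammar becomes available.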

getEncodedStopwords(){Array.<String>}

HELPER: creates a copy of the stopword list and encodes all non-ASCII characters as their Unicode representation (e.g. for safe storage of the stringified stopword list, even if the file encoding does not support non-ASCII letters).
Returns:
copy of the stopword list, from the current JSON grammar (or empty list, if no grammar is present)
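The copy-and-encode step can be sketched as follows, assuming \uXXXX escapes with four uppercase HEX digits (the format used elsewhere in this class). The function names are hypothetical, and the sketch encodes per UTF-16 code unit, so characters outside the Basic Multilingual Plane become two escapes.

```javascript
// Encode every non-ASCII UTF-16 code unit of str as \uXXXX (uppercase HEX).
function encodeNonAscii(str) {
  let out = "";
  for (let i = 0; i < str.length; i++) {
    const code = str.charCodeAt(i);
    out += code > 127
      ? "\\u" + code.toString(16).toUpperCase().padStart(4, "0")
      : str.charAt(i);
  }
  return out;
}

// Sketch: return an encoded copy of the stopword list (empty list if none);
// the original list is left unchanged.
function getEncodedStopwordsSketch(stopwords) {
  return (stopwords || []).map(encodeNonAscii);
}

const source = ["the", "für", "下さい"];
const encoded = getEncodedStopwordsSketch(source);
```

The encoded entries contain only ASCII characters (a literal backslash followed by "u" and four HEX digits), so they survive being written to a file regardless of its encoding.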

Get the grammar definition text. This is the "source code" input for the grammar compiler (i.e. in the syntax for jison, PEG.js or JS/CC). The grammar definition text is generated from the JSON grammar.
Returns:
grammar definition in compiler-specific syntax

Get the compiled JavaScript grammar source code. This is the output of the grammar compiler (with additional JavaScript "framing" in SemanticInterpreter.createGrammar). It needs to be eval'ed before it can be executed (eval() adds the corresponding executable grammar to SemanticInterpreter).
Returns:
compiled, JavaScript grammar source code

getStopWordsEncRegExpr()

FIX for stopwords that start or end with encoded characters (i.e. non-ASCII characters). This RegExp may be NULL/undefined if no stopwords exist that begin or end with encoded characters, i.e. you need to check for NULL before trying to use this RegExp. Usage:
Example
//gc: a GrammarConverter instance
 //remove normal stopwords:
 var removedStopwordsStr  = someStr.replace( gc.getStopWordsRegExpr(), '');

 //remove stopwords that start/end with encoded chars (if there are any):
 var removedStopwordsStr2 = removedStopwordsStr;
 if(gc.getStopWordsEncRegExpr()){
 	//NOTE replace stopwords with spaces (not with an empty String as above, i.e. as with "normal" stopwords)
 	removedStopwordsStr2 = removedStopwordsStr.replace( gc.getStopWordsEncRegExpr(), ' ');
 }
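Why a separate RegExp is needed can be shown with a small sketch: \b only matches between a word and a non-word character, and the "~" of a masking is not a word character, so \b fails in front of a stopword like "~~00E4~~h" (masked "äh"). The builders below (buildStopwordsRegExp, buildStopwordsEncRegExp and helpers) are hypothetical and not the mmir implementation.

```javascript
// Escape RegExp meta characters so stopwords are matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// true if a (masked) stopword begins or ends with a ~~XXXX~~ masking
function isEncodedAtEdge(word) {
  return /^~~[0-9A-Fa-f]{4}~~|~~[0-9A-Fa-f]{4}~~$/.test(word);
}

// "normal" stopwords: delimited with \b word boundaries
function buildStopwordsRegExp(stopwords) {
  const plain = stopwords.filter((w) => !isEncodedAtEdge(w));
  if (plain.length === 0) return null;
  return new RegExp("\\b(?:" + plain.map(escapeRegExp).join("|") + ")\\b", "gi");
}

// stopwords starting/ending with a masking: use whitespace or string
// start/end instead of \b. May return null, like getStopWordsEncRegExpr.
function buildStopwordsEncRegExp(stopwords) {
  const enc = stopwords.filter(isEncodedAtEdge);
  if (enc.length === 0) return null;
  return new RegExp("(?:^|\\s)(?:" + enc.map(escapeRegExp).join("|") + ")(?=\\s|$)", "gi");
}

const stopwords = ["the", "~~00E4~~h"]; // "äh" in masked form
const stopRe = buildStopwordsRegExp(stopwords);
const stopEncRe = buildStopwordsEncRegExp(stopwords);

const masked = "play ~~00E4~~h the music";
let cleaned = stopRe ? masked.replace(stopRe, "") : masked;
if (stopEncRe) {
  // replace with a space, since the match includes the preceding whitespace
  cleaned = cleaned.replace(stopEncRe, " ");
}
cleaned = cleaned.replace(/\s+/g, " ").trim();
```

Since the encoded-stopword match consumes the preceding whitespace, replacing it with a space keeps the neighbouring words separated.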

HELPER: uses #maskString to encode non-ASCII characters as their Unicode representation, i.e. \uXXXX, where XXXX is the Unicode HEX number. SHORTCUT for calling maskString(str, '\\u', '').
Example
//for Japanese "下さい" ("please")
maskAsUnicode("下さい") -> "\u4E0B\u3055\u3044"

//... and using default masking:
maskString("下さい") -> "~~4E0B~~~~3055~~~~3044~~"

maskString(str, computePositions, prefix, postfix){String|Object}

Masks Unicode characters in a string: non-ASCII characters are masked by replacing them with ~~XXXX~~, where XXXX is the four-digit Unicode HEX number.

NOTE that this function is stable with regard to multiple executions: if the function is invoked on the returned String again, the returned String will be unchanged, i.e. existing maskings ("~~XXXX~~") will not be masked again.

NOTE: currently, the masking pattern cannot be escaped, i.e. if the original String contains a substring that matches the masking pattern, there is no way to escape it so that the unmask function leaves it untouched.

Name Type Description
str String the String to process
computePositions Boolean optional OPTIONAL if true, a result object with the masking positions is returned instead of a String (see Returns below). DEFAULT: false
prefix String optional OPTIONAL an alternative prefix used for masking, i.e. instead of ~~ (ignored if the argument is not a string)
postfix String optional OPTIONAL an alternative postfix used for masking, i.e. instead of ~~ (ignored if the argument is not a string)
Returns:
masked string, or if computePositions was true a result object with
				{
					str: STRING, // the masked string
					pos: [POSITION] // array of masking positions: {i: NUMBER, len: NUMBER, mlen: NUMBER}
				}
				
where POSITION is an object with
				{
					i: NUMBER, // the index within the modified string
					len: NUMBER, // the length before the modification (i.e. of the sub-string that is to be masked)
					mlen: NUMBER // the length after the modification (i.e. of the sub-string that was masked)
				}
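A sketch of the masking described above, including the optional position output and the alternative prefix/postfix. maskStringSketch is a hypothetical name, it works per UTF-16 code unit, and it is meant to illustrate the documented behaviour rather than reproduce the library's code.

```javascript
// Replace each non-ASCII UTF-16 code unit with prefix + 4-digit HEX + postfix
// (default "~~"...""~~"). Existing maskings consist only of ASCII characters,
// so running the function again leaves them unchanged.
function maskStringSketch(str, computePositions, prefix, postfix) {
  prefix = typeof prefix === "string" ? prefix : "~~";
  postfix = typeof postfix === "string" ? postfix : "~~";
  let out = "";
  const pos = [];
  for (let k = 0; k < str.length; k++) {
    const code = str.charCodeAt(k);
    if (code > 127) {
      const masked = prefix + code.toString(16).toUpperCase().padStart(4, "0") + postfix;
      if (computePositions) {
        // i: index within the modified string, len: original length, mlen: masked length
        pos.push({ i: out.length, len: 1, mlen: masked.length });
      }
      out += masked;
    } else {
      out += str.charAt(k);
    }
  }
  return computePositions ? { str: out, pos: pos } : out;
}

const once = maskStringSketch("für");
const withPositions = maskStringSketch("für", true);
```

Passing prefix "\\u" and postfix "" yields the \uXXXX form shown in the maskAsUnicode example.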
				

postproc(procResult, recodeFunc)

Post-processes the result from the applied grammar:
  • un-masks non-ASCII characters
Name Type Description
procResult SemanticResult the result from the applied grammar
recodeFunc function optional function that recodes non-ASCII characters (or reverts the recoding)

preproc(thePhrase, pos, maskFunc, stopwordFunc){String}

Apply pre-processing to the string before applying the grammar:
  • mask non-ASCII characters
  • remove stopwords
Name Type Description
thePhrase String the string to pre-process
pos PlainObject optional OPTIONAL in/out argument: if given, the pre-processor adds fields with information on how the input string thePhrase was modified. Namely, the position information for removed stopwords is added to pos.stopwords (see removeStopwords for details). NOTE that this may not work if custom maskFunc and/or stopwordFunc are provided as well.
maskFunc function optional OPTIONAL custom function for masking non-ASCII characters:
maskFunc(inputStr : STRING [, isCalcPosition: BOOLEAN]) : STRING | {str: STRING, pos: ARRAY}
DEFAULT: use of this.maskString(thePhrase, !!pos)
stopwordFunc function optional OPTIONAL custom function for removing stopwords
stopwordFunc(inputStr : STRING [, positions: ARRAY]) : STRING | {str: STRING, pos: ARRAY}
DEFAULT: use of this.removeStopwords(str, []). NOTE that maskFunc must also be specified if this argument is used.
Returns:
pre-processed string
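The order of the two pre-processing steps matters: stopwords are removed from the already masked text, so the stopword list must be in masked form too. The sketch below assumes simplified custom maskFunc/stopwordFunc implementations; preprocSketch, mask, stopwordFunc and maskedStopwords are hypothetical, and the recorded index i refers to the masked input.

```javascript
// Sketch of the preproc flow: 1. mask, 2. remove stopwords, recording the
// stopword positions in pos.stopwords if pos was given.
function preprocSketch(thePhrase, pos, maskFunc, stopwordFunc) {
  const masked = maskFunc(thePhrase);
  if (pos) {
    pos.stopwords = [];
    return stopwordFunc(masked, pos.stopwords);
  }
  return stopwordFunc(masked);
}

// Hypothetical masking function: non-ASCII characters -> ~~XXXX~~
const mask = (s) => s.replace(/[^\x00-\x7F]/g, (ch) =>
  "~~" + ch.charCodeAt(0).toString(16).toUpperCase().padStart(4, "0") + "~~");

// Hypothetical stopword function; the stopwords are stored in masked form,
// because they are matched against the masked input.
const maskedStopwords = new Set(["~~00E4~~h", "the"]); // "äh", "the"
function stopwordFunc(inputStr, positions) {
  const kept = [];
  for (const m of inputStr.matchAll(/\S+/g)) {
    if (maskedStopwords.has(m[0])) {
      if (positions) positions.push({ i: m.index, mlen: m[0].length });
    } else {
      kept.push(m[0]);
    }
  }
  return kept.join(" ");
}

const pos = {};
const processed = preprocSketch("äh play the music", pos, mask, stopwordFunc);
```

Here the filler "äh" is only recognised because its masked form "~~00E4~~h" is in the stopword set.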

recodeJSON(json, recodeFunc, isMaskValues, isMaskNames){Object}

Recodes Strings of a JSON-like object.
Name Type Description
json Object the JSON-like object (i.e. PlainObject)
recodeFunc function the "recoding" function for modifying String values: it must accept a String argument and return a String, i.e. String recodeFunc(String). The function is invoked in the context of the GrammarConverter object. Example: this.maskString(). See maskString.
isMaskValues Boolean optional OPTIONAL if true, the object's String property values will be processed. NOTE: if this parameter is specified, recodeFunc must also be specified! DEFAULT: uses property maskValues
isMaskNames Boolean optional OPTIONAL if true, the property names will be processed. NOTE: if this parameter is specified, recodeFunc and isMaskValues must also be specified! DEFAULT: uses property maskNames
Returns:
recoded JSON object
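A recursive sketch of the recoding walk. recodeJSONSketch is hypothetical: it takes an explicit ctx argument to mimic "invoked in the context of the GrammarConverter object", and it builds a new object instead of modifying json in place (whether the original modifies in place is not stated here).

```javascript
// Recursively recode the strings of a JSON-like object. String values are
// recoded if isMaskValues is true, property names if isMaskNames is true;
// recodeFunc is called with ctx as `this`.
function recodeJSONSketch(json, recodeFunc, isMaskValues, isMaskNames, ctx) {
  if (typeof json === "string") {
    return isMaskValues ? recodeFunc.call(ctx, json) : json;
  }
  if (Array.isArray(json)) {
    return json.map((v) => recodeJSONSketch(v, recodeFunc, isMaskValues, isMaskNames, ctx));
  }
  if (json !== null && typeof json === "object") {
    const out = {};
    for (const name of Object.keys(json)) {
      const newName = isMaskNames ? recodeFunc.call(ctx, name) : name;
      out[newName] = recodeJSONSketch(json[name], recodeFunc, isMaskValues, isMaskNames, ctx);
    }
    return out;
  }
  return json; // numbers, booleans and null are returned unchanged
}

const input = { name: "für", tags: ["a", 1], nested: { k: null } };
const upper = (s) => s.toUpperCase();
const valuesRecoded = recodeJSONSketch(input, upper, true, false);
const namesRecoded = recodeJSONSketch(input, upper, false, true);
```

With a masking function as recodeFunc, the same walk masks every string of a grammar result or definition.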

removeStopwords(thePhrase, positions){String}

Removes stopwords from thePhrase.

Name Type Description
thePhrase String the string from which to remove stopwords (the result is also trim()'ed)
positions Array.<Position> optional OPTIONAL if provided, the positions at which stopwords were removed will be added to this array, where each position object consists of
					{
						i: NUMBER, // the index at which the stopword was removed
						mlen: NUMBER // the length of the stopword that was removed
					}
				
The positions are ordered by occurrence (i.e. by pos.i).
Returns:
string where stopwords were removed
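A sketch of stopword removal with position recording. removeStopwordsSketch is hypothetical: unlike the documented method it takes the stopword list as an explicit parameter, matches whole whitespace-separated words case-insensitively, and records i as the index in the input string (the original may use a different reference point).

```javascript
// Remove whole-word stopwords (case-insensitive), collapse the remaining
// whitespace and trim. Each removal is recorded as {i, mlen}; since the
// input is scanned left to right, positions are ordered by i.
function removeStopwordsSketch(thePhrase, stopwords, positions) {
  const set = new Set(stopwords.map((w) => w.toLowerCase()));
  const re = /\S+/g;
  const kept = [];
  let m;
  while ((m = re.exec(thePhrase)) !== null) {
    if (set.has(m[0].toLowerCase())) {
      if (positions) positions.push({ i: m.index, mlen: m[0].length });
    } else {
      kept.push(m[0]);
    }
  }
  return kept.join(" ");
}

const positions = [];
const stripped = removeStopwordsSketch("please play the music", ["please", "the"], positions);
```

Recording positions relative to the input makes it possible to map matches in the stripped text back to the original utterance.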

protected setGrammarDef(rawGrammarSyntax)

Sets the grammar definition text. This function should only be used during compilation of the JSON grammar to the executable grammar. NOTE: Setting this "manually" will have no effect on the executable grammar.
Name Type Description
rawGrammarSyntax String the grammar definition in compiler-specific syntax
See:

setGrammarFunction(func, isAsnc)

Set the executable grammar function. The grammar function takes
  • a String argument: the text that should be parsed
  • a Function argument: the callback for the result, which itself takes 1 argument for the result: callback(result)
The returned result depends on the JSON definition of the grammar: func(inputText, resultCallback)
Name Type Description
func function the executable grammar function: func(string, function(object)) : object
isAsnc Boolean optional OPTIONAL set to TRUE if the grammar is executed asynchronously. DEFAULT: FALSE
See:
  • executeGrammar
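How the async flag changes the calling convention can be sketched with a minimal stand-in class. ConverterSketch is hypothetical and not the mmir GrammarConverter; it only models the documented contract: synchronous grammars return their result, asynchronous ones deliver it through the callback.

```javascript
// Hypothetical minimal converter showing how the flag passed to
// setGrammarFunction selects the delivery mode in executeGrammar.
class ConverterSketch {
  setGrammarFunction(func, isAsync) {
    this.grammarFunc = func;
    this.isAsyncExec = Boolean(isAsync); // DEFAULT: false
  }

  executeGrammar(text, callback) {
    if (this.isAsyncExec) {
      // asynchronous: no return value, func(inputText, resultCallback)
      this.grammarFunc(text, callback);
      return undefined;
    }
    // synchronous: the grammar function's result is returned directly
    return this.grammarFunc(text);
  }
}

// synchronous grammar function
const syncConv = new ConverterSketch();
syncConv.setGrammarFunction((text) => ({ phrase: text, phrases: {}, semantic: {} }));

// asynchronous grammar function: delivers its result on a later tick
const asyncConv = new ConverterSketch();
asyncConv.setGrammarFunction((text, resultCallback) => {
  setTimeout(() => resultCallback({ phrase: text, phrases: {}, semantic: {} }), 0);
}, true);
```

Callers that must support both modes can always pass a callback and ignore the return value when the converter is asynchronous.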

unmaskString(str, computePositions, detector){String|Object}

Unmasks masked Unicode characters in a string. Masked Unicode characters are assumed to have the pattern ~~XXXX~~, where XXXX is the four-digit Unicode HEX number.

NOTE that this function is stable with regard to multiple executions, IF the original String str did not contain a sub-string that conforms to the encoding pattern (see remark for maskString): If the function is invoked on the returned String again, the returned String will be the same, i.e. unchanged.

Name Type Description
str String the String to process
computePositions Boolean optional OPTIONAL if true, a result object with the unmasking positions is returned instead of a String (see Returns below). DEFAULT: false
detector RegExp optional OPTIONAL an alternative detector RegExp: the RegExp must contain at least one grouping which detects a Unicode number (HEX), e.g. the default detector is ~~([0-9|A-F|a-f]{4})~~ (note the grouping for detecting a 4-digit HEX number within the brackets).
Returns:
unmasked string, or if computePositions was true a result object with
				{
					str: STRING, // the unmasked string
					pos: [POSITION] // array of unmasking positions: {i: NUMBER, len: NUMBER, mlen: NUMBER}
				}
				
where POSITION is an object with
				{
					i: NUMBER, // the index within the modified string
					len: NUMBER, // the length before the modification (i.e. of the masked sub-string)
					mlen: NUMBER // the length after the modification (i.e. of the unmasked sub-string)
				}
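A sketch of unmasking with an optional custom detector. unmaskStringSketch is hypothetical; for the default detector it uses the strict hex-digit class [0-9A-Fa-f], and it adds the "g" flag to a custom detector so that all occurrences are replaced. Group 1 of the detector must capture the HEX number, as described above.

```javascript
// Replace every detector match with the character for the HEX number in
// group 1; optionally record {i, len, mlen} for each replacement.
function unmaskStringSketch(str, computePositions, detector) {
  const re = detector instanceof RegExp
    ? new RegExp(detector.source, detector.flags.includes("g") ? detector.flags : detector.flags + "g")
    : /~~([0-9A-Fa-f]{4})~~/g; // default: ~~XXXX~~ with hex digits only
  const pos = [];
  let out = "";
  let last = 0;
  let m;
  while ((m = re.exec(str)) !== null) {
    if (m[0] === "") { re.lastIndex++; continue; } // guard against empty matches
    out += str.substring(last, m.index);
    const ch = String.fromCharCode(parseInt(m[1], 16));
    if (computePositions) {
      // i: index within the modified string, len: masked length, mlen: unmasked length
      pos.push({ i: out.length, len: m[0].length, mlen: ch.length });
    }
    out += ch;
    last = m.index + m[0].length;
  }
  out += str.substring(last);
  return computePositions ? { str: out, pos: pos } : out;
}

const plain = unmaskStringSketch("~~4E0B~~~~3055~~~~3044~~");
const fromUnicode = unmaskStringSketch("\\u4E0B\\u3055\\u3044", false, /\\u([0-9A-Fa-f]{4})/);
```

The second call shows the custom-detector case: a detector for \uXXXX reverts the maskAsUnicode form.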