Class GrammarConverter

Defined in: grammarConverter.js.

Class Summary
Constructor Attributes	Constructor Name and Description
	GrammarConverter() The GrammarConverter object initializes the grammar for processing natural language text, e.g. from the voice recognition.

Method Summary
Method Attributes	Method Name and Description
	decodeUmlauts(target, doAlsoEncodeUpperCase) DEPRECATED Replaces umlaut encodings with the original umlauts.
	encodeUmlauts(target, doAlsoEncodeUpperCase) DEPRECATED Replaces umlauts with an encoded version.
	executeGrammar(text, callback) Execute the grammar.
	getEncodedStopwords() HELPER creates a copy of the stopword list and encodes all non-ASCII chars to their Unicode representation.
	getGrammarDef() Get grammar definition text.
	getGrammarSource() Get the compiled JavaScript grammar source code.
	getStopWordsEncRegExpr() FIX for stopwords that start or end with encoded (i.e. non-ASCII) chars.
	maskAsUnicode(str) HELPER uses #maskString for encoding non-ASCII chars to their Unicode representation, i.e. \uXXXX.
	maskString(str, prefix, postfix) Masks Unicode characters in a string.
	recodeJSON(json, recodeFunc, isMaskValues, isMaskNames) Recodes Strings of a JSON-like object.
	setGrammarDef(rawGrammarSyntax) Sets the grammar definition text.
	setGrammarFunction(func, isAsnc) Set the executable grammar function.
	unmaskString(str, detector) Unmasks masked Unicode characters in a string.

Class Detail

GrammarConverter()

The GrammarConverter object initializes the grammar for processing natural language text, e.g. from the voice recognition.

Requires:: mmir.CommonUtils.isArray; jQuery.ajax

Method Detail

{String|Object} decodeUmlauts(target, doAlsoEncodeUpperCase)

Parameters:
{String|Object} target: the String for which all contained umlaut encodings should be replaced with the original umlauts. If this parameter is not a String, it will be converted using JSON.stringify() and the resulting String will be processed (may lead to errors if umlauts occur in "strange" places within the stringified object).
{Boolean} doAlsoEncodeUpperCase Optional: if true, upper-case umlaut encodings will be decoded, too. DEFAULT: false (i.e. no decoding for upper-case umlaut encodings)

Deprecated:
this is used for the old-style encoding / decoding for umlauts (now masking for ALL unicode chars is used!)

Returns:: {String|Object} the String with decoded umlaut encodings (i.e. with the "original" umlauts). If the input argument target was an Object, the return value will also be an Object: the processed, stringified Object is converted back using JSON.parse() (may lead to errors if umlauts occur in "strange" places within the stringified object).

{String|Object} encodeUmlauts(target, doAlsoEncodeUpperCase)

Parameters:
{String|Object} target: the String for which all contained umlauts should be replaced with an encoded version. If this parameter is not a String, it will be converted using JSON.stringify() and the resulting String will be processed (may lead to errors if umlauts occur in "strange" places within the stringified object).
{Boolean} doAlsoEncodeUpperCase Optional: if true, upper-case umlauts will be encoded, too. DEFAULT: false (i.e. no encoding for upper-case umlauts)

Deprecated:
this is used for the old-style encoding / decoding for umlauts (now masking for ALL unicode chars is used!)

Returns:: {String|Object} the String with encoded umlauts. If the input argument target was an Object, the return value will also be an Object: the processed, stringified Object is converted back using JSON.parse() (may lead to errors if umlauts occur in "strange" places within the stringified object).
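Both deprecated functions share the same String/Object handling: a non-String target is stringified, processed as text, and parsed back. The sketch below illustrates only that round trip. The helper names (recodeTarget, encodeLowerUmlauts, decodeLowerUmlauts) and the "__ue__"-style encoding are hypothetical placeholders, since the actual umlaut encoding is not documented here.

```javascript
// Hypothetical sketch of the documented String/Object handling of the
// deprecated encodeUmlauts/decodeUmlauts; NOT the library's implementation.
function recodeTarget(target, recodeStr) {
  const isString = typeof target === "string";
  // non-String targets are stringified and processed as text
  const str = isString ? target : JSON.stringify(target);
  const processed = recodeStr(str);
  // Objects are parsed back; JSON.parse throws if the recoding
  // broke the JSON syntax (the "strange places" caveat above)
  return isString ? processed : JSON.parse(processed);
}

// hypothetical umlaut encoding, for illustration only
// (the actual encoding used by GrammarConverter is not documented here)
const UMLAUT_ENC = { "ä": "__ae__", "ö": "__oe__", "ü": "__ue__" };

function encodeLowerUmlauts(str) {
  return str.replace(/[äöü]/g, (ch) => UMLAUT_ENC[ch]);
}

function decodeLowerUmlauts(str) {
  return str.replace(/__(ae|oe|ue)__/g, (enc) =>
    Object.keys(UMLAUT_ENC).find((ch) => UMLAUT_ENC[ch] === enc));
}
```

For example, recodeTarget({ word: "über" }, encodeLowerUmlauts) yields an Object again, with the String value encoded.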

{Object} executeGrammar(text, callback)

Execute the grammar. NOTE: do not use this directly; use mmir.SemanticInterpreter.getASRSemantic instead, since that function applies some pre- and post-processing to the text (stopword removal, en-/decoding of special characters, etc.).

Parameters:

{String} text

the text String that should be parsed.

{Function} callback Optional

if #isAsyncExec is TRUE, then executeGrammar will have no return value, but instead the result of the grammar execution is delivered by the callback:

function callback(result){ ... }

(see also description of return value below)

Returns:: {Object} the result of the grammar execution: {phrase: STRING, phrases: OBJECT, semantic: OBJECT}. The property phrase contains the text which was matched (with stopwords removed). The property phrases contains the matched TOKENS and UTTERANCES from the JSON definition of the grammar as array-valued properties (e.g. for 1 matched TOKEN "token": {token: ["the matched text"]}). The returned property semantic depends on the JSON definition of the grammar. NOTE: if #isAsyncExec is TRUE, then there will be no return value; instead, the callback is invoked with the result.
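Because the result arrives either as a return value (synchronous) or via the callback (asynchronous), calling code has to handle both modes. The following is a minimal sketch, assuming a grammar function and a flag equivalent to the ones passed to setGrammarFunction(func, isAsnc); executeAsPromise and stubGrammar are hypothetical helpers, not part of GrammarConverter, and the stub only mimics the documented result shape.

```javascript
// Hypothetical helper (not part of GrammarConverter): normalizes the two
// execution modes into a Promise. `grammarFunc` and `isAsyncExec` stand in
// for the function and flag set via setGrammarFunction(func, isAsnc).
function executeAsPromise(grammarFunc, isAsyncExec, text) {
  return new Promise((resolve) => {
    if (isAsyncExec) {
      // async mode: no return value, the result arrives via the callback
      grammarFunc(text, resolve);
    } else {
      // sync mode: the result is the return value
      resolve(grammarFunc(text));
    }
  });
}

// hypothetical stub grammar: one TOKEN "token" that matches the word "hello"
function stubGrammar(text, callback) {
  const matched = text.includes("hello");
  const result = {
    phrase: text,
    phrases: matched ? { token: ["hello"] } : {},
    semantic: matched ? { greeting: true } : {},
  };
  if (typeof callback === "function") callback(result);
  return result;
}
```

With this wrapper, executeAsPromise(stubGrammar, false, "hello world") and executeAsPromise(stubGrammar, true, "hello world") both resolve to the same {phrase, phrases, semantic} object.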

{Array} getEncodedStopwords()

HELPER creates a copy of the stopword list and encodes all non-ASCII chars to their Unicode representation (e.g. for safe storage of the stringified stopword list, even if the file encoding does not support non-ASCII letters).

Returns:: {Array} a copy of the stopword list, from the current JSON grammar (or empty list, if no grammar is present)
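The copy-and-encode step can be sketched as follows, assuming the \uXXXX representation used by #maskAsUnicode (upper-case, 4-digit HEX per UTF-16 code unit). encodeStopwords is a hypothetical stand-alone function that takes the stopword list as a parameter instead of reading it from the JSON grammar.

```javascript
// Hypothetical sketch: copy a stopword list and mask every non-ASCII char
// as \uXXXX (4-digit upper-case HEX); the original list stays untouched.
function encodeStopwords(stopwords) {
  // no grammar / no stopword list -> empty list
  if (!Array.isArray(stopwords)) return [];
  return stopwords.map((word) =>
    word.replace(/[^\x00-\x7F]/g, (ch) =>
      "\\u" + ch.charCodeAt(0).toString(16).toUpperCase().padStart(4, "0")));
}
```

Since Array.prototype.map returns a new array and Strings are immutable, the stopword list passed in is not modified.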

{String} getGrammarDef()

Get grammar definition text. This is the "source code" input for the grammar compiler (i.e. syntax for jison, PEG.js or JS/CC). The grammar definition text is generated from the JSON grammar.

Returns:: {String} the grammar definition in compiler-specific syntax

{String} getGrammarSource()

Get the compiled JavaScript grammar source code. This is the output of the grammar compiler (with additional JavaScript "framing" in SemanticInterpreter.createGrammar). This needs to be eval'ed before it can be executed (eval() will add the corresponding executable grammar to SemanticInterpreter).

Returns:: {String} the compiled, JavaScript grammar source code

{RegExp} getStopWordsEncRegExpr()

FIX for stopwords that start or end with encoded (i.e. non-ASCII) chars. This RegExp may be NULL/undefined if no stopwords begin or end with encoded chars, so you need to check for NULL before trying to use it. Usage:

 
 //remove "normal" stopwords:
 var removedStopwordsStr  = someStr.replace( gc.getStopWordsRegExpr(), '');

 //remove stopwords that start or end with encoded chars (if there are any):
 var removedStopwordsStr2 = removedStopwordsStr;
 if(gc.getStopWordsEncRegExpr()){
 	//NOTE replace these stopwords with spaces (not with the empty String, as for the "normal" stopwords above)
 	removedStopwordsStr2 = removedStopwordsStr.replace( gc.getStopWordsEncRegExpr(), ' ');
 }
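One plausible reason for the separate RegExp: in masked text such as "~~00FC~~ber", the first char "~" is not a word char, so a \b word boundary cannot delimit the stopword. The sketch below is an assumption, not the library's actual RegExp: it matches the preceding whitespace instead, which explains why such matches are replaced with a space. buildStopwordRegExp, buildEncStopwordRegExp, and escapeRegExp are hypothetical helpers.

```javascript
// Hypothetical sketch (not the library's actual RegExps).
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// "normal" stopwords: delimited by word boundaries, replaced with ''
function buildStopwordRegExp(words) {
  return new RegExp("\\b(" + words.map(escapeRegExp).join("|") + ")\\b", "g");
}

// stopwords starting/ending with masked chars such as "~~00FC~~":
// "~" is not a word char, so \b fails; instead consume the preceding
// whitespace (or string start) and look ahead for whitespace (or end).
// Because a delimiter is consumed, matches must be replaced with ' '.
function buildEncStopwordRegExp(words) {
  if (words.length === 0) return null; // mirrors the NULL case above
  return new RegExp(
    "(^|\\s)(" + words.map(escapeRegExp).join("|") + ")(?=\\s|$)", "g");
}
```

For example, "fahr ~~00FC~~ber den Fluss".replace(buildEncStopwordRegExp(["~~00FC~~ber"]), " ") removes the masked stopword, while the \b-based RegExp does not match it at all.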

maskAsUnicode(str)

HELPER uses #maskString for encoding non-ASCII chars to their Unicode representation, i.e. \uXXXX where XXXX is the Unicode HEX number. SHORTCUT for calling maskString(str, '\\u', '').

//for Japanese "下さい" ("please")
maskAsUnicode("下さい") -> "\u4E0B\u3055\u3044"

//... and using default masking:
maskString("下さい") -> "~~4E0B~~~~3055~~~~3044~~"

Parameters:
{String} str: the String to process

{String} maskString(str, prefix, postfix)

Masks Unicode characters in a string. Unicode characters are masked by replacing them with ~~XXXX~~, where XXXX is the four-digit Unicode HEX number.

NOTE that this function is stable with regard to multiple executions: if the function is invoked on the returned String again, the returned String will be unchanged, i.e. existing maskings ("~~XXXX~~") will not be masked again.

NOTE: currently, the masking pattern cannot be escaped, i.e. if the original String already contains a substring that matches the masking pattern, there is no way to escape it so that the unmask function leaves it untouched.

Parameters:
{String} str: the String to process
{String} prefix Optional: an alternative prefix used for masking, i.e. instead of ~~ (ignored if the argument is not a string)
{String} postfix Optional: an alternative postfix used for masking, i.e. instead of ~~ (ignored if the argument is not a string)

Returns:: {String} the masked string
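The masking rule and its stability can be sketched as below. This is an assumption-based sketch rather than the library source: it treats every non-ASCII UTF-16 code unit as a "Unicode character" to mask. Because the default masks consist only of ASCII chars, masking an already masked String leaves it unchanged. maskAsUnicode is implemented as the documented shortcut maskString(str, '\\u', '').

```javascript
// Hypothetical sketch of the masking described above (not the library's
// source): every non-ASCII UTF-16 code unit becomes
// prefix + 4-digit upper-case HEX + postfix (default prefix/postfix: "~~").
function maskString(str, prefix, postfix) {
  // non-string prefix/postfix arguments are ignored, as documented
  const pre = typeof prefix === "string" ? prefix : "~~";
  const post = typeof postfix === "string" ? postfix : "~~";
  // masks consist only of ASCII chars, so masking the result again
  // leaves it unchanged (the stability property noted above)
  return str.replace(/[^\x00-\x7F]/g, (ch) =>
    pre + ch.charCodeAt(0).toString(16).toUpperCase().padStart(4, "0") + post);
}

// the documented shortcut: maskString(str, '\\u', '')
function maskAsUnicode(str) {
  return maskString(str, "\\u", "");
}
```

This reproduces the example given for maskAsUnicode: maskString("下さい") gives "~~4E0B~~~~3055~~~~3044~~".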

{Object} recodeJSON(json, recodeFunc, isMaskValues, isMaskNames)

Recodes Strings of a JSON-like object.

Parameters:
{Object} json: the JSON-like object (i.e. PlainObject)
{Function} recodeFunc: the "recoding" function for modifying String values: it must accept a String argument and return a String, i.e. String recodeFunc(String). The function is invoked in the context of the GrammarConverter object. Example: this.maskString(). See #maskString.
{Boolean} isMaskValues Optional: if true, the object's String property values will be processed. NOTE: if this parameter is specified, recodeFunc must also be specified! DEFAULT: uses property #maskValues
{Boolean} isMaskNames Optional: if true, the property names will be processed. NOTE: if this parameter is specified, recodeFunc and isMaskValues must also be specified! DEFAULT: uses property #maskNames

Returns:: {Object} the recoded JSON object

Requires:: mmir.CommonUtils.isArray or Array#isArray
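A recursive walk is one way to implement the recoding described above. The sketch below is hypothetical and makes several simplifying assumptions. It requires both flags explicitly, since the #maskValues/#maskNames defaults are not modelled. It builds a recoded copy instead of possibly modifying the input. It calls recodeFunc without the GrammarConverter context. It uses Array.isArray for the array check.

```javascript
// Hypothetical sketch of recodeJSON (not the library's implementation):
// recursively walks a plain object/array and applies recodeFunc to String
// values (isMaskValues) and/or property names (isMaskNames).
function recodeJSON(json, recodeFunc, isMaskValues, isMaskNames) {
  if (typeof json === "string") {
    return isMaskValues ? recodeFunc(json) : json;
  }
  if (Array.isArray(json)) {
    return json.map((item) =>
      recodeJSON(item, recodeFunc, isMaskValues, isMaskNames));
  }
  if (json !== null && typeof json === "object") {
    const result = {};
    for (const [name, value] of Object.entries(json)) {
      const newName = isMaskNames ? recodeFunc(name) : name;
      result[newName] = recodeJSON(value, recodeFunc, isMaskValues, isMaskNames);
    }
    return result;
  }
  // numbers, booleans and null are returned unchanged
  return json;
}
```

Passing a masking function such as maskString as recodeFunc would mask all String values (and, optionally, property names) of a grammar object.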

setGrammarDef(rawGrammarSyntax)

Sets the grammar definition text. This function should only be used during compilation of the JSON grammar to the executable grammar. NOTE: Setting this "manually" will have no effect on the executable grammar.

Parameters:
{String} rawGrammarSyntax: the grammar definition in compiler-specific syntax

See:: #getGrammarDef

setGrammarFunction(func, isAsnc)

Set the executable grammar function. The grammar function func(inputText, resultCallback) takes two arguments: a String (the text that should be parsed) and a Function (the callback for the result, which itself takes one argument: callback(result)). The returned result depends on the JSON definition of the grammar.

Parameters:
{Function} func: the executable grammar function: func(string, function(object)) : object
{Boolean} isAsnc Optional: set to TRUE if the grammar is executed asynchronously. DEFAULT: FALSE

See:: #executeGrammar
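The following hypothetical grammar functions illustrate the documented signature func(string, function(object)) : object. They are toy stand-ins, not compiled grammars. The synchronous variant returns the result and, as an assumption of this sketch, also passes it to the callback if one is given. The asynchronous variant has no return value and delivers the result later through the callback, matching the behaviour described for isAsnc = TRUE.

```javascript
// Hypothetical toy grammar functions (illustration only) following the
// documented signature func(inputText, resultCallback).

// synchronous variant: returns the result (and also hands it to the
// callback, if one is given)
function syncGrammar(inputText, resultCallback) {
  const match = /\b(on|off)\b/.exec(inputText);
  const result = {
    phrase: inputText,
    phrases: match ? { token: [match[1]] } : {},
    semantic: match ? { state: match[1] } : {},
  };
  if (typeof resultCallback === "function") resultCallback(result);
  return result;
}

// asynchronous variant: no return value, the result is delivered
// later via the callback
function asyncGrammar(inputText, resultCallback) {
  setTimeout(() => resultCallback(syncGrammar(inputText)), 0);
}
```

With a GrammarConverter instance gc, these would be registered as gc.setGrammarFunction(syncGrammar) and gc.setGrammarFunction(asyncGrammar, true), respectively.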

{String} unmaskString(str, detector)

Unmasks masked Unicode characters in a string. Masked Unicode characters are assumed to have the pattern ~~XXXX~~, where XXXX is the four-digit Unicode HEX number.

NOTE that this function is stable with regard to multiple executions, IF the original String str did not contain a substring that conforms to the encoding pattern (see remark for #maskString): if the function is invoked on the returned String again, the returned String will be unchanged.

Parameters:
{String} str: the String to process
{RegExp} detector Optional: an alternative detector RegExp: the RegExp must contain at least one grouping which detects a Unicode number (HEX); e.g. the default detector is ~~([0-9|A-F|a-f]{4})~~ (note the grouping for detecting a 4-digit HEX number within the brackets).

Returns:: {String} the unmasked string
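Unmasking is the inverse replacement. In this hypothetical sketch, the first group of the detector is parsed as a HEX number and converted back to its char. Two deviations from the documentation are deliberate assumptions. First, the built-in default uses the tighter class [0-9A-Fa-f], because inside brackets the "|" of the documented default is a literal char, not an alternative. Second, a custom detector without the g flag is copied with g added, so that all occurrences are replaced.

```javascript
// Hypothetical sketch of unmaskString (not the library's source):
// replaces each detector match with the char for the HEX number
// captured by the detector's first group.
function unmaskString(str, detector) {
  // tightened version of the documented default ~~([0-9|A-F|a-f]{4})~~
  const base = detector instanceof RegExp ? detector : /~~([0-9A-Fa-f]{4})~~/;
  // make sure all occurrences are replaced, not just the first one
  const flags = base.flags.includes("g") ? base.flags : base.flags + "g";
  const re = new RegExp(base.source, flags);
  return str.replace(re, (match, hex) => String.fromCharCode(parseInt(hex, 16)));
}
```

For example, unmaskString("~~4E0B~~~~3055~~~~3044~~") restores "下さい", and unmaskString("\\u00FCber", /\\u([0-9A-Fa-f]{4})/) reverses the \uXXXX masking of #maskAsUnicode.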
