Class GrammarConverter
Defined in: grammarConverter.js.
Constructor Attributes | Constructor Name and Description |
---|---|
The GrammarConverter object initializes the grammar for processing
natural language text, e.g.
|
Method Attributes | Method Name and Description |
---|---|
decodeUmlauts(target, doAlsoEncodeUpperCase)
|
|
encodeUmlauts(target, doAlsoEncodeUpperCase)
|
|
executeGrammar(text, callback)
Execute the grammar.
|
|
HELPER creates a copy of the stopword list and encodes all non-ASCII chars to their unicode
representation (e.g.
|
|
Get grammar definition text.
|
|
Get the compiled JavaScript grammar source code.
|
|
FIX for stopwords that start or end with encoded chars (i.e.
|
|
maskAsUnicode(str)
HELPER uses #maskString for encoding non-ASCII chars to their Unicode representation,
i.e.
|
|
maskString(str, prefix, postfix)
Masks unicode characters in a string.
|
|
recodeJSON(json, recodeFunc, isMaskValues, isMaskNames)
Recodes Strings of a JSON-like object.
|
|
setGrammarDef(rawGrammarSyntax)
Sets the grammar definition text.
|
|
setGrammarFunction(func, isAsnc)
Set the executable grammar function.
|
|
unmaskString(str, detector)
Unmasks masked unicode characters in a string.
|
- Requires:
- mmir.CommonUtils.isArray
- jQuery.ajax
- Parameters:
- {String|Object} target
- the String in which all contained umlaut encodings should be replaced with the original umlauts.
If this parameter is not a String, it will be converted using JSON.stringify() and the resulting String will be processed (may lead to errors if umlauts occur in "strange" places within the stringified object).
- {Boolean} doAlsoEncodeUpperCase Optional
- OPTIONAL
if true, then upper-case umlaut encodings will be decoded, too. DEFAULT: false
(i.e. no decoding for upper-case umlaut encodings)
- Deprecated:
- this is used for the old-style encoding / decoding for umlauts (now masking for ALL unicode chars is used!)
- Returns:
- {String|Object} the String with decoded umlaut encodings (i.e. with the "original" umlauts).
If the input argument target was an Object, the return value will also be an Object: the processed, stringified Object is converted back using JSON.parse() (may lead to errors if umlauts occur in "strange" places within the stringified object).
- Parameters:
- {String|Object} target
- the String in which all contained umlauts should be replaced with an encoded version.
If this parameter is not a String, it will be converted using JSON.stringify() and the resulting String will be processed (may lead to errors if umlauts occur in "strange" places within the stringified object).
- {Boolean} doAlsoEncodeUpperCase Optional
- OPTIONAL
if true, then upper-case umlauts will be encoded, too. DEFAULT: false
(i.e. no encoding for upper-case umlauts)
- Deprecated:
- this is used for the old-style encoding / decoding for umlauts (now masking for ALL unicode chars is used!)
- Returns:
- {String|Object} the String with encoded umlauts.
If the input argument target was an Object, the return value will also be an Object: the processed, stringified Object is converted back using JSON.parse() (may lead to errors if umlauts occur in "strange" places within the stringified object).
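The String-or-Object handling described for decodeUmlauts and encodeUmlauts can be illustrated with a minimal sketch. The names processTargetSketch and encodeUmlautSketch are hypothetical, and the replacement token `__ae__` is an invented placeholder: this page does not specify the actual umlaut encoding.

```javascript
// Hypothetical stand-in for the real umlaut encoding (not specified on this page).
function encodeUmlautSketch(str) {
  return str.replace(/ä/g, '__ae__');
}

// Applies a String-processing function to a String, or to the
// JSON-stringified form of any other value, which is then parsed back.
function processTargetSketch(target, processString) {
  if (typeof target === 'string') {
    return processString(target);
  }
  // Non-String input: stringify, process the text, convert back.
  return JSON.parse(processString(JSON.stringify(target)));
}
```

Because the whole stringified object is processed, property names are affected as well as values: under these assumptions, `processTargetSketch({'käse': 'bär'}, encodeUmlautSketch)` changes both the key and the value. A replacement that introduces quotes or backslashes could make JSON.parse() fail, which is presumably one of the error cases the description warns about.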
- Parameters:
- {String} text
- the text String that should be parsed.
- {Function} callback Optional
- if #isAsyncExec is TRUE, then executeGrammar will have no return value; instead, the result of the grammar execution is delivered to the callback: function callback(result){ ... }
(see also the description of the return value below)
- Returns:
- {Object} the result of the grammar execution:
{phrase: STRING, phrases: OBJECT, semantic: OBJECT}
The property phrase contains the text which was matched (with stopwords removed). The property phrases contains the matched TOKENS and UTTERANCES from the JSON definition of the grammar as properties, each holding an array (e.g. for 1 matched TOKEN "token": {token: ["the matched text"]}). The returned property semantic depends on the JSON definition of the grammar.
NOTE: if #isAsyncExec is TRUE, then there will be no return value; instead, the callback is invoked with the result.
- Returns:
- {Array} a copy of the stopword list from the current JSON grammar (or an empty list, if no grammar is present)
- Returns:
- {String} the grammar definition in compiler-specific syntax
- Returns:
- {String} the compiled, JavaScript grammar source code
//remove normal stopwords:
var removedStopwordsStr = someStr.replace(gc.getStopWordsRegExpr(), '');
var removedStopwordsStr2 = removedStopwordsStr;
if(gc.getStopWordsEncRegExpr()){
  //NOTE replace stopwords with spaces (not with an empty String as above, i.e. as with "normal" stopwords)
  removedStopwordsStr2 = removedStopwordsStr.replace(gc.getStopWordsEncRegExpr(), ' ');
}
\uXXXX
where XXXX is the Unicode HEX number.
SHORTCUT for calling maskString(str, '\\u', '').
//for Japanese "下さい" ("please")
maskAsUnicode("下さい") -> "\u4E0B\u3055\u3044"
//... and using default masking:
maskString("下さい") -> "~~4E0B~~~~3055~~~~3044~~"
- Parameters:
- str
~~XXXX~~ where XXXX is the four-digit unicode HEX number.
NOTE that this function is stable with regard to multiple executions: if the function is invoked again on the returned String, the result will be unchanged, i.e. maskings ("~~XXXX~~") will not be masked again.
NOTE: currently, the masking pattern cannot be escaped, i.e. if the original String contains a substring that matches the masking pattern, there is no way to mark it so that the unmask function leaves it untouched.
- Parameters:
- {String} str
- the String to process
- {String} prefix Optional
- OPTIONAL
an alternative prefix used for masking, i.e. instead of ~~
(ignored, if the argument has a type other than string)
- {String} postfix Optional
- OPTIONAL
an alternative postfix used for masking, i.e. instead of ~~
(ignored, if the argument has a type other than string)
- Returns:
- {String} the masked string
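The masking behavior described above can be approximated by the following sketch. It is not the library's implementation: the function names are hypothetical, and it assumes that "non-ASCII" means every UTF-16 code unit above 0x7F.

```javascript
// Sketch of the described masking: each non-ASCII UTF-16 code unit is
// replaced by prefix + four-digit upper-case HEX number + postfix.
function maskStringSketch(str, prefix, postfix) {
  // Non-string prefix/postfix arguments are ignored, as documented.
  var pre = typeof prefix === 'string' ? prefix : '~~';
  var post = typeof postfix === 'string' ? postfix : '~~';
  return str.replace(/[^\x00-\x7F]/g, function (ch) {
    var hex = ch.charCodeAt(0).toString(16).toUpperCase();
    // Left-pad to four digits.
    return pre + ('0000' + hex).slice(-4) + post;
  });
}

// Counterpart of the documented shortcut maskString(str, '\\u', '').
function maskAsUnicodeSketch(str) {
  return maskStringSketch(str, '\\u', '');
}
```

Since the masked output contains only ASCII characters, applying maskStringSketch to its own result leaves the String unchanged, which matches the stability note above.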
- Parameters:
- {Object} json
- the JSON-like object (i.e. PlainObject)
- {Function} recodeFunc
- the "recoding" function for modifying String values:
must accept a String argument and return a String
String recodeFunc(String).
The function is invoked in context of the GrammarConverter object. Example: this.maskString(). See #maskString.
- {Boolean} isMaskValues Optional
- OPTIONAL
if true, the object's property String values will be processed
NOTE: in case this parameter is specified, then recodeFunc must also be specified! DEFAULT: uses property #maskValues
- {Boolean} isMaskNames Optional
- OPTIONAL
if true, the property names will be processed
NOTE: in case this parameter is specified, then recodeFunc and isMaskValues must also be specified! DEFAULT: uses property #maskNames
- Returns:
- {Object} the recoded JSON object
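A recursive recoding of this kind could be sketched as follows. The function recodeJsonSketch is hypothetical; unlike the documented method it takes no defaults from #maskValues or #maskNames, it does not bind recodeFunc to a GrammarConverter, and it returns a new object rather than necessarily modifying the input.

```javascript
// Recursively applies recodeFunc to String values and/or property names.
function recodeJsonSketch(json, recodeFunc, isMaskValues, isMaskNames) {
  if (typeof json === 'string') {
    return isMaskValues ? recodeFunc(json) : json;
  }
  if (Array.isArray(json)) {
    return json.map(function (item) {
      return recodeJsonSketch(item, recodeFunc, isMaskValues, isMaskNames);
    });
  }
  if (json !== null && typeof json === 'object') {
    var result = {};
    Object.keys(json).forEach(function (name) {
      var newName = isMaskNames ? recodeFunc(name) : name;
      result[newName] = recodeJsonSketch(json[name], recodeFunc, isMaskValues, isMaskNames);
    });
    return result;
  }
  // Numbers, booleans and null are returned unchanged.
  return json;
}
```

Any function that maps a String to a String can serve as recodeFunc; in the documented API, a masking function such as #maskString is the typical choice.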
- Requires:
- or Array#isArray
- Parameters:
- {String} rawGrammarSyntax
- the grammar definition in compiler-specific syntax
- See:
- #getGrammarDef
callback(result)
The returned result depends on the JSON definition of the grammar:
func(inputText, resultCallback)
- Parameters:
- {Function} func
- the executable grammar function:
func(string, function(object)) : object
- {Boolean} isAsnc Optional
- OPTIONAL set to TRUE if execution is done asynchronously. DEFAULT: FALSE
- See:
- #executeGrammar
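The two shapes of grammar function implied by the signature func(string, function(object)) : object might look like the sketch below. Both functions are hypothetical placeholders that return an empty match; a real grammar function is the compiled grammar, and `gc` in the trailing comments is assumed to be an existing GrammarConverter instance.

```javascript
// Synchronous variant: the result is returned directly.
function syncGrammarSketch(inputText, resultCallback) {
  return { phrase: inputText, phrases: {}, semantic: {} };
}

// Asynchronous variant: nothing is returned; the result is
// delivered later through resultCallback.
function asyncGrammarSketch(inputText, resultCallback) {
  setTimeout(function () {
    resultCallback({ phrase: inputText, phrases: {}, semantic: {} });
  }, 0);
}

// Registration, assuming gc is an existing GrammarConverter:
// gc.setGrammarFunction(syncGrammarSketch);
// gc.setGrammarFunction(asyncGrammarSketch, true);
```

The second argument of setGrammarFunction tells the converter which of the two delivery styles to expect, which in turn determines whether executeGrammar returns the result or passes it to its callback.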
~~XXXX~~ where XXXX is the four-digit unicode HEX number.
NOTE that this function is stable with regard to multiple executions, IF the original String str did not contain a sub-string that conforms to the encoding pattern (see remark for #maskString): if the function is invoked again on the returned String, the result will be the same, i.e. unchanged.
- Parameters:
- {String} str
- {RegExp} detector Optional
- OPTIONAL
an alternative detector-RegExp:
the RegExp must contain at least one grouping which detects a unicode number (HEX),
e.g. the default detector is ~~([0-9|A-F|a-f]{4})~~
(note the grouping within the brackets for detecting a 4-digit HEX number).
- Returns:
- {String} the unmasked string
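The reverse direction can be sketched in the same spirit. The function unmaskStringSketch is hypothetical and not the library's implementation. Its default detector uses the character class [0-9A-Fa-f], which is a slight simplification of the documented default pattern, and it assumes that a custom detector carries the global flag (otherwise only the first masking would be replaced).

```javascript
// Replaces each masked sequence with the character whose
// code unit is given by the captured four-digit HEX number.
function unmaskStringSketch(str, detector) {
  var re = detector instanceof RegExp ? detector : /~~([0-9A-Fa-f]{4})~~/g;
  return str.replace(re, function (match, hex) {
    return String.fromCharCode(parseInt(hex, 16));
  });
}
```

A detector for the \uXXXX form produced by maskAsUnicode could be written as `/\\u([0-9A-Fa-f]{4})/g`. As noted for #maskString, a String that already contained text matching the detector before masking cannot be restored exactly, because that text is unmasked as well.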