Segment Decoration
The Stemmer Endpoint is the TermWeb Web Services endpoint responsible for decorating (translating) a segment. It offers two methods for getting a response: segmentSearch and segmentSearchTemplate. The first returns a translation using the default configuration and searches in the dictionary the user is currently viewing or viewed last. The second returns a translation using the configuration of a specific template defined in TermWeb.
Since TermWeb version 3.17.8, all target concept and term fields are included in the decoration as long as they have a value. The source term name, term ID and concept ID are also sent for reference.
Segment Decoration Details
Decoration takes a text segment in a specific source language and 'decorates' it with term translations into a specific target language. Decorated terms are marked with <mrk> XML tags. The cases and features that apply during decoration are described below:
XML tags in segment
When the input segment itself contains XML tags, such as XLIFF tags, these are ignored as far as the decoration is concerned. First they are parsed and removed from the segment. Then the decoration algorithm adds the translated terms to the segment, and finally the removed XML tags are put back into the decorated segment at the appropriate places. If the XML tags conflict with the <mrk> tags, the XML tags are split. The <mrk> tags are never split, because one <mrk> tag always represents one translated term and splitting it would affect the result of the decoration. Split XML tags, especially XLIFF tags with ids, might produce invalid markup, so the user who receives the response should handle them.
Example Segment: <g id=123>Terminology Management</g> Software
Example Term: Management Software
Decoration: <g id=123>Terminology </g><mrk><g id=123>Management</g> Software</mrk>
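The remove-and-restore steps described above can be sketched as follows. This is a minimal illustration in Python, not TermWeb's implementation: the function names strip_tags and restore_tags are hypothetical, the tag pattern is a simplification, and the splitting of XML tags that conflict with <mrk> tags is not covered.

```python
import re
from typing import List, Tuple

# Simplified pattern for an inline tag (assumption: no '>' inside attribute values).
TAG = re.compile(r"<[^>]+>")


def strip_tags(segment: str) -> Tuple[str, List[Tuple[int, str]]]:
    """Remove inline tags and remember where each one sat in the plain text."""
    plain_parts = []
    tags = []  # (offset in the plain text, tag string)
    pos = 0
    offset = 0
    for match in TAG.finditer(segment):
        text = segment[pos:match.start()]
        plain_parts.append(text)
        offset += len(text)
        tags.append((offset, match.group(0)))
        pos = match.end()
    plain_parts.append(segment[pos:])
    return "".join(plain_parts), tags


def restore_tags(plain: str, tags: List[Tuple[int, str]]) -> str:
    """Put the remembered tags back at their recorded offsets."""
    out = []
    pos = 0
    for offset, tag in tags:
        out.append(plain[pos:offset])
        out.append(tag)
        pos = offset
    out.append(plain[pos:])
    return "".join(out)


segment = "<g id=123>Terminology Management</g> Software"
plain, tags = strip_tags(segment)
print(plain)
print(restore_tags(plain, tags) == segment)
```

In a real decoration the plain text changes between the two steps because <mrk> tags are added, so the recorded offsets would have to be mapped onto the decorated text before the tags are restored.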
Conflicting terms (sharing common words) matching 100%
There are cases where multiple terms sharing common words are eligible for the decoration, and not all of them can be included. Since TermWeb version 3.18.0.7, the longest one in characters is prioritized. However, terms that match the segment 100% take priority over terms matched through stemming, even if they are shorter in characters, unless the api.stem.comparison.margin property is used. Terms that differ from the segment in case (upper versus lower case) are not considered 100% matches.
Example Segment: Terminology Management Software
Example Terms: Terminology Management (22 characters), Management Software (19 characters)
Decoration: <mrk>Terminology Management</mrk> Software ('Terminology Management' is used instead of 'Management Software' because it is longer in characters)
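The priority rule for 100% matches can be illustrated with a short Python sketch. It assumes that a 100% match is a case-sensitive occurrence of the term text in the segment; the function name pick_longest_exact is hypothetical, and word boundaries are ignored for brevity.

```python
from typing import List, Optional


def pick_longest_exact(segment: str, terms: List[str]) -> Optional[str]:
    """Return the longest term (in characters) that occurs verbatim in the segment."""
    # Case-sensitive on purpose: terms with case differences are not 100% matches.
    exact = [term for term in terms if term in segment]
    return max(exact, key=len, default=None)


terms = ["Terminology Management", "Management Software"]
print(pick_longest_exact("Terminology Management Software", terms))
```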
Conflicting terms (sharing common words) matching through stemming
There are cases where conflicting terms do not match the segment 100%. In these cases, since version 3.18.0.12, the first criterion for selecting the term for the decoration is the length of the term in number of stems, and the second criterion is how closely the stems resemble the segment, measured by the Levenshtein distance.
Example Segment: Les conseillers RH ainsi que les autres.
Example Terms: conseiller RH (1 character difference from 'conseillers RH'), conseil RH (4 characters difference from 'conseillers RH')
Decoration: Les <mrk term:sourceTerm="conseiller RH">conseillers RH</mrk> ainsi que les autres.
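The two criteria can be sketched in Python as follows. This is an illustration under stated assumptions rather than TermWeb's code: the number of stems is approximated by the number of whitespace-separated words, the distance is computed on the term text instead of on the stems, and the function names levenshtein and pick_best are hypothetical.

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def pick_best(matched_text: str, candidates: List[str]) -> str:
    """Prefer more words (stand-in for stems), then the smaller edit distance."""
    return min(candidates,
               key=lambda term: (-len(term.split()), levenshtein(term, matched_text)))


print(levenshtein("conseiller RH", "conseillers RH"))  # 1
print(levenshtein("conseil RH", "conseillers RH"))     # 4
print(pick_best("conseillers RH", ["conseil RH", "conseiller RH"]))
```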
Stem comparison margin option
Since TermWeb version 3.18.0.11, a property is available that allows a margin of characters when comparing stems in cases of non-100% matching.
In the following example the property is set to 1 character → api.stem.comparison.margin=1
Example Segment: Schwerpunktthemen sind die Reduktionen bei den Listenpreisen. (Stem is Schwerpunktthem)
Example Terms: Schwerpunktthema (Stem is Schwerpunktthema)
Decoration: <mrk term:sourceTerm="Schwerpunktthema">Schwerpunktthemen</mrk> sind die Reduktionen bei den Listenpreisen.
Explanation: Even though the stems 'Schwerpunktthem' and 'Schwerpunktthema' do not match, a comparison margin of 1 character allows the term to be included in the decoration.
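The effect of the margin can be illustrated with a small Python sketch. The exact comparison rule is not described here, so the sketch assumes that two stems match when one is a prefix of the other and their lengths differ by at most the margin; the function name stems_match is hypothetical.

```python
def stems_match(stem_a: str, stem_b: str, margin: int = 0) -> bool:
    """Assumed rule: the shorter stem is a prefix of the longer one and the
    length difference does not exceed the configured margin."""
    shorter, longer = sorted((stem_a, stem_b), key=len)
    return longer.startswith(shorter) and len(longer) - len(shorter) <= margin


print(stems_match("Schwerpunktthem", "Schwerpunktthema", margin=0))
print(stems_match("Schwerpunktthem", "Schwerpunktthema", margin=1))
```

With a margin of 0 the two stems from the example are treated as different, and with a margin of 1 they are treated as equal, which mirrors the behaviour described for api.stem.comparison.margin=1.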
Homonyms
Homonyms are terms with exactly the same name but different meanings. Homonyms are usually created in different concepts. All homonyms are included in the decoration.
Example Segment: Italian food is good.
Example Terms: italian (the adjective), Italian (the language)
Decoration: <mrk term:sourceTerm="italian"><mrk term:sourceTerm="Italian"/>Italian</mrk> food is good.
Synonyms
Synonyms are also included in the decoration in the same manner.
Example Segment: Italian food is good.
Example Terms: food (eng), Essen (ger), Lebensmittel (ger)
Decoration: Italian <mrk term:tgt="Essen"><mrk term:tgt="Lebensmittel"/>food</mrk> is good.
Deprecated terms
There is an option to highlight deprecated/forbidden terms identified by a certain field and value. The field and value are configured in the template. The decoration tag of terms identified as deprecated contains the attribute term:deprecated="true".
Example Segment: Wireless Network
Example Terms: Wireless (eng), WiFi (ger - identified as deprecated)
Decoration: <mrk term:tgt="WiFi" term:deprecated="true">Wireless</mrk> Network
Accepted terms
There is an option to highlight accepted terms identified by a certain field and value. The field and value are configured in the template. The decoration tag of terms identified as accepted contains the attribute term:accepted="true".
Example Segment: Wireless Network
Example Terms: Wireless (eng), kabellos (ger - identified as accepted)
Decoration: <mrk term:tgt="kabellos" term:accepted="true">Wireless</mrk> Network
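A client that receives decorations like the ones shown in the Homonyms, Synonyms, Deprecated terms and Accepted terms examples might read the <mrk> attributes along these lines. This is a hedged Python sketch: it uses regular expressions instead of an XML parser because the examples show the term: prefix without a namespace declaration, it assumes attribute values are double-quoted and contain no '>' character, and the function names mrk_attributes and deprecated_targets are hypothetical.

```python
import re
from typing import Dict, List

# Opening or self-closing <mrk ...> tag; closing </mrk> tags are not matched.
MRK_OPEN = re.compile(r"<mrk\b([^>]*?)/?>")
# name="value" pairs, where the name may carry a prefix such as term:tgt.
ATTR = re.compile(r'([\w:]+)="([^"]*)"')


def mrk_attributes(decoration: str) -> List[Dict[str, str]]:
    """Return one attribute dictionary per <mrk> tag, in document order."""
    return [dict(ATTR.findall(match.group(1)))
            for match in MRK_OPEN.finditer(decoration)]


def deprecated_targets(decoration: str) -> List[str]:
    """Collect target terms whose tag carries term:deprecated="true"."""
    return [attrs["term:tgt"]
            for attrs in mrk_attributes(decoration)
            if attrs.get("term:deprecated") == "true" and "term:tgt" in attrs]


synonyms = 'Italian <mrk term:tgt="Essen"><mrk term:tgt="Lebensmittel"/>food</mrk> is good.'
deprecated = '<mrk term:tgt="WiFi" term:deprecated="true">Wireless</mrk> Network'
print(mrk_attributes(synonyms))
print(deprecated_targets(deprecated))
```

If the actual response declares the term namespace, a proper XML parser is likely the more robust choice.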
Terms with no translation
There is an option to show terms in the source language that have no translation, i.e. no corresponding term in the target language.
Punctuation marks in segment
To search for terms during decoration, the segment is split into words and stemmed. Punctuation marks are not taken into consideration in this process. Since TermWeb version 3.18.0.7, the apostrophe character is also treated as a space. After the decoration algorithm has been applied, all punctuation marks are put back into the response at the appropriate places.
Example Segment: Le'chocolat
Example Terms: chocolat (fr)
Decoration: Le'<mrk>chocolat</mrk>
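The word splitting described above can be approximated in Python as follows. This is only a sketch: it assumes that every non-word character, including the apostrophe, acts as a separator, and it keeps character offsets so that punctuation can be left in place when the decoration is written back. The function name split_words is hypothetical.

```python
import re
from typing import List, Tuple

# \w+ matches runs of letters, digits and underscores (Unicode-aware for str).
WORD = re.compile(r"\w+")


def split_words(segment: str) -> List[Tuple[int, str]]:
    """Return (offset, word) pairs; punctuation and apostrophes act as separators."""
    return [(match.start(), match.group(0)) for match in WORD.finditer(segment)]


print(split_words("Le'chocolat"))
```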
Term search limit during decoration
After the segment has been stemmed into words, the search algorithm uses only a limited number of terms when searching for translations. By default, for each word of the segment only the first 200 source terms starting with that word are used in the search. This limit can be changed by editing the following TermWeb property:
api.search.limit=<limit>
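The effect of the limit can be pictured with the following Python sketch. It is an illustration only: it assumes that "first" refers to the order in which source terms are supplied and that the first word is compared case-sensitively, and the function name candidate_terms is hypothetical.

```python
from itertools import islice
from typing import Iterable, List


def candidate_terms(word: str, source_terms: Iterable[str], limit: int = 200) -> List[str]:
    """Return at most `limit` source terms whose first word equals `word`."""
    matches = (term for term in source_terms if term.split()[:1] == [word])
    return list(islice(matches, limit))


terms = ["Management Software", "Management", "Terminology Management", "Management Board"]
print(candidate_terms("Management", terms, limit=2))
```

A source term beyond the limit is never considered for the decoration in this sketch, which is why raising the limit can matter for words that start many terms.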
Note
Before TermWeb version 3.17.8, not all concept/term fields were included in the decoration, only the following:
Source | Field | Returns As | Comments | Date Implemented |
---|---|---|---|---|
Source Term | Source Term Id | sourceTermId | | Initial version |
Source Term | Source Concept Id | sourceConceptId | | Initial version |
Source Term | Source Term (name) | sourceTerm | | Initial version |
Target Term | Target Term Id | id | | Initial version |
Target Term | Target Concept Id | conceptId | It should be the same as Source Concept Id | Initial version |
Target Term | Target Term (name) | tgt | | Initial version |
Target Term | Modified By | modifiedBy | | Initial version |
Target Term | Modified Date | modifiedDate | | Initial version |
Target Term | Client Name | customer | | Initial version |
Target Term | Domain Name | domain | | Initial version |
Target Term | Term Definition | definition | | Initial version |
Target Term | Term Usage Status | status | | Initial version |
Target Term | Term Process Status | processStatus | | Initial version |
Target Term | Term Context | context | | Initial version |
Target Term | Term Reference | reference | | Initial version |
Target Term | Term Remarks | remarks | | Initial version |
Target Term | Indication whether Term is an abbreviation | abbreviation | | Initial version |
Target Term | Indication whether Term is deprecated | deprecated | According to TBX standards and custom template configuration of deprecated field/value | 20-10-2016 (3.17.3) |
Target Term | Indication whether Term is accepted | accepted | According to custom template configuration of accepted field/value | 15-2-2017 (3.17.4) |