Translation memory applications are databases that store source texts together with their corresponding target texts. These "aligned" translation segments can then be matched automatically when new texts are translated.

Alignment

Alignment technology automatically matches source segments with their corresponding translations with the help of statistical and/or lexical algorithms, creating a bi-text that can be imported into a translation memory. This technology is useful to translators for recycling legacy translations, to teachers in translation training, and to researchers for feeding and training machine translation engines.

Although most traditional CAT tools incorporate an alignment tool, the resulting bi-text is not perfect, because the alignment algorithm usually relies mostly on formal criteria of the texts (length, formatting, …). More recently, alignment technology has also been used to align the initial source segments with their revised translations.
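
To illustrate what a purely formal criterion looks like, the following minimal Python sketch pairs segments by position and flags pairs whose character lengths diverge strongly. The function names and the 0.5 threshold are assumptions made for this example and do not reproduce the method of any particular CAT tool.

```python
def length_ratio(source: str, target: str) -> float:
    """Return shorter length / longer length (1.0 means equal length)."""
    a, b = len(source), len(target)
    if a == 0 and b == 0:
        return 1.0
    return min(a, b) / max(a, b)


def flag_suspect_pairs(sources, targets, threshold=0.5):
    """Pair segments one-to-one by position and flag length mismatches.

    The threshold is an assumed value; a real aligner would tune it
    per language pair and would also handle 1:2 or 2:1 alignments.
    """
    pairs = []
    for src, tgt in zip(sources, targets):
        ratio = length_ratio(src, tgt)
        pairs.append((src, tgt, ratio, ratio < threshold))
    return pairs
```

A check of this kind knows nothing about meaning: two unrelated sentences of similar length pass it, which is one reason a bi-text produced this way still needs review.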

How can alignment tools be improved? Is there a need for more linguistically based algorithms?

Segmentation and matching algorithms

Segmentation

Texts to be translated are divided into segments. Typically the segmentation is based on punctuation and the occurrence of special characters such as line breaks, tab stops, etc.
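
As a rough illustration, punctuation-based segmentation can be sketched in Python with a regular expression. The rule set below (sentence-final punctuation followed by whitespace, plus line breaks and tab stops) is an assumption chosen for the example, not the rule set of any specific tool.

```python
import re

# Assumed boundary rules: whitespace after ., ! or ?, or any run of
# line breaks / tab stops.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+|[\n\t]+")


def segment(text: str) -> list[str]:
    """Split text into segments and drop empty pieces."""
    parts = _BOUNDARY.split(text)
    return [p.strip() for p in parts if p.strip()]
```

With these rules, "See Fig. 2 for details." is split after "Fig.", which shows why tools usually add exception lists for abbreviations.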

What is the optimal size for a segment? What are the rules for segmentation based on? How can information about segmentation and segment size be exchanged between different tools?

TM Matches

Matches from the Translation Memory are typically marked and ranked with a percentage indicating how closely the segment to be translated corresponds to the segments in the Translation Memory. One distinguishes between no-matches, so-called fuzzy matches (where a considerable part of the segments matches), 100% matches (where the segment from the TM is identical to the segment to be translated) and, more recently, matches that take into account the context of a given segment (context matches or 101% matches).
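
The match categories above can be sketched in Python using the standard library's difflib. This is only one possible similarity measure; the 70% fuzzy threshold and the definition of a context match (the preceding segments are also identical) are assumptions for the example, and commercial tools use their own scoring.

```python
from difflib import SequenceMatcher


def match_score(new_segment: str, tm_segment: str) -> int:
    """Character-level similarity as a whole percentage, rounded down."""
    ratio = SequenceMatcher(None, new_segment, tm_segment).ratio()
    return int(ratio * 100)


def classify(new_segment, tm_segment, prev_new=None, prev_tm=None,
             fuzzy_threshold=70):
    """Label a TM candidate; threshold and context rule are assumed."""
    if new_segment == tm_segment:
        # Assumed context rule: the preceding segments match as well.
        if prev_new is not None and prev_new == prev_tm:
            return "context match"
        return "100% match"
    score = match_score(new_segment, tm_segment)
    if score >= fuzzy_threshold:
        return f"fuzzy match ({score}%)"
    return "no match"
```

For instance, comparing "Click the Save button." with "Click the Cancel button." yields a fuzzy match, because only one word differs.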

What matching algorithms are currently being used? How can they be improved? How are matches presented to the users? Are percentages the only way to indicate correspondences?

Subsegment matching

Traditional Translation Memory matches are based on the similarity between the text to be translated and an entire stored segment. Recently, additional matching algorithms have been introduced to find partial matches.
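
One simple way to picture a partial match is the longest run of words that two segments share. The Python sketch below uses difflib on word lists; the function name is hypothetical, and actual subsegment matching in TM tools may work quite differently.

```python
from difflib import SequenceMatcher


def longest_shared_phrase(new_segment: str, tm_segment: str) -> str:
    """Return the longest contiguous word sequence common to both segments."""
    a = new_segment.lower().split()
    b = tm_segment.lower().split()
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    m = matcher.find_longest_match(0, len(a), 0, len(b))
    return " ".join(a[m.a:m.a + m.size])
```

Given "Press the power button to restart" and "Hold the power button for five seconds", the shared phrase is "the power button", even though the segments as a whole would score poorly.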

Concordance

The concordancers available in current translation environments are usually bilingual and allow the user to search the translation memory for a particular word or phrase. The search results are then displayed in a Concordance Search window together with their contexts, with the source-language word highlighted.
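
A bilingual concordance search can be sketched in Python as follows. The translation memory is assumed here to be a plain list of (source, target) pairs, and square brackets stand in for the highlighting; both are simplifications for the example.

```python
import re


def concordance(tm, query):
    """Return (highlighted source, target) pairs whose source contains query.

    tm is assumed to be a list of (source, target) string pairs.
    """
    if not query:
        return []
    pattern = re.compile(re.escape(query), re.IGNORECASE)
    hits = []
    for source, target in tm:
        if pattern.search(source):
            # Mark each occurrence with brackets instead of real highlighting.
            highlighted = pattern.sub(lambda m: f"[{m.group(0)}]", source)
            hits.append((highlighted, target))
    return hits
```

Each hit keeps the full source segment and its translation, so the user sees the search term in context on both sides.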

Term recognition

Traditionally, terminology is stored and administered in a database separate from the Translation Memory. Many tools provide real-time access to the contents of the terminology database during translation.
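
Term recognition during translation can be pictured as a lookup of termbase entries in the current segment. In the Python sketch below the termbase is assumed to be a simple dictionary from source term to target term, which ignores the richer entry structure of real terminology databases.

```python
import re


def recognize_terms(segment: str, termbase: dict) -> list:
    """Return (source term, target term) pairs found in the segment.

    Longer terms are checked first; nested terms are reported too.
    """
    found = []
    for term in sorted(termbase, key=len, reverse=True):
        # Whole-word, case-insensitive match of the term.
        if re.search(rf"\b{re.escape(term)}\b", segment, re.IGNORECASE):
            found.append((term, termbase[term]))
    return found
```

This sketch matches exact word forms only, so an inflected or plural form of a term goes unrecognized unless it is stored as its own entry.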

How can the results of the terminology recognition be improved? What part of the information from the terminology database should be displayed? How can other terminology sources be made available? How does the structure of the terminology database influence term recognition?

TM maintenance

Best practices in the maintenance of translation memories

What are the industry best practices for maintaining TMs?

TM maintenance features

Are the maintenance features available in most TM tools adequate or could they be improved?

Metadata and TM maintenance

How useful are metadata fields to help you maintain TMs?