Week 4

Continuing from last week, I researched how the CSV file containing the translated terminology could be converted to the TBX format we wanted. That is when I came across the Translate Toolkit project, which contains many useful command line utilities (http://translate-toolkit.readthedocs.org/en/latest/commands/index.html) for our project.

Among them is a tool called poterminology, which I used to extract terminology from the TMX files from Transvision. I experimented with this tool to try to make the terminology extraction more accurate. Using another of the toolkit's command line utilities, I was able to convert the CSV file to a TBX file.
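
To make the CSV to TBX step more concrete, here is a minimal sketch of the mapping in plain Python. It is not the toolkit's converter (in practice I used the command line utility); it only illustrates how each CSV row becomes a TBX term entry. The column layout (source term first, target term second), the language codes and the function name csv_to_tbx are assumptions for illustration, and the output is a bare skeleton without the header a complete TBX file would carry.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Qualified name of the xml:lang attribute as ElementTree expects it.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def csv_to_tbx(csv_text, source_lang="en", target_lang="fr"):
    """Turn two-column CSV text (source term, target term) into a TBX-style string.

    Assumption: the first column is the source term and the second is its
    translation. Rows with fewer than two columns or an empty source are skipped.
    """
    root = ET.Element("martif", {"type": "TBX", XML_LANG: source_lang})
    body = ET.SubElement(ET.SubElement(root, "text"), "body")

    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 2 or not row[0].strip():
            continue
        # One termEntry per concept, with one langSet per language.
        entry = ET.SubElement(body, "termEntry")
        for lang, term in ((source_lang, row[0]), (target_lang, row[1])):
            lang_set = ET.SubElement(entry, "langSet", {XML_LANG: lang})
            tig = ET.SubElement(lang_set, "tig")
            ET.SubElement(tig, "term").text = term.strip()

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = "Settings,Paramètres\nHome screen,Écran d'accueil\n"
    print(csv_to_tbx(sample))
```

Each row ends up as one termEntry holding the English and the translated term side by side, which is the structure a terminology tool reads back when it loads the TBX file.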

The same project also offers a web service called amaGama (http://docs.translatehouse.org/projects/amagama/en/latest/), which uses a bilingual terminology file to create a translation memory data store that we can query over a REST API. I used amaGama to build a terminology database by importing the Gaia CSV file. This tool was very important to the progress of the project, since we were not having much luck with our first objective of extracting terminology from the TMX files.

The idea is to build on top of amaGama, using PostgreSQL as the database. The main thing amaGama provides is a way to search the terminology database very quickly. We can build a web interface on top of this platform that makes use of its features.
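
As a rough idea of how a web interface could talk to amaGama, here is a small Python sketch of a lookup over the REST API. The URL pattern (/tmserver/<source>/<target>/unit/<query>), the local address and port, and the assumption that the response is a JSON list of matches are taken from my reading of the amaGama documentation and should be checked against the running server. The function names build_query_url and lookup are hypothetical helpers, not part of amaGama.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def build_query_url(base_url, source_lang, target_lang, text):
    """Build the lookup URL for a term.

    Assumption: the server exposes <base_url>/<source>/<target>/unit/<query>.
    The query text is percent-encoded so spaces and slashes are safe in the path.
    """
    return "{}/{}/{}/unit/{}".format(
        base_url.rstrip("/"),
        quote(source_lang, safe=""),
        quote(target_lang, safe=""),
        quote(text, safe=""),
    )


def lookup(base_url, source_lang, target_lang, text, timeout=5):
    """Query the server and return the decoded JSON response.

    Assumption: the response body is JSON (a list of candidate matches).
    """
    url = build_query_url(base_url, source_lang, target_lang, text)
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical local instance; adjust host, port and languages as needed.
    print(build_query_url("http://localhost:8888/tmserver", "en", "fr", "home screen"))
```

Keeping the URL construction separate from the network call makes it easy to check the request format without a running server, and the web interface only needs to call lookup and render whatever matches come back.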