I had begun investigating amaGama and reading the source code for the project. I had used amaGama to build the database of the terms we had collected using the following commands:
amaGama exposed a REST API that accepted a query in URL-encoded form. The string was queried against the database, and the results were returned as a JSON data structure. amaGama used the Levenshtein distance to rank the results, which caused some matches to be omitted from the results.
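As a minimal sketch of what passing a URL-encoded query looks like, the helper below builds a lookup URL with the query text percent-encoded into the path. The function name `build_query_url`, the host and port, and the `/<source>/<target>/unit/<query>` route layout are assumptions made for illustration and should be checked against the amaGama instance in use.

```python
from urllib.parse import quote


def build_query_url(base_url, source_lang, target_lang, text):
    """Build a lookup URL with the query text percent-encoded into the path.

    The route layout used here is an assumption for illustration.
    """
    # safe="" makes quote() encode "/" as well, so the text stays one path segment.
    encoded = quote(text, safe="")
    return f"{base_url}/{source_lang}/{target_lang}/unit/{encoded}"


# Hypothetical local server address and language pair.
url = build_query_url("http://localhost:8888/tmserver", "en", "ar", "open file")
print(url)
```

The response to such a request would then be parsed as JSON, for example with `json.loads` on the response body.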
amaGama is a Flask-based project, and we had decided early on that a Python-based framework such as Django or Flask would be ideal for building the web interface. Therefore, I downloaded the source and started extending amaGama.
After setting up the project locally, I looked into how amaGama stores the terminology in the database. Each term is split into a vector of its stemmed words.
When a word or segment is queried through the REST API, amaGama searches through these vectors for matches and returns the translated equivalent through the lookup of a foreign key.
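The idea of matching a query against vectors of stemmed words can be sketched in a few lines. This is only a toy model under stated assumptions: `toy_stem` is a crude stand-in for a real stemmer, the dictionaries stand in for database tables, and the sample terms and translations are invented for illustration. It is not how amaGama itself is implemented.

```python
def toy_stem(word):
    """Crude stand-in for a real stemmer: lowercase and strip a common suffix."""
    word = word.lower()
    for suffix in ("ing", "s"):
        # Only strip when at least three characters of stem remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def to_vector(text):
    """Represent a piece of text as the set of its stemmed words."""
    return frozenset(toy_stem(w) for w in text.split())


# Hypothetical data: term id -> source text, and term id -> translation.
# The shared id plays the role of the foreign key.
sources = {1: "opening files", 2: "new windows"}
translations = {1: "ouverture de fichiers", 2: "nouvelles fenêtres"}

# Pre-computed vector for every stored term.
index = {term_id: to_vector(text) for term_id, text in sources.items()}


def lookup(query):
    """Return translations of every term whose vector contains all query stems."""
    q = to_vector(query)
    return [translations[tid] for tid, vec in index.items() if q <= vec]


print(lookup("open file"))
print(lookup("windows"))
```

Because both the stored terms and the query are stemmed first, "open file" can match "opening files" even though the surface forms differ.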
amaGama was not made for searching through terminology, though, so some queries did not return any results even though there was a clear match based on what is contained in the database. I modified the query function by removing the Levenshtein distance and ranking metric, which made the query more generic and therefore returned more results. This gave us a simple base from which to start modifying the SQL query to our needs.
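To illustrate why a Levenshtein-based cut-off can hide clear terminology matches, here is a small sketch. The normalisation in `similarity` and the `MIN_SIMILARITY` threshold are assumptions chosen for the example, not amaGama's actual formula or settings, and the candidate strings are invented.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Edit distance scaled to the range 0..1 by the longer string's length."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


MIN_SIMILARITY = 0.7  # hypothetical cut-off

query = "file"
candidates = ["file", "files", "file system", "open file dialog"]

# With the cut-off, longer entries that contain the term are dropped.
kept = [c for c in candidates if similarity(query, c) >= MIN_SIMILARITY]
print(kept)
```

A short term such as "file" is many edits away from a longer entry that contains it, so a similarity threshold tuned for whole-segment translation memory lookups filters out entries a terminology search would want to keep. Dropping the threshold, as described above, returns all of the candidates.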
While the REST API still needed improvement, for now I was happy with the results it returned. I started to put together the web interface to try out the results.