Week 6 - Translating web pages

Last week I worked on connecting the API and frontend interface together to test our prototype. So far the frontend interface has a text area for input similar to Google Translate, where the user can enter a string and press the translate button to view the translated string in the text box to the right.

This week I am focusing on translating a full web page. To accomplish this I will load the DOM contents of the URL in an IFrame and then translate the text contents of the web page through Javascript. Pointing the IFrame directly at the remote URL does not work, because the Same Origin policy stops Javascript on our page from reading or modifying a document loaded from a different domain. Therefore, the target web page needs to be requested through the server and served from the same domain as the page hosting the IFrame. Google Translate uses the same approach.

One of the project aims is to use the Mozilla support sites as a testbench for how well our translation engine is working. Using Python libraries such as requests, lxml and re, I obtained the raw contents of the response, stripped the script tags and added a base tag so that relative URLs still resolve against the original site. The script tags of the remote page had to be stripped so that its Javascript does not run on our domain once the page is served from our server.
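
The clean-up step can be sketched roughly as follows. This is not the exact project code: it uses only re from the standard library so that it runs on its own, the function name prepare_page is made up for illustration, and regular expressions are a fragile way to handle HTML (a real parser such as lxml is more robust). It assumes the page has already been fetched as a string, for example with requests.

```python
import html
import re

# Matches <script ...>...</script> blocks, ignoring case and spanning newlines.
SCRIPT_RE = re.compile(r"<script\b[^>]*>.*?</script\s*>", re.IGNORECASE | re.DOTALL)
# Matches the opening <head> tag so a <base> tag can be placed right after it.
HEAD_RE = re.compile(r"<head\b[^>]*>", re.IGNORECASE)


def prepare_page(markup, page_url):
    """Strip script blocks and add a <base> tag pointing at the original URL.

    prepare_page is a hypothetical helper name used only for this sketch.
    """
    without_scripts = SCRIPT_RE.sub("", markup)
    base_tag = '<base href="{}">'.format(html.escape(page_url, quote=True))
    # Insert the base tag after the first <head>; if the page has no <head>,
    # the markup is returned with only the scripts removed.
    return HEAD_RE.sub(lambda m: m.group(0) + base_tag, without_scripts, count=1)
```

With the base tag in place, relative links, images and stylesheets in the served copy are resolved against the original site instead of our own server.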

Through Javascript I was able to query the amaGama database with the text contents of the page and obtain a JSON array of any translation matches. Using string replacement functions within Javascript I translated the text of the document dynamically.
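
The replacement logic is sketched below in Python for illustration only (the prototype does this in Javascript in the browser). The function names are hypothetical, and the shape of the JSON response, a list of objects with "source", "target" and "quality" fields, is an assumption about what the amaGama query returns. No network request is made here; the response is passed in as a string.

```python
import json
import re


def best_target(matches_json):
    """Return the target text of the highest-quality match, or None if there are none.

    Assumes each match is an object with "target" and "quality" keys.
    """
    matches = json.loads(matches_json)
    if not matches:
        return None
    return max(matches, key=lambda m: m.get("quality", 0))["target"]


def translate_text(text, translations):
    """Replace every known source string in text with its translation in one pass."""
    # Longest sources first so that a short string cannot match inside a longer one.
    sources = sorted((s for s in translations if s), key=len, reverse=True)
    if not sources:
        return text
    pattern = re.compile("|".join(re.escape(s) for s in sources))
    return pattern.sub(lambda m: translations[m.group(0)], text)
```

Doing the replacement in a single pass means already translated text is never scanned again, so a translation that happens to contain another source string is left alone.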