Key lessons in using GenAI to enhance tourism and visitor attractions

Chloe McAree

At Hamilton Robson we have been working with various visitor attractions around the world for ten years now. Whether they are historical sites, museums, or theme parks, they all attract a diverse, international audience. This diversity can lead to challenges:

  • Language Barriers: Many attractions provide information only in the local language, leaving non-native speakers unable to fully appreciate the experience
  • Accessibility Issues: Traditional guides are often inaccessible to blind or visually impaired visitors, preventing them from enjoying the full richness of the attraction

The Vision

Having witnessed these issues first-hand, we recognised the need to break down language and accessibility barriers. We saw a growing demand for solutions that could make attractions more inclusive, ensuring that every visitor, regardless of language or ability, has an equally enriching experience.

To address this, we developed the Marvel Application. Our vision was clear: every piece of text needed a corresponding audio component, making the content accessible to both blind and deaf guests. By doing so, we aimed to enable attractions to share captivating stories, bringing history to life and creating a lasting impact. To truly deliver value for all visitors, we also wanted to support as many languages as possible.


The Challenges

Delivering this traditionally meant hiring language specialists to translate the scripts. We also required a variety of voice actors to provide a mix of voices for storytelling (e.g., young, old, etc.), and we had to find voice actors for each language offered.

The challenge was that mistakes at any stage of this process were hard to detect and expensive to fix. Correcting errors required re-hiring language experts to translate script changes into all the necessary languages, coordinating with multiple voice actors, and, in cases where the original actor was unavailable, arranging a full re-recording. Additionally, audio specialists were needed to edit the updated audio files. As a result, many clients preferred to tolerate minor content issues or allow digital content to lag behind the physical experience.

These traditional translation methods were time-consuming and often cost-prohibitive for many of our clients, such as museums that didn't have the budget for such a large undertaking. However, with recent advances in generative AI, could we offer customers an alternative to post-production fixes? And maybe even more?

Benchmarking Gen-AI Services

When deciding on AI services to use, benchmarking for your specific use case is essential. This is because AI performance can vary significantly depending on the context and requirements of the task.

We began with two extensive lists of AI services: one for translation services and one for text-to-speech services. Our goal was to narrow down these services based on risk, cost, and quality.

Risk assessment of the tools included evaluating factors such as known issues, performance, scalability, security, and compliance. After eliminating some services through risk analysis, we moved on to cost analysis, carefully examining the pricing tiers offered by each service to determine their feasibility for large-scale use.

With the list narrowed down by risk and cost, I focused on benchmarking the remaining services to compare their performance for our specific use case. The benchmarking exercises differed between translation services and text-to-speech services.
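The risk-and-cost narrowing described above can be sketched as a simple two-stage pass: eliminate anything that breaches a hard ceiling, then rank the survivors by a weighted score. The service names, ceilings, and weights below are purely illustrative, not our actual figures:

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    risk: float   # 0 (low risk) .. 1 (high risk), from the risk review
    cost: float   # estimated monthly cost at expected volume

# Hypothetical shortlist data for illustration only
candidates = [
    Candidate("service-a", risk=0.2, cost=400.0),
    Candidate("service-b", risk=0.6, cost=150.0),
    Candidate("service-c", risk=0.9, cost=90.0),
]

RISK_CEILING = 0.7    # anything riskier is eliminated outright
COST_CEILING = 500.0  # anything dearer is eliminated outright

def shortlist(cands):
    """Drop candidates that breach either ceiling, then sort the
    survivors by a weighted score (lower is better)."""
    viable = [c for c in cands
              if c.risk <= RISK_CEILING and c.cost <= COST_CEILING]
    return sorted(viable,
                  key=lambda c: 0.6 * c.risk + 0.4 * (c.cost / COST_CEILING))

for c in shortlist(candidates):
    print(c.name)
```

In this sketch `service-c` is eliminated on risk alone before cost is ever weighed, mirroring the order in which we applied the filters.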

Starting with the translation services, we considered several key factors in our benchmarking:

  1. Language Variety: Ensuring the service could handle all the languages required by our clients.
  2. Compatibility and Integration: Assessing whether the service offered APIs or SDKs that aligned with our workflow.
  3. Accuracy and Consistency: Testing the quality of the translations, focusing on whether the service maintained context and meaning.

We enlisted several human translators specialising in different languages to create manual translations of an English script into their respective languages. These human translations served as a benchmark for evaluating the AI tools.

Next, we uploaded the untranslated scripts to various translation tools and large language models (LLMs) to see how the AI services performed.

We collected all the results from the AI translation services and had the human language specialists compare the outputs with the original human translations.
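The real evaluation was done by our human specialists, but a cheap automated similarity score can flag candidate outputs for closer review before they see them. The sketch below compares machine outputs against a human reference using character-trigram overlap (a rough chrF-style F1); the tool names and Spanish sentences are illustrative, not our actual benchmark data:

```python
from collections import Counter

def char_ngrams(text, n=3):
    """Multiset of character n-grams, with whitespace normalised."""
    text = " ".join(text.split())
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def ngram_f1(candidate, reference, n=3):
    """Rough chrF-style F1 over character trigrams: a first-pass
    signal, not a substitute for human review."""
    cand, ref = char_ngrams(candidate, n), char_ngrams(reference, n)
    overlap = sum((cand & ref).values())  # multiset intersection
    if not overlap:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

reference = "Algunos marineros creían que un mascarón de proa protegería a la tripulación."
outputs = {
    "tool-a": "Algunos marineros creían que un mascarón de proa protegería a la tripulación.",
    "tool-b": "Algunos marineros estimaban que un mascarón de proa protegería a la tripulación.",
}
for tool, text in outputs.items():
    print(tool, round(ngram_f1(text, reference), 3))
```

A lower score does not by itself prove a worse translation, which is exactly why the ranked outputs still went to the language specialists.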

Comparing the outputs side by side, our translators identified a number of differences across the machine translations.

The examination revealed instances where certain services struggled to maintain the correct tense, capture context accurately, or preserve the intended focus of a sentence.

For instance, one script contained the sentence:
“Some sailors believed a figurehead would protect the crew and ward off evil spirits.”

One tool translated “believed” as “estimated,” which drastically altered the meaning. This change weakened the superstitious tone of the sailors’ belief, making it sound like they were guessing or calculating rather than holding a deeply rooted conviction.

This is just one example, but such insights provided a clear understanding of the strengths and weaknesses of each tool.

After analysing all the samples, the human specialists recommended the best-performing translation services. These recommendations were based on accuracy, consistency, and alignment with the intended meaning and tone of the source material.

Finding the right text-to-speech service

When it came to audio benchmarking, the exercises were a little different. We needed to evaluate several key aspects, including:

  • Voice Quality: Assessing the quality and naturalness of the generated voices. This involved listening to sample outputs to determine if the voices were clear, expressive, and suitable for our applications.
  • Variety of Voices: Ensuring there were options for different accents, genders, and age groups to suit the diverse attractions we work with.
  • Language Support: Verifying that the services could produce audio in all the required languages.
  • Emotional Range: Evaluating the ability to convey different emotions through speech, ensuring a variety of emotional tones for specific applications.


We began by uploading several scripts to different services. For each service, we exported audio in multiple languages. When possible, we also generated versions featuring different genders, age groups, and accents.

We were quickly able to eliminate several services because they failed to support all the languages our customers required.
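That elimination step is essentially a set-coverage check: a service survives only if its supported languages are a superset of everything our customers need. The language codes and service names below are hypothetical stand-ins:

```python
# Illustrative requirements, not our customers' actual language list
REQUIRED_LANGUAGES = {"en", "fr", "de", "es", "zh", "ja"}

# Hypothetical supported-language sets, as reported by each vendor
services = {
    "tts-a": {"en", "fr", "de", "es", "zh", "ja", "ko"},
    "tts-b": {"en", "fr", "es"},
}

# Keep only services whose language set covers every requirement
remaining = {name for name, langs in services.items()
             if REQUIRED_LANGUAGES <= langs}
print(remaining)  # tts-b is eliminated: it lacks de, zh, and ja
```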

For the remaining services, we focused on quality, clarity, emotional expressiveness, and voice variety. To evaluate these, we took multiple samples, anonymised them, and played them to different groups of listeners. Participants were then asked to rank the samples on perceived quality and how natural and human they sounded.
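Aggregating those blind rankings is straightforward: average each sample's rank across listeners and sort, with the sample labels hiding which service produced which clip. The rankings below are made-up illustrative data:

```python
from statistics import mean

# Each listener ranked the anonymised samples, 1 = most natural.
# Labels hide which service produced which clip.
rankings = [
    {"sample-1": 1, "sample-2": 3, "sample-3": 2},
    {"sample-1": 2, "sample-2": 3, "sample-3": 1},
    {"sample-1": 1, "sample-2": 2, "sample-3": 3},
]

# Mean rank per sample across all listeners (lower is better)
mean_rank = {
    sample: mean(r[sample] for r in rankings)
    for sample in rankings[0]
}
for sample, rank in sorted(mean_rank.items(), key=lambda kv: kv[1]):
    print(sample, round(rank, 2))
```

Mean rank is a deliberately simple aggregate; with larger panels, checking agreement between listeners before trusting the ordering would be a sensible next step.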

While testing text-to-speech capabilities across languages, we discovered that some services performed well in English but struggled with other languages. In some cases, to achieve acceptable results for non-English languages, we had to write the text phonetically, which added complexity to the process.

The Impact

With our AI services carefully chosen after a long period of benchmarking and testing, we collaborated with an existing customer to launch a new language for their tour. This involved translating all the text for visitors to read and generating all the voices in the new language.

Overall, our Marvel product continues to offer customers the option to request either human translations or AI-generated translations. Since its launch, we have received truly amazing feedback from our clients:

  • One client reported receiving four times as many international guests since introducing their multi-language tour guide app.
  • Another mentioned selling five times more audio guides compared to paper guides after adding multi-language support!
  • This year, several clients have also told us that multi-language support is a strict condition of the development funding they receive, and that expensive translations would mean dropping key features.

This feedback truly highlights the positive impact that generative AI can have in enriching user experience for diverse groups.

Hamilton Robson is committed to delivering impact through research and development, and following these results we have developed a set of innovative processes and tools that significantly reduce AI translation errors.

If you're interested in developing an audio tour for your visitor attraction, or in increasing your content's accessibility using AI, get in touch! We'd love to chat about your requirements or ideas.

LET'S TALK.

Want to find out how the subject of this blog could help your business? 

Our blended team of experts go over and above with our services to our customers, no matter what the challenge. Get in touch to find out how we can work together.