The data release is how we integrate our partners' catalogues into our system without introducing inaccurate data or duplicate properties.
As you can imagine, we have many partners, and they supply catalogues in a variety of formats with slightly different data. We therefore need a way to consolidate these catalogues in order to provide effective price comparison for the traveler. In particular, we need to ensure that any hotel appearing in more than one catalogue shows as only one hotel on our site. This is what the data release is responsible for.
Prior to integrating, we require that each partner's catalogue be downloadable at any time so that we can automate the inventory and keep it up to date. Once a catalogue is successfully automated, we strive to run a data release twice a week, depending on any issues encountered. Previously, our data release ran every two weeks. This expedited timeline gives more accurate and relevant results.
This data release process is sometimes referred to as de-duplication. At a high level, the de-duplication/matching process works like this:
- Hotels from all of our partner catalogues are grouped together using their geographic coordinates, i.e. latitude and longitude.
- This tells us which hotels are very close to each other, and which are likely to be the “same hotel”.
- The identifying attributes, such as name and street address, are then compared to determine how confident we are that hotels in this group are actually the same hotel.
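The steps above can be illustrated with a minimal sketch. It is not our production matching logic: the `Hotel` fields, the grid size, the equal weighting of name and address, and the 0.8 threshold are all assumptions chosen for illustration, and plain string similarity stands in for whatever comparison is actually used.

```python
from collections import defaultdict
from dataclasses import dataclass
from difflib import SequenceMatcher

# All names and values below are hypothetical, for illustration only.
CELL_DEGREES = 0.001    # assumed grid size (about 100 m of latitude)
MATCH_THRESHOLD = 0.8   # assumed confidence cut-off


@dataclass(frozen=True)
class Hotel:
    partner: str
    name: str
    address: str
    lat: float
    lon: float


def cell(hotel):
    """Map a hotel's coordinates to a coarse grid cell."""
    return (round(hotel.lat / CELL_DEGREES), round(hotel.lon / CELL_DEGREES))


def similarity(a, b):
    """Case-insensitive string similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def confidence(a, b):
    """Average of name and address similarity (equal weights are an assumption)."""
    return 0.5 * similarity(a.name, b.name) + 0.5 * similarity(a.address, b.address)


def likely_duplicates(hotels):
    """Group hotels by grid cell, then compare attributes within each group."""
    groups = defaultdict(list)
    for hotel in hotels:
        groups[cell(hotel)].append(hotel)

    pairs = []
    for members in groups.values():
        for i, a in enumerate(members):
            for b in members[i + 1:]:
                score = confidence(a, b)
                if score >= MATCH_THRESHOLD:
                    pairs.append((a, b, score))
    return pairs


hotels = [
    Hotel("partner_a", "Grand Hotel Central", "12 High Street", 55.95320, -3.18830),
    Hotel("partner_b", "Grand Hotel Central", "12 High St", 55.95321, -3.18829),
    Hotel("partner_c", "Seaside Hostel", "4 Beach Road", 55.95322, -3.18831),
]

for a, b, score in likely_duplicates(hotels):
    print(a.partner, b.partner, round(score, 2))
```

A fixed grid is the simplest way to group by coordinates, but two records for the same hotel that fall on either side of a cell boundary would be missed; a real system would also need to compare neighbouring cells or use a distance-based search.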
This process involves downloading, validating, mapping, and rechecking your catalogue before changes can be published. If you’ve made a change to your catalogue and don’t see it reflected live on Skyscanner, please refer to our missing properties section. If you have checked the missing properties section and still have outstanding queries, please submit a request.