Data FAQ
Below are answers to common data-related questions. You can report issues and submit your own questions using the "Send a Message" button in SCOUT.
Why do you show broken/dead news links?
Broken news links occur when a data source changes its web address or stops showing a published article, which is more likely for older articles. We still use this historical data to inform our processes and customers, even when the original publisher no longer hosts the article.
We plan to display broken links differently in the near future so that they are easier to identify.
Why are there organization duplicates?
This is an area we’re continually analyzing and improving. The issue is exacerbated by the huge scope of our data collection, which spans not only every industry but also every country. For example, depending on the industry, country, or even just the city that you’re in, the company “ABC” might mean something different. It might be “ABC Studios”, “Associated Builders and Contractors”, “ABC Compounding”, “ABC Technology Training & Upskilling”, “Automated Batting Cages”, etc. Although these organizations usually have a unique legal name in their country, they often refer to themselves, or are referred to in public, by the acronym “ABC”. Even in such cases, we work on matching duplicate data and merging the information into a single record while avoiding incorrect deduplication.
Publication KPIs (Overview Dashboard) don't match Publication panels.
It’s possible to find some inconsistencies between the Overview panel and the Publication panels. There are differences in how the Publication panels filter the documents compared to how the Overview panel filters them (for example, the Overview panel may take into account publications in other languages that aren’t shown in Publication panels). However, these inconsistencies should be small.
An organization is mentioned in publications, but I can’t find it under Peers and Partners.
The web results are retrieved ad hoc and are not linked to our InnovationGraph. Extracting entities such as companies from unstructured text is one of the high-priority tasks our data science team is working on in order to design new solutions. Unfortunately, absolute consistency between web results and the content of other panels cannot yet be expected. Note that the footprint of a company can always be evaluated using an Organization search.
I can’t find an organization but the keywords I used are on their website.
Although we do collect a lot of data from organizations' websites, we do not do this for every organization. However, this is an area we’re expanding, and we are continually collecting more data from the web.
How can I suggest additional news sources?
You can suggest news sources using the "Send a Message" button in SCOUT. We will evaluate them as soon as possible.
Integration of non-English sources.
While we do have non-English documents in the system that are used for some analytical purposes, we generally display only English publications in SCOUT. The exception is pure player queries, where you can sometimes see non-English documents in the reference section. We show them because the references might otherwise be empty when the player in the ranking was identified by “silently” matching non-English documents.
Why are there no documents under references?
There are various reasons for this. One is language: we integrate documents into our analytical processes so that we can give good indicators for a company’s impact, relevance, and global score, and those documents might not all be in English. We can therefore have thousands of documents for an organization, while SCOUT shows only the documents that are in English.
Another reason is that correctly identifying the organizations mentioned in a document is a complex process. This is partly due to the various names that an organization might go by. For example, Mercedes-Benz has hundreds of organizations under the umbrella of Mercedes-Benz Group. We continually refine our processes to improve how we link documents in such cases.
Well-known company tagged as a startup.
This sometimes happens when we have duplicates, which arise because we integrate data from hundreds of thousands of data sources. For some of the duplicates, we may not have the information that shows a violation of our startup criteria.
For example, we might have Mercedes-Benz not classified as a startup, but then receive data under the organization name Mercedes-Benz India. This duplicate record under a different name might not initially be matched to our up-to-date record for Mercedes-Benz, and the data we have for Mercedes-Benz India might appear to meet the criteria for a startup. As a result, some Mercedes-Benz records might appear as a startup while others appear only as a company.
These organizations are matched and deduplicated over time. The startup classification then changes once more authoritative data overwrites the missing or bad data and indicates a violation of our startup criteria.
Missing revenue and employee information.
We strive to show as many organizations as possible from hundreds of thousands of data sources. Depending on the data sources we have for an organization, we might not have its revenue or employee information. We work to find and integrate this information over time. Even when we don’t have revenue or employee information, we can still link documents and show a good indicator of how relevant that organization is to any technological area it is involved in.
Same organization under companies and startups.
We consider startups a subset of companies, which is why it’s possible to find them in both panels.