Our rating system
For each of the AI-powered tools featured on these information pages, we provide ratings in three categories: Accuracy and Quality, Flexibility and Features, and Data Security and Privacy.
For each of these categories the tools have been thoroughly tested and evaluated, and the ratings will be updated should the tool or our standards change significantly. Below we provide the context and reasoning for each rating category, along with some recommendations for interpreting the ratings.
Only the free features of the tools are evaluated in these tests, unless otherwise indicated.
Accuracy and Quality
Most AI-powered tools are operated through prompts, meaning that the user supplies the instructions for the model to follow. How the model interprets these instructions and how accurately it follows them is part of the evaluation for this rating. This includes, for example, the interpretation of concepts such as "left" and "right" by image generators, the understanding of complex and field-specific definitions and terminology by LLMs, and the recognition of complex sentences and words during the transcription of audio files.
We have combined this rating with an assessment of quality, which can include whether an LLM cites its sources of information in the output, reduced blurriness or deformation in generated images, or the types of journals selected by literature search engines.
A rating of two or fewer stars indicates that the model may not produce reliable or trustworthy output and may require more critical reflection or guidance from the user than comparable tools do.
Flexibility and Features
Each tool has its own strengths and weaknesses. Some LLMs provide excellent support for coding, some have access to the internet so they can include more recent information in their answers, and others can interpret images and/or audio in addition to text. The number of unique and/or relevant features in a model determines a large part of this rating. For open-source models this can also include variations of the model (such as different sizes or supported languages). For commercial models this also includes the possibility to disable these features when they are undesired. When features or even core functionalities of a tool are locked behind a paywall, the rating is lowered.
Data Security and Privacy
As Wageningen University, we (both students and employees) are legally obligated to ensure that no sensitive data is shared with parties that should not have access to such information. Commercial GenAI models may save the data we enter into a model for the training of new versions of those models. Some AI models state that they do not use user inputs as training data; still, we should be very careful with what data we enter into these models.
To learn more about responsible handling of data and Artificial Intelligence, please visit the “How to use AI responsibly” page.
Commercial GenAI models vary greatly in the extent to which they save user data. Some models publicize user inputs and outputs, whilst others explicitly state that they do not store user data. Moreover, some models are very transparent about their data handling, while others provide very little information on this topic.
As WUR does not have agreements or licences with the providers of any commercial AI models, this rating depends entirely on the information provided by the developers. For that reason, our ratings should be used merely as an indication, and you should make an independent and well-considered choice. We strongly advise you to do your own research into the data security of the tool you intend to use. When in doubt, you can also contact the Privacy Officer or Information Security Officer of your Science Group.
Click here to learn more about information security and privacy at WUR.
In the end, the ultimate judgment and responsibility lie with you, the user of these tools. In our ratings, we have taken the following points into consideration:
- Where does the AI model run? Does it run on external servers, or can you run it locally? Open-source models that run on your own device do not send data to external parties. If the model is an online model, the location of its servers influences data security: EU-hosted models have to comply with the stricter GDPR, whereas models hosted outside the EU are less strictly regulated. A tool that offers an open-source option or EU hosting will receive a higher rating.
- Do you have to make an account or not? Some tools force you to share personal data because you have to sign up by creating an account. This poses more privacy issues than models where an account is not required. Tools that can be used without logging in will receive a higher rating.
- How is your data used? Some models explicitly do not save the data you enter, or allow you to turn this feature on and off. Others actively store your data to improve their models. In more extreme cases, your inputs are shared with companies or other users. If the model does not train on your data, or if training is turned off by default and the user has control over this option, the rating will be higher.
- Lastly, it is important to consider whether the model respects intellectual property and copyrighted material. Models that are more transparent about their training data will receive a higher rating.
Always be cautious with the information you share with AI models. Click here to read more about responsible AI usage.