How to document your GenAI use?
Part of being a good scientist means that your research has to be reproducible. If a result cannot be replicated, it holds little to no value in the scientific community. The same applies to the usage of AI in your research. Others should be able to replicate your actions to the best of their abilities using the information you provide. To achieve this the following steps should be taken:
- You disclose your usage of AI tools and the way in which they were used in your Materials and Methods section. This includes technical details about the models and their method of access as well.
- You provide as much relevant information as possible about the way in which AI was used, including any prompts written and output received.
- You maintain a copy of the information provided to AI models and the output received in your data repository.
- You add a statement at the end of your document (typically above the References) officially declaring your use of AI and stating you take full responsibility for the contents of the document written. (See example below for more information.)
The statement below is an example of how you can declare your AI use at the end of a document. This is taken from the Elsevier publisher for academic journals:
During the preparation of this work the author(s) used [NAME TOOL / SERVICE] in order to [REASON]. After using this tool/service, the author(s) reviewed and edited the content as needed and take(s) full responsibility for the content of the publication.
In the paragraphs below we will provide examples of how you can document your usage for various types of tools and levels of detail. When in doubt whether to declare the usage of AI, always assume a declaration and inclusion in your Materials and Methods is needed.
Disclosing in Materials and Methods
In the Materials and Methods (M&M) section of your document it is convenient to add a single paragraph at the end of your M&M to explain the usage of AI tools in your work. This would apply to all AI tools, ranging from the checking of sentence formulation or spelling with language models or grammar tools, to the generation of images or the searching for literature. The following information should be included in such a declaration:
- The name of the tool used.
- The developer of the tool (if applicable).
- What type of model is this? / What was the model used for?
- When did you use this / What version did you use?
- The location where the tool can be accessed (URL for a web interface or downloadable location).
For the usage of ChatGPT the following can be used as an example:
Disclosing use of images
When using AI-generated images, the source of the image should be included in the caption of the image, just like you would do for real images. However, for AI-generated images this information may require more extensive disclosure than expected. We will provide two examples of disclosure here, one for web-based commercial image generators, and one for locally installed open-source generators.
Web-based image generators
As online image generators are often quite restrictive in the settings available to the user, the precise information that can be provided is limited. This also prevents the exact replication of the images generated (contrary to locally installed models). To be as transparent as possible, the following information should be included in the caption:
- The tool used and the web location.
- The version of the model or the date of image generation.
- The prompt provided to the model.
- (If applicable) the prompt used by the model.
The prompt provided as user and the prompt used by the model may differ depending on whether there is a language model operating in-between to rewrite the prompt to a more extensive description. This happens when generating images within ChatGPT (using DALL-E3) or when the Magic Prompt feature is activated in Ideogram, for example.
For the image below we have used Ideogram with Magic Prompt enabled to provide an extensive example of a complete caption.
Figure. Image created with Ideogram (https://ideogram.ai/), Model version: 2.0.
User prompt: A collection of various fruits and vegetables which are commonly grown in the Netherlands, lying on a kitchen table, realistic, bright ambient lighting.
Image prompt: A photo of a collection of various fruits and vegetables commonly grown in the Netherlands, lying on a kitchen table. There are apples, oranges, carrots, potatoes, beets, onions, and leeks. The fruits and vegetables are fresh and have a glossy texture. The kitchen table has a wooden surface and is placed in a room with a bright ambient lighting.
Locally installed (open-source) image generators
When installing an image generator on your own device you are often given a larger array of settings to play with. These types of models can include the possibility to create both positive and negative prompts, and the option to specify a ‘seed’ (see guide on Image Generators for more information), and also may have more components in their workflow (such as ‘upscaling’ models). As the caption would become too long if the full workflow is put in the caption, only the essential information for a single image should be in the caption, and the remainder of the information may be put in the Appendix of the document. This would translate to the following requirements:
In the caption:
- The name of the model and the corresponding version / checkpoint.
- The positive and negative prompt
- The seed number.
In the appendix:
- The location where the model can be downloaded from.
- The user interface for the workflow.
- The tensor clips.
- The sampler, steps, cfg, scheduler, denoising and timestep settings. (Amongst others.)
- Any upscaling model or other additions to the workflow other than the model checkpoint.
It is recommended to also save a copy of the workflow used in your data repository, so the settings mentioned in the appendix are archived.
For the image below we have used Stable Diffusion 3 and will show the information needed for the caption.
Figure. Image created with Stable Diffusion 3. Model version: 3.0 Medium. Seed: 21.
Positive prompt: a sunset on a white beach overlooking the sea, flowing waves, photorealistic, lightly cloudy skies, palm trees dotting the beach, beautiful, high quality, uniform sandy beach structure
Negative prompt: bad quality, poor quality, clouds, rain, moon, snow, cartoon, animated, sharp transition, mixed sand quality
Documenting your conversations / prompts
Especially when using language models it is important that you save the inputs and outputs of the conversation in your data repository. You may want to add a copy of some of these (if relevant) to your appendix to be transparent about your usage of AI. These conversations can often be exported to a Word or PDF-document and saved, though some models also do so via JSON-files (is allowed, but less easily readable) or via web-links. The usage of web-links to save your conversations is not recommended, as the link becomes inactive once the conversation is deleted on the original account.