A language model’s output is evidence of something — but rarely of what the user asked about. It is evidence of the distribution of plausible continuations conditional on the prompt, weighted by the training set and the alignment process. Treating it as evidence of the world requires a separate verification step.
This is not a complaint. It is a property of the artefact, not a flaw. The mistake, where there is one, is institutional. When we publish a finding produced with a model in the loop, we owe the reader a description of how the model participated and how we verified its contribution. Anything less asks the reader to do the verification themselves, without the context.
LuMiNx’s house rule, for now: any finding that depended on a model is labelled in the methods section, and the prompt and version are recorded in an appendix. Where the model is the subject of the study, this happens automatically. Where the model was an assistant, we have to be more careful — the assistant blends in.
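One way to keep the assistant from blending in is to make the record a required object rather than a habit. The sketch below is only an illustration of the house rule, not a description of any tooling we actually run: the `ModelUse` class, its field names, and the `methods_label` and `appendix_entry` helpers are all hypothetical, as are the two role names.

```python
from dataclasses import dataclass

# Hypothetical role names: the model was either the thing studied
# or a helper used along the way.
ROLES = ("subject", "assistant")


@dataclass(frozen=True)
class ModelUse:
    """One record per finding that depended on a model (illustrative only)."""
    finding: str
    model: str
    version: str
    prompt: str
    role: str
    verification: str  # how the model's contribution was checked

    def __post_init__(self):
        # Refuse records that skip the parts the house rule cares about.
        if self.role not in ROLES:
            raise ValueError(f"role must be one of {ROLES}")
        for name in ("model", "version", "prompt", "verification"):
            if not getattr(self, name).strip():
                raise ValueError(f"{name} must not be empty")


def methods_label(use: ModelUse) -> str:
    """Short label for the methods section."""
    return (
        f"[model-dependent: {use.model} {use.version}, "
        f"role={use.role}; verified by: {use.verification}]"
    )


def appendix_entry(use: ModelUse) -> str:
    """Fuller record for the appendix, including the prompt verbatim."""
    return "\n".join([
        f"Finding: {use.finding}",
        f"Model: {use.model}",
        f"Version: {use.version}",
        f"Role: {use.role}",
        f"Verification: {use.verification}",
        "Prompt:",
        use.prompt,
    ])
```

The design choice worth noting is that the verification field is mandatory. A record that names the model and prompt but says nothing about how the output was checked fails at construction, which is the case the rule exists to catch.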