TREC Health Misinformation Track (2021)

Track Introduction

Web search engines are frequently used to help people make decisions about health-related issues. Unfortunately, the web is filled with misinformation regarding the efficacy of treatments for health issues. Search users may not be able to discern correct from incorrect information, nor credible from non-credible sources. Users who find misinformation and judge it useful for their decision-making task can make incorrect decisions that waste money and put their health at risk.

The TREC Health Misinformation track fosters research on retrieval methods that promote reliable and correct information over misinformation for health-related decision making tasks.

Track communication

Track announcements are made via Google Groups.

2021 Track Task

AdHoc Web Retrieval

Task Description: Participants devise search technologies that promote credible and correct information over incorrect information, with the assumption that correct information can better lead people to make correct decisions.

Each topic concerns a health issue and a treatment for that issue. The topics represent a user who is looking for information that is useful for deciding whether the treatment is helpful or unhelpful for treating the health issue. Better search result rankings will place very useful documents that are credible and correct at the top of the ranking, and will not return incorrect information. In this search task, incorrect information is considered harmful and should not be returned.

For each topic, we have chosen a ‘stance’ for the topic on whether the treatment is helpful or unhelpful for the health issue. We do not claim to be providing medical advice, and medical decisions should never be made based on the stance we have chosen. Consult a medical doctor for professional advice. If a treatment is considered ‘helpful’, then correct documents will be those construed to be supportive of the treatment and incorrect documents will be those that would dissuade the searcher from the treatment. Likewise, for an ‘unhelpful’ treatment, a system should return documents that dissuade the searcher from using the treatment and should avoid returning documents that are supportive of using the treatment. For each topic, we have included an evidence link for a webpage that the topic author used as the basis for the topic’s stance.

The primary ad-hoc task is to use only the topic’s query or description field and return a ranking of documents, without using any of the other topic fields. Manual runs may also make use of the other fields of the topic, e.g. the stance and evidence fields, but these runs will need to declare the use of this data to allow us to distinguish them from the primary task.

Document Collection

This year we will be using the noclean version of the C4 dataset used by Google to train their T5 model. The collection is comprised of text extracts from the April 2019 snapshot of Common Crawl. The collection contains approximately 1 billion English documents.

You can download the corpus on a Debian/Ubuntu machine using the following commands (see HuggingFace for further information).

sudo apt-get install git-lfs 
git lfs install
GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/datasets/allenai/c4
cd c4
git lfs pull --include="en.noclean/c4-train*"

The collection is made up of the 7168 gzipped jsonl files located in the en.noclean directory. We are using only the c4-train.*.json.gz files and not the c4-validation.*.json.gz files. Each file contains ~150k documents, and has one document per line. A document is a json object with the fields text, url and timestamp. As packaged in c4.noclean, documents do not contain a document identifier. For this TREC track we will be adding our own document identifiers to the collection.

The docno spec is as follows:

For documents inside files c4/en.noclean/c4-train.?????-of-07168.json.gz, the docno will be en.noclean.c4-train.?????-of-07168.<N> where <N> is the line number of the document starting at 0. This goes for all 7168 training files in the path c4/en.noclean/.

So for example, in the file en.noclean/c4-train.01234-of-07168.json.gz the first document’s identifier will be en.noclean.c4-train.01234-of-07168.0, the second document’s identifier will be en.noclean.c4-train.01234-of-07168.1 and the last document’s identifier will be en.noclean.c4-train.01234-of-07168.148409.
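The docno spec above can be sketched in a few lines of Python. This is a minimal illustration, not part of the track's tooling: the names `DOCNO_RE`, `make_docno`, and `parse_docno` are hypothetical, and the sketch assumes file numbers are always zero-padded to five digits, as in the example.

```python
import re
from typing import Tuple

# Docnos look like en.noclean.c4-train.<5-digit file number>-of-07168.<line number>.
DOCNO_RE = re.compile(r"^en\.noclean\.c4-train\.(\d{5})-of-07168\.(\d+)$")


def make_docno(file_number: int, line_number: int) -> str:
    """Build a docno from a file index and a 0-based line number."""
    return f"en.noclean.c4-train.{file_number:05d}-of-07168.{line_number}"


def parse_docno(docno: str) -> Tuple[str, int]:
    """Return (file path relative to the c4 repo, 0-based line number)."""
    match = DOCNO_RE.match(docno)
    if match is None:
        raise ValueError(f"not a valid docno: {docno!r}")
    file_number, line_number = match.groups()
    return f"en.noclean/c4-train.{file_number}-of-07168.json.gz", int(line_number)
```

Mapping a docno back to its file and line number can be handy when inspecting the text of a retrieved document without building a separate lookup table.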

One way to insert document identifiers is by using the provided Python script. Another would be to name the documents as you index them.

"""
Script to add docnos to files in c4/en.noclean
To process all files:
python renamer.py --path <path-to-c4-repo>
To process a subset, e.g. the first 20 files:
python renamer.py --path <path-to-c4-repo> --pattern 000[01]?
"""
import argparse
import glob
import gzip
import os

parser = argparse.ArgumentParser(description='Add docnos to C4 collection.')
parser.add_argument('--path', type=str, help='Root of C4 git repo.', required=True)
parser.add_argument('--pattern', type=str, default="?????", help='File name patterns to process.')
args = parser.parse_args()
pattern = args.pattern
path = args.path


def new_docno(file_number, line_number):
    """Build the docno for a 5-digit file number and a 0-based line number."""
    return f'en.noclean.c4-train.{file_number}-of-07168.{line_number}'


files = sorted(glob.glob(f'{path}/en.noclean/c4-train.{pattern}-of-07168.json.gz'))

# Create the output directory if it does not exist yet.
out_dir = os.path.join(path, 'en.noclean.withdocnos')
os.makedirs(out_dir, exist_ok=True)

for filepath in files:
    # File names look like c4-train.01234-of-07168.json.gz.
    file_name = os.path.basename(filepath)
    file_number = file_name[len('c4-train.'):len('c4-train.') + 5]
    print(f"adding docnos to file number {file_number} ...")
    with gzip.open(filepath, 'rb') as f, gzip.open(os.path.join(out_dir, file_name), 'wb') as o:
        # Stream one document per line instead of loading the whole file.
        for line_number, line in enumerate(f):
            line = line.decode('utf-8')
            # Drop the opening brace and prepend the docno as the first field.
            new_line = f"{{\"docno\":\"{new_docno(file_number, line_number)}\",{line[1:]}"
            o.write(new_line.encode('utf-8'))
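
The second option, naming documents as you index them, might look like the sketch below. It is only an illustration under stated assumptions: `iter_documents` is a hypothetical helper name, and the sketch assumes each input file keeps its original c4-train.?????-of-07168.json.gz name so that the docno prefix can be derived from it.

```python
import gzip
import json
import os
from typing import Dict, Iterator


def iter_documents(filepath: str) -> Iterator[Dict]:
    """Yield each document of one c4-train file with a 'docno' field added."""
    # ".../c4-train.01234-of-07168.json.gz" -> "c4-train.01234-of-07168"
    stem = os.path.basename(filepath)[: -len(".json.gz")]
    with gzip.open(filepath, "rb") as f:
        # One JSON document per line; line numbers start at 0.
        for line_number, line in enumerate(f):
            doc = json.loads(line)
            doc["docno"] = f"en.noclean.{stem}.{line_number}"
            yield doc
```

Each yielded dictionary carries the original text, url and timestamp fields plus the docno, so it can be handed directly to whatever indexing code you use, without writing a second copy of the collection to disk.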

Topics

Topics are released and available in the active participants part of the TREC website: https://trec.nist.gov/act_part/tracks/misinfo/misinfo-2021-topics.xml

Topics are authored by the track organizers. The NIST assessors will be provided the topic’s query, description, and narrative and be asked to make judgments as per the assessing guidelines.

The topics will be provided as XML files using the following format:

<topics>
  <topic>
    <number>1234</number>
    <query>dexamethasone croup</query>
    <description>Is dexamethasone a good treatment for croup?</description>
    <narrative>Croup is an infection of the upper airway and causes swelling, 
      which obstructs breathing and leads to a barking cough. As one kind of 
      corticosteroids, dexamethasone can weaken the immune response and 
      therefore mitigate symptoms such as swelling. A very useful document 
      would discuss the effectiveness of dexamethasone for croup, i.e. a very 
      useful document specifically addresses or answers the search topic's 
      question. A useful document would provide information that would help 
      a user make a decision about treating croup with dexamethasone, and 
      may discuss either separately or jointly: croup, recommended treatments 
      for croup, the pros and cons of dexamethasone, etc.</narrative>
    <disclaimer>We do not claim to be providing medical advice, and medical 
      decisions should never be made based on the stance we have chosen.  
      Consult a medical doctor for professional advice.</disclaimer>
    <stance>helpful</stance>
    <evidence>https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5804741/</evidence>
  </topic>
<topic>
...
</topic>
</topics>
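
A topics file in this format can be read with Python's standard `xml.etree.ElementTree` module. The sketch below is one possible way to do it; `parse_topics` and `automatic_run_fields` are hypothetical helper names, and the whitespace normalization is an assumption made to tidy multi-line fields such as the narrative.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def parse_topics(xml_text: str) -> List[Dict[str, str]]:
    """Parse a <topics> document into a list of {field name: text} dicts."""
    root = ET.fromstring(xml_text)
    topics = []
    for topic in root.findall("topic"):
        # Collapse line breaks and indentation inside multi-line fields.
        fields = {child.tag: " ".join((child.text or "").split()) for child in topic}
        topics.append(fields)
    return topics


def automatic_run_fields(topic: Dict[str, str]) -> Dict[str, str]:
    """Keep only the fields that the primary ad-hoc task is allowed to use."""
    allowed = ("number", "query", "description")
    return {name: topic[name] for name in allowed if name in topic}
```

Filtering each topic through something like `automatic_run_fields` before it reaches the retrieval code is one way to avoid accidentally using the stance, evidence, or narrative fields in the primary task.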

Evaluation

The final qrels will contain assessments with respect to the following criteria:

Usefulness: The extent to which the document contains information that a search user would find useful in answering the topic’s question. Three levels:
0. Not useful.
1. Useful: The user would find the document useful for the topic because it would help the user make a decision about the search topic by providing information about the health issue, the treatment, or both. Merely mentioning the health issue or the treatment is not sufficient. To be useful, the document must help the user in some way to make a decision about the topic.
2. Very useful: The user would find the document very useful because it specifically talks about the use of the treatment for the health issue, or provides strong guidance about the health treatment regardless of the health issue.

Supportiveness: Does the document support or dissuade the use of the treatment in the topic’s question? Three levels:
1. Supportive: The document would support a decision to use the treatment.
0. Neutral: The document neither supports nor dissuades use of the treatment.
-1. Dissuades: The document would dissuade a user from using the treatment.

Credibility: Whether the document is considered credible by the assessor. Two levels:
0. Not credible.
1. Credible.

Note: not-useful documents will not be assessed with respect to credibility and supportiveness.

Submitted runs will be evaluated with respect to three criteria: usefulness, correctness, and credibility. We will be using the compatibility measure in a similar fashion as we did in the 2020 version of this track (see the overview paper for more details).
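
As described in the task description, whether a document counts as correct depends on the topic's stance together with the document's supportiveness. The sketch below illustrates that relationship only. It is not the official derivation used to build the qrels: the function name `correctness` is hypothetical, and treating neutral documents as neither correct nor incorrect is an assumption.

```python
from typing import Optional


def correctness(stance: str, supportiveness: int) -> Optional[bool]:
    """Return True (correct), False (incorrect), or None (neutral document).

    stance: 'helpful' or 'unhelpful', as given in the topic.
    supportiveness: 1 (supportive), 0 (neutral), or -1 (dissuades).
    """
    if stance not in ("helpful", "unhelpful"):
        raise ValueError(f"unknown stance: {stance!r}")
    if supportiveness not in (-1, 0, 1):
        raise ValueError(f"unknown supportiveness: {supportiveness!r}")
    if supportiveness == 0:
        # Assumption: a neutral document is neither correct nor incorrect.
        return None
    # Supportive documents are correct for helpful treatments;
    # dissuading documents are correct for unhelpful treatments.
    return (supportiveness == 1) == (stance == "helpful")
```

For the example topic on dexamethasone and croup, whose stance is helpful, a supportive document would be treated as correct and a dissuading one as incorrect.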

For more details, you can read the 2021 assessing guidelines as used by NIST.

Runs

Participating groups will be allowed to submit as many runs as they like, but they need authorization from the Track organizers before submitting more than 10 runs. Not all runs are likely to be used for pooling and groups will need to specify a preference ordering for pooling purposes.

Runs may be either automatic or manual.

Automatic runs: Only the topic’s query or description field may be used for automatic runs. Use of the other fields, e.g. stance, evidence, and narrative, will make a run a manual run. An automatic run is made without any tuning or customization based on the topics. Best practice for an automatic run is to avoid using the topics, or even looking at them, until all decisions have been made and all code has been written to produce the automatic run.

Manual runs: A manual run is anything that is not an automatic run. Manual runs commonly have some human input based on the topics, e.g., hand-crafted queries or relevance feedback. All topic fields may be used for manual runs. We encourage manual runs in addition to automatic runs.

Submission format for runs will follow the standard TREC run format. For each topic, please return 1,000 ranked documents. The standard TREC run format is as follows:

qid Q0 docno rank score tag

where:

qid: the topic number;
Q0: unused and should always be Q0;
docno: the document id number returned by your system for the topic qid (see above for more details on how docnos should be constructed);
rank: the rank at which the document is retrieved;
score: the score (integer or floating point) that generated the ranking. The score must be in descending (non-increasing) order. The score is important to handle tied scores. (trec_eval sorts documents by the specified score values and not by your rank values);
tag: a tag that uniquely identifies your group AND the method you used to produce the run. Each run should have a different tag.

The fields should be separated with a space.

An example run is shown below:

1234 Q0 en.noclean.c4-train.04124-of-07168.69102 1 14.8928003311 myGroupNameMyMethodName
1234 Q0 en.noclean.c4-train.03346-of-07168.52165 2 14.7590999603 myGroupNameMyMethodName
1234 Q0 en.noclean.c4-train.03904-of-07168.54203 3 14.5707998276 myGroupNameMyMethodName
...
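
The run format described above can be produced with a small helper such as the following sketch. The name `format_run` and its parameters are hypothetical. The sketch assumes your system gives you (docno, score) pairs per topic, and it sorts by score so that ranks and scores agree.

```python
from typing import Iterable, List, Tuple


def format_run(qid: str,
               scored_docs: Iterable[Tuple[str, float]],
               tag: str,
               max_docs: int = 1000) -> List[str]:
    """Return run lines 'qid Q0 docno rank score tag' for one topic."""
    # Sort by descending score so the scores are non-increasing down the ranking.
    ranked = sorted(scored_docs, key=lambda pair: pair[1], reverse=True)[:max_docs]
    return [
        f"{qid} Q0 {docno} {rank} {score} {tag}"
        for rank, (docno, score) in enumerate(ranked, start=1)
    ]
```

Writing the returned lines to a file, one per line and topic after topic, gives a run in the six-column layout. Because ranks are derived from the sorted scores, the file stays consistent even though trec_eval orders documents by score.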

Other Track Tasks

This year, we will only have the ad-hoc retrieval task. We will consider adding back in the other tasks in future years.

Schedule

~~June 21, 2021~~ Collection released;
July 14, 2021 Topics released;
September 2, 2021 Runs due;
End of September 2021 Results returned;
October 2021 Notebook paper due;
November 17-19, 2021 TREC Conference;
February 2022 Final report due.
