Study identifies limitations of data crowdsourcing in med research

Source: Xinhua| 2017-06-26 05:07:05|Editor: Mu Xuequan

SAN FRANCISCO, June 25 (Xinhua) -- A study led by the University of California, San Francisco, comparing the online data crowdsourcing approach to a standard telephone survey, has found that certain crowdsourced medical study groups are either over- or underrepresented by age, race/ethnicity, education and physical activity.

While crowdsourcing, a practice that enables study participants to submit data electronically, has grown in use for health and medical research, the study published online this week in the American Journal of Public Health suggests that greater attention needs to be given to determining what populations are and are not reachable using remote, electronic data collection platforms.

The growth of Internet-based sampling and data collection offers an opportunity for cheaper, higher volume collection of health-related data. However, the resulting data may not be generalizable, and if certain groups are over- or underrepresented, the research may generate misleading conclusions when extrapolated to larger populations. Therefore, studies relying on crowdsourced respondents need to define the profile of the people generating those data.

"Online crowdsourced recruitment leads to systematic underrepresentation of some U.S. adults, such as certain racial and ethnic minorities, those with lower educational attainment, and older adults, and overrepresentation of others," lead author Veronica Yank, assistant professor of medicine at UCSF, was quoted as saying in a news release.

In addition, how participants enter a particular study can influence generalizability of results. Population studies traditionally have spent considerable effort on targeted recruitment of representative samples and statistical adjustments for over- or underrepresentation of subgroups among those enrolled, allowing investigators to determine the degree of confidence about the representativeness of the data. So, efforts are needed to understand and promote inclusion of underrepresented groups within projects using crowdsourced recruitment and data collection.
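The statistical adjustment mentioned above is commonly done by post-stratification weighting, in which each respondent is weighted by the ratio of their subgroup's share of the population to its share of the sample. The following is a minimal sketch of that general technique, not the method used in the UCSF study; the function name and all age-group shares are hypothetical values chosen for illustration.

```python
# Post-stratification sketch. All shares below are made-up illustrative
# values, NOT figures from the UCSF study or the BRFSS.

def poststratification_weights(sample_share, population_share):
    """Return one weight per subgroup: population share / sample share."""
    weights = {}
    for group, pop in population_share.items():
        samp = sample_share.get(group, 0.0)
        if samp <= 0:
            # A subgroup with no respondents cannot be reweighted.
            raise ValueError(f"no respondents in group {group!r}")
        weights[group] = pop / samp
    return weights


# Hypothetical age distribution of an online sample vs. the target population.
sample_share = {"18-39": 0.70, "40-59": 0.25, "60+": 0.05}
population_share = {"18-39": 0.40, "40-59": 0.35, "60+": 0.25}

weights = poststratification_weights(sample_share, population_share)

# After weighting, each subgroup's share matches the population share.
weighted_share = {g: sample_share[g] * w for g, w in weights.items()}
```

In this made-up case the oldest group receives a weight of about 5, meaning each of its respondents stands in for five times as many people as a respondent in a proportionally sampled group. Weights this large make estimates unstable, which is one reason a group that is hard to reach cannot simply be corrected for after the fact.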

Crowdsourcing, in which self-selected individuals provide electronic data or feedback, is currently one of the most innovative methods for recruiting study populations. Social science and psychology researchers widely use it, and the U.S. National Institutes of Health (NIH) Precision Medicine Initiative will recruit participants this year through the Internet, social media and mobile technologies to form an "All of Us" cohort, the largest study group ever assembled.

In the UCSF-led study, Yank and her colleagues used Amazon Mechanical Turk (MTurk), the world's largest online crowdsourcing platform, to compare the demographic and health characteristics of adults recruited through it with those of the U.S. population. MTurk has 500,000 registered anonymous members overall, including about 400,000 in the United States, and 15,000 active on any given day. For this study, 2,015 U.S.-based adults aged 18 or older completed the survey between July and August 2015.

The researchers focused on health characteristics that are known risk factors for cardiovascular disease, many of which are suitable for remote measurement and data collection.

For comparison, they used 2013 data from 428,211 respondents to the Centers for Disease Control and Prevention (CDC) Behavioral Risk Factor Surveillance System (BRFSS), the world's largest telephone health survey.

Administered annually in English by landline or cell phone to adult U.S. residents, the survey gathers cross-sectional data on demographic and health characteristics and disease risk factors. The selected questions focused on demographics, ethnicity, educational attainment, annual income, employment status, and individual characteristics known to influence cardiovascular morbidity and mortality.

Overall, compared to the BRFSS, the crowdsourced sample tended to overrepresent adults aged 20-39 and underrepresent those aged 40-75, and the cardiovascular disease risk profile of crowdsourced participants also differed in well-defined ways from that of the U.S. population. Crowdsourced participants were younger, more likely to be non-Hispanic and white, and had higher levels of educational attainment. Those aged 40-59 were most representative with regard to smoking, diabetes, hypertension and hyperlipidemia, but even they differed significantly with regard to race/ethnicity, education and physical activity.

Crowdsourced data from younger age groups were even less similar, and those aged 60 and older were difficult to reach by crowdsourcing.

"These findings have implications for the upcoming national Precision Medicine Initiative, which will use online crowdsourcing as one of its recruitment and data collection approaches for the million Americans it plans to enroll in its cohort," Yank noted, saying that policymakers, funders of research and researchers should be explicit about the advantages and limitations of relying on crowdsourced data, especially when underlying sociodemographic characteristics or health variables may influence health outcomes.
