Open Materials 2024 will be one of the biggest data sets available for materials science. Meta is releasing a massive data set and models, called Open Materials 2024, that could help scientists use AI ...
Personally identifiable information has been found in DataComp CommonPool, one of the largest open-source data sets used to train image generation models. Millions of images of passports, credit cards ...
The largest single-cell perturbation dataset to date will be generated and released as open source in a new team effort.
FHIBE was created to address issues with current publicly available datasets that lack diversity and are collected without consent, which can perpetuate bias and present a persistent challenge to AI ...
Penn Medicine links EHR with real-time care data to study clinician-patient interactions and improve clinical practice using multimodal datasets.
Using Google Earth imagery and 2019-2022 Sentinel-2 datasets, Chinese scientists have developed a two-stage classification framework to produce an annual global dataset of solar photovoltaic panels at ...