Monday, April 23, 2012

Database problems in organizing clinical research in aging and other disease/gene databases

 http://www.youtube.com/watch?v=yFXur4tOGuc

The speaker sounds very nervous, but that's probably because she's from Russia and may not be comfortable speaking English. Unfortunately, the papers behind all these talks are paywalled, so it's not possible to see the actual work they were doing.

The main point, though, is that there's a ton of data on different aging biomarkers that isn't being used clinically right now. For instance (and this is all probabilistic, since our understanding of genes is incomplete), let's say you have 100 genes that affect some particular aspect of health. Let's also say there are some 10 treatment paths for a particular pathology / disease in the person carrying those 100 genes. To choose the appropriate treatment, you would potentially have to cross-reference thousands of new studies on all those genes, because of the interactions between genes and the ENORMOUS genetic variation between individuals.
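To get a feel for the numbers, here is a rough back-of-the-envelope sketch. It assumes (my assumption, not something from the talk) that only pairwise gene interactions matter; the function name `lookup_burden` and the idea of one lookup per combination are purely illustrative.

```python
from math import comb


def lookup_burden(n_genes: int, n_treatments: int) -> dict:
    """Count the combinations someone might have to check in the literature.

    Illustrative assumption: only pairwise gene interactions are considered.
    Real gene networks can involve higher-order interactions as well.
    """
    gene_pairs = comb(n_genes, 2)  # unordered pairs of genes
    return {
        "gene_treatment_pairs": n_genes * n_treatments,
        "gene_pairs": gene_pairs,
        "gene_pair_treatment_triples": gene_pairs * n_treatments,
    }


burden = lookup_burden(n_genes=100, n_treatments=10)
for name, count in burden.items():
    print(f"{name}: {count}")
```

Even with only pairwise interactions, 100 genes give 4,950 gene pairs, and combining each pair with 10 treatments gives 49,500 combinations, which is already far more than anyone could check by hand.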

So most of that information is apparently just not being used on a regular basis in the medical industry, because there is more data than humans can search through. Basically, you would want a database of all the genes that affect each aspect of health. And because we do all our testing on animals, you would also want cross-species relationships, so that promising treatment paths found for all sorts of variations in animals can be carried over, at least in part, to humans.

In order to identify the overlaps, a new database needs to be created. (I believe they have already created such a database for the mouse, but the proposed database would cover all species in general.)
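A minimal sketch of what such a cross-species lookup could look like, using an in-memory SQLite database. The schema, table names, gene symbols (`GENE_A`, `Gene_a`, `Gene_b`) and treatments are all made up for illustration; I don't know what the group's actual database looks like.

```python
import sqlite3

# Hypothetical schema: genes per species, ortholog links between them,
# and study findings attached to individual genes.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE gene (id INTEGER PRIMARY KEY, species TEXT, symbol TEXT);
CREATE TABLE ortholog (gene_id INTEGER, ortholog_id INTEGER);
CREATE TABLE finding (gene_id INTEGER, treatment TEXT, effect TEXT);
""")

# Placeholder rows, not real genes or real results.
conn.executemany("INSERT INTO gene VALUES (?, ?, ?)", [
    (1, "human", "GENE_A"),
    (2, "mouse", "Gene_a"),
    (3, "mouse", "Gene_b"),
])
conn.execute("INSERT INTO ortholog VALUES (1, 2)")
conn.executemany("INSERT INTO finding VALUES (?, ?, ?)", [
    (2, "treatment_X", "improved"),
    (3, "treatment_Y", "no change"),
])


def animal_findings_for(human_symbol: str) -> list:
    """Return findings from other species for orthologs of a human gene."""
    return conn.execute("""
        SELECT m.species, m.symbol, f.treatment, f.effect
        FROM gene h
        JOIN ortholog o ON o.gene_id = h.id
        JOIN gene m ON m.id = o.ortholog_id
        JOIN finding f ON f.gene_id = m.id
        WHERE h.species = 'human' AND h.symbol = ?
    """, (human_symbol,)).fetchall()


results = animal_findings_for("GENE_A")
print(results)
```

The ortholog table is what does the "transfer": a finding recorded for the mouse gene shows up when you ask about the matching human gene, while the unrelated mouse gene is ignored.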

Basically, the best treatments in genetics already rely on computers to perform the probability calculations based on various studies. That process has an enormous delay of perhaps 5-10 years and/or a huge bottleneck in translating the latest research into a diagnostic program. The speaker's group is proposing a way to automatically take the literature and enter it into a database, so that you can get the most up-to-date treatment option available for your particular genes instead of waiting for a human to manually enter the info.
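I don't know how their extraction actually works, but as a toy illustration of the idea, here is a sketch that pulls (gene, treatment) pairs out of text whenever both appear in the same sentence. The vocabularies, the example abstract and the co-occurrence rule are all my own assumptions; real literature mining is much harder than this.

```python
import re

# Hypothetical vocabularies. A real system would need curated gene and
# treatment dictionaries (and would have to handle synonyms, negation, etc.).
KNOWN_GENES = {"GENE_A", "GENE_B"}
KNOWN_TREATMENTS = {"treatment_X", "treatment_Y"}


def extract_pairs(abstract: str) -> set:
    """Collect (gene, treatment) pairs that co-occur within one sentence."""
    pairs = set()
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", abstract):
        tokens = set(re.findall(r"[A-Za-z0-9_]+", sentence))
        for gene in tokens & KNOWN_GENES:
            for treatment in tokens & KNOWN_TREATMENTS:
                pairs.add((gene, treatment))
    return pairs


# Made-up abstract text, for illustration only.
abstract = (
    "Carriers of GENE_A variants responded well to treatment_X. "
    "No effect of treatment_Y was observed. "
    "GENE_B expression was unchanged."
)
pairs = extract_pairs(abstract)
print(pairs)
```

Only the first sentence mentions both a gene and a treatment, so only that pair would be written to the database. The point is just that nobody has to read the abstract and type it in by hand.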

She claimed that right now quite a lot of clinical studies are not being used in treatment, due to the translation bottleneck / the incredible complexity of gene networks.

And then you also have the underlying problem of automatically finding / clustering approaches from such a database. Since current databases are entered by hand by technical specialists, they hold less data than a lot of unsupervised ML techniques need, so expanding the pool of data would let you use all sorts of new unsupervised ML techniques.
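The talk didn't name a specific algorithm as far as I could tell, so as a stand-in, here is a bare-bones k-means sketch in plain Python. The two-dimensional "profiles" are invented numbers, and starting the centroids from two hand-picked points is just to keep the example deterministic.

```python
def kmeans(points, centroids, iterations=10):
    """Very small k-means: assign points to the nearest centroid, then re-average."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Squared Euclidean distance from p to every centroid.
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return centroids, clusters


# Invented two-feature profiles, e.g. two biomarker measurements per record.
profiles = [(0.1, 0.2), (0.2, 0.1), (0.9, 0.8), (0.8, 0.9)]
centroids, clusters = kmeans(profiles, centroids=[profiles[0], profiles[2]])
print(clusters)
```

With four points this is trivial, but it shows why the amount of data matters: the groups only mean something if there are enough records in the database for the clusters to be stable.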