Semantic Search - For Reals? - Programming, developer relations, and general nonsense from Sean Falconer


Recently, a project I am involved in at Stanford participated in and won the Semantic Web Challenge at the International Semantic Web Conference. The Semantic Web Challenge is a competition for Semantic Web applications. For the uninformed, the Semantic Web “is a group of methods and technologies to allow machines to understand the meaning – or ‘semantics’ – of information on the World Wide Web”. Essentially, the Semantic Web hopes to move web applications beyond simple syntax to actually supporting the “meaning” behind content.

In the challenge, the focus is not so much on the technical or research aspects of the tools, but more on their usefulness for target users. The challenge helps demonstrate what semantic technologies can bring to society.

Our application, NCBO Resource Index: Ontology-Based Search and Mining of Biomedical Resources, is a semantic search application for biomedical researchers (check out the video below). I designed and developed the user experience and interface.

The application allows researchers in the biomedical sciences to perform “concept-based search” over 22 different biomedical resource databases. Rather than typing in keywords, a researcher types in concepts from their domain of interest. Those concepts come from ontologies, the backbone of the Semantic Web. For the sake of simplicity, an ontology is a structured terminology that describes the terms within a specific domain, the properties of those terms, and the relationships between them.

In terms of user interaction and searching, the behavior is quite similar to conventional search engines like Google. The difference is that when you select a term from the auto-completion drop-down, that term comes from an ontology that was used to index elements within the various biomedical resources. I also included tag clouds (see figure below) for visualizing concepts related to your current search query, and I use color intensity to highlight the resources most relevant to your current search terms.

The power of the search tool comes from how the index gets constructed. The National Center for Biomedical Ontology maintains a project called BioPortal. This project is an open library of more than 200 ontologies in biomedicine. Using the terms from these ontologies as our term-base, we automatically annotate or “tag” textual descriptions of the data residing within the elements of the 22 biomedical resources. These could be things like patient records, gene expression data, research articles, clinical trials, etc.
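To make the tagging step concrete, here is a minimal sketch of dictionary-based annotation. It is not the actual NCBO annotation pipeline: the term identifiers, the `TERM_BASE` dictionary, and the `annotate` function are all hypothetical, and a real annotator would handle far more than exact label matches.

```python
import re
from typing import Dict, List, NamedTuple


class Annotation(NamedTuple):
    term_id: str     # hypothetical ontology term identifier
    element_id: str  # hypothetical data element identifier


# Toy term base mapping a term label to a made-up term id.
TERM_BASE: Dict[str, str] = {
    "breast cancer": "T1",
    "pheochromocytoma": "T2",
}


def annotate(element_id: str, text: str,
             term_base: Dict[str, str] = TERM_BASE) -> List[Annotation]:
    """Tag a textual description with every term label that appears in it."""
    lowered = text.lower()
    found = []
    for label, term_id in term_base.items():
        # Match the label as a whole phrase, ignoring case.
        if re.search(r"\b" + re.escape(label) + r"\b", lowered):
            found.append(Annotation(term_id, element_id))
    return found
```

Calling `annotate("trial-1", "A phase II study of Breast Cancer therapy")` would link the made-up term `T1` to the element `trial-1`.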

These annotations act as a link connecting an ontology term to a data element. The really useful part of this process, and also where the semantics begin to play a role, is that we can use the structure of the ontology to expand these annotations. So, rather than getting a simple keyword like “breast cancer” mapping to a particular clinical trial, we also know any synonyms of “breast cancer” described in the ontology. We can also use the hierarchy of the ontology to link more general or specific terms to the resource like “cancer” or “melanoma”. Finally, we can use mappings between multiple ontologies to discover other related terms.
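The expansion idea can be sketched roughly as follows. This is an illustration under assumptions, not the project's implementation: the `PARENTS`, `SYNONYMS`, and `MAPPINGS` tables are toy data I made up (only the pheochromocytoma/retroperitoneal neoplasm relationship comes from this post), and `expand` is a hypothetical helper. It walks up the hierarchy breadth-first and records how far each ancestor is from the directly annotated term.

```python
from collections import deque
from typing import Dict, List, Tuple

# Toy ontology data for illustration only (child -> parents).
PARENTS: Dict[str, List[str]] = {
    "pheochromocytoma": ["retroperitoneal neoplasm"],
    "retroperitoneal neoplasm": ["neoplasm"],
    "neoplasm": [],
}
# Hypothetical synonym and cross-ontology mapping tables.
SYNONYMS: Dict[str, List[str]] = {"pheochromocytoma": ["example synonym"]}
MAPPINGS: Dict[str, List[str]] = {"pheochromocytoma": ["OTHER:pheochromocytoma"]}


def expand(term: str) -> Dict[str, Tuple[str, int]]:
    """Return related terms as {term: (relation, distance from the direct term)}."""
    expanded = {term: ("direct", 0)}
    for synonym in SYNONYMS.get(term, []):
        expanded.setdefault(synonym, ("synonym", 0))
    for mapped in MAPPINGS.get(term, []):
        expanded.setdefault(mapped, ("mapping", 0))

    # Breadth-first walk up the hierarchy so each ancestor gets its shortest distance.
    queue = deque([(term, 0)])
    while queue:
        current, distance = queue.popleft()
        for parent in PARENTS.get(current, []):
            if parent not in expanded:
                expanded[parent] = ("ancestor", distance + 1)
                queue.append((parent, distance + 1))
    return expanded
```

With this toy data, an element annotated directly with "pheochromocytoma" also becomes reachable through "retroperitoneal neoplasm" (distance 1) and "neoplasm" (distance 2).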

This is truly where the power of ontologies can be seen and where a semantic approach to search can be very useful. For example, searching for the keywords “retroperitoneal neoplasm” within the Gene Expression Omnibus website will return zero results. However, the same search in our tool will retrieve relevant results annotated by the child term of “retroperitoneal neoplasm” from the NCI Thesaurus, “pheochromocytoma”. Results are scored based on the distance of the matching annotation from a given search concept.
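As a rough illustration of distance-based scoring, the sketch below ranks results so that annotations closer to the search concept come first. The post does not give the real scoring formula, so the `1 / (1 + distance)` weighting is purely my assumption here, and `score` and `rank` are hypothetical names.

```python
from typing import Dict, List, Tuple


def score(distance: int) -> float:
    """Assumed weighting: a direct match scores 1.0, more distant matches score less."""
    return 1.0 / (1 + distance)


def rank(distances: Dict[str, int]) -> List[Tuple[str, float]]:
    """Rank element ids by score, given each element's annotation distance
    from the search concept (0 = annotated with the concept itself)."""
    scored = [(element_id, score(d)) for element_id, d in distances.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

In this sketch, an element annotated directly with the search concept outranks one that only matches through a child term one level down.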

Another big advantage to our application is that the ontologies form a “semantic bridge” between very different biomedical resources. One search execution automatically gives you access to 22 different databases, allowing a researcher to explore relationships between things like gene expressions and clinical trials relevant to a specific concept. Without this, researchers are forced to open up multiple web pages and search each database independently.
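One way to picture the “semantic bridge” is that a single concept lookup returns hits from every resource at once, which can then be grouped for display. The sketch below assumes hits arrive as `(resource, element_id)` pairs; the resource labels, element ids, and the `group_by_resource` helper are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_by_resource(hits: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (resource, element_id) hits for one concept by their source resource."""
    grouped = defaultdict(list)
    for resource, element_id in hits:
        grouped[resource].append(element_id)
    return dict(grouped)


# Hypothetical hits for a single concept search across two resources.
hits = [
    ("gene-expression", "dataset-1"),
    ("clinical-trials", "trial-7"),
    ("gene-expression", "dataset-2"),
]
by_resource = group_by_resource(hits)
```

Here `by_resource` holds the gene expression datasets and the clinical trial side by side under one concept, without a separate search per database.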

Of course, all these annotations take a long time to index, and the result is a boatload of data. The current index is stored in a 1.5 terabyte MySQL database that contains 16.4 billion annotations, 2.4 million ontology terms, and 3.5 million data elements (stylized graphic below). Other members of the team have worked out clever ways to do much of this indexing efficiently. You can read about it in Paea LePendu’s paper Optimize First, Buy Later: Analyzing Metrics to Ramp-up Very Large Knowledge Bases.

We are working to include more resources in the index and also speed up the search support. If you’re interested in playing with the application, check it out in the BioPortal integration.
