Thursday, December 31, 2009

Bioinformatics software for biologists

For data processing, I have mostly used scripts I wrote myself along with tools like BioPerl (book), Biopython (book), AWK (book), or R (book) to work with biological data. More recently I have also been using Galaxy a lot to work on genome data. It is especially useful because it has fast connections to many databases, and not having to download raw data to my machine can save a lot of time and space. I do think more biologists should familiarize themselves with tools like Galaxy, if only because of features such as saving your work history and recording exactly how a particular set of data was processed. Understandably though, many who are not familiar with programming tend to be completely unaware of, or unwilling to use, tools which require any programming, and are only slightly more comfortable with tools like Galaxy.

Since the amount of data in biology is rapidly increasing and working through it by hand is less and less of a serious option, a number of commercial and free alternatives are becoming available for those who want to do data (especially sequence) analysis in a user-friendly way without having to learn programming. Some of the programs in this genre are UGENE, Gbench, Geneious, CLC Main Workbench, MacVector (Mac only) and more. I am partial to open source programs, so they are listed first, followed by the closed source ones, though I am fine with closed source software if it does something better. In the coming days I will write about how these programs compare from my perspective, looking at things like ease of manipulating sequences, building multiple alignments, generating trees, and working with phylogenetic trees.

Monday, December 28, 2009

First Post

This blog is going to be about new ways to get insights into biological data using informatics, in particular, bioinformatics. While I am largely a bioinformatician (-icist?), I do collaborate with biologists, so I also intend to write about new technologies and new ways to generate data. Next generation (or high throughput) sequencing and tools for understanding epigenetics, like ChIP-chip and ChIP-seq, are things that I am currently working with. Part of this is about trying to make sense of all the new software that is being published. Every month tens, if not hundreds, of new papers describing software are published in Bioinformatics, Nucleic Acids Research and so on, but it's hard to say how good these tools are or how they compare with other tools for similar applications. I would like to review how they turn out in my own usage.