Five years ago, I made a bet. Two bets, actually.
The first was with myself. I bet myself that if I devoted serious time to it, I could become a great technology blogger. It wasn’t an easy bet to make. I knew it would require upending my life at the time. And it did.
The second bet was related to the first. I knew that to become a key tech blogger, I would need a focus. As a relatively new Mac user myself, I decided that focus would be Apple. Yes, I was coming later to the party than some, but at the time Apple was still a company that many scoffed at. Drawing from my own experience, though, I truly felt that the company was on the cusp of changing the world. Again.
This post reminded me of my time wrestling with dissertation project ideas, and of the one idea that had me spellbound but was totally infeasible: perturb every known miRNA (up or down, drawbacks aplenty either way) and generate full mRNA and miRNA profiles for each condition.
Putting aside the very real issue of off-target effects, the advantage of this over the Califano method would be a far cleaner signal of what modulating a single molecule does to the genetic makeup of a cell.
Would it be worth it?
Had a labmate come by and ask for help with TCGA Level 3 expression data. It's distributed as one file per sample, which, when you have several HUNDRED specimens, isn't really the kind of thing you can or want to re-format by hand if you want to make comparisons across samples.
“How Hard Can it Be?!” I said. Oh when will I learn…
Two days and several infinite loops later, I present two dainty Perl scripts that:
a) Take n sample files and corral them into one flat file, probes x samples. (TCGA Merge)
b) Take a tab-delimited text file with x probes and y samples and turn it into a GenePattern-friendly .gct file. (RXC to GCT)
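For anyone who wants the gist without opening the scripts, here is a rough sketch of both steps in Python. To be clear, this is not the Perl code itself, just an illustration under some assumptions: each sample file is taken to be tab-delimited with one header row followed by probe/value pairs (real Level 3 files vary by platform), and the function names, the "NA" placeholder for missing probes, and reusing the probe ID as the GCT Description column are all my own choices for the example.

```python
def read_sample(text):
    """Parse one sample's text: a header row, then probe<TAB>value rows."""
    # splitlines() handles \n, \r\n and bare \r, so mixed line endings
    # from different machines do not corrupt the last column.
    rows = [line.split("\t") for line in text.splitlines() if line.strip()]
    return {row[0]: row[1] for row in rows[1:]}  # skip the header row


def merge_samples(samples, missing="NA"):
    """Merge {sample_name: file_text} into a probes x samples matrix.

    `missing` is an assumed placeholder for probes absent from a sample.
    """
    names = sorted(samples)
    parsed = {name: read_sample(samples[name]) for name in names}
    # Collect probes in first-seen order across all samples.
    probes, seen = [], set()
    for name in names:
        for probe in parsed[name]:
            if probe not in seen:
                seen.add(probe)
                probes.append(probe)
    matrix = [[parsed[n].get(p, missing) for n in names] for p in probes]
    return names, probes, matrix


def to_gct(names, probes, matrix):
    """Render the merged matrix as GCT v1.2 text."""
    lines = [
        "#1.2",                                    # format version line
        f"{len(probes)}\t{len(names)}",            # row count, sample count
        "\t".join(["Name", "Description"] + names),
    ]
    for probe, row in zip(probes, matrix):
        # Description is just the probe ID again (an assumption).
        lines.append("\t".join([probe, probe] + row))
    return "\n".join(lines) + "\n"
```

In practice you would read each file from disk into the `samples` dictionary (keyed by sample name) and write the string returned by `to_gct` out to a .gct file. Values are kept as strings throughout so nothing gets silently reformatted along the way.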
Comes with no warranty, yadda, yadda. Would love to know if this is useful for anyone else, and remember, WATCH YOUR LINE ENDINGS!