data analysis – cryoshon musings and analysis

How to Ask A Good Scientific Question

One of the first tasks a scientist or curious person must undertake before experimentation is the formulation and positing of a scientific question. A scientific question is an extremely narrow question about reality which can be answered directly and specifically by data. Scientists pose scientific questions about obscure aspects of reality with the intent of discovering the answer via experimentation. After experimentation, the results of the experiment are compared with the scientists’ most current explanation of reality, which will then be adjusted if necessary. In the laboratory, the original scientific question will likely require many complicated experiments and sustained attention before it is answered.

For everyone else, the scientific question and experimental response is much more rudimentary: if you have ever wondered what the weather was like and then stepped outside to see for yourself, you have asked a very simple and broad scientific question and followed up with an equally simple experiment. Experiments render data, which is used to adjust the hypothesis, the working model that explains reality: upon stepping outside, you may realize that it is cold, which supports your conception of the current time being winter.

Of course, a truly scientific hypothesis will seek to explain the ultimate cause as well as the proximate cause, but we’ll get into what that means later. For now, let’s investigate the concept of the hypothesis a little bit more so that we can understand the role of the scientific question a bit better.

Informally, we all carry countless hypotheses around in our head, though we don’t call them that and almost never consider them as models of reality that are informed by experimentation because of how natural the scientific process is to us. The hypotheses we are most familiar with are not even mentioned explicitly, though we rely on them deeply; our internal model of the world states that if we drop something, it will fall.

This simple hypothesis was likely formed early on in childhood, and was found to be correct over the course of many impromptu experiments where items were dropped and then were observed to fall. When our hypotheses are proven wrong by experimentation, our response is surprise, followed by a revision of the hypothesis in a way that accounts for the exception. Science at its most abstract is the continual revision of hypotheses after encountering surprising data points.

If we drop a tennis ball onto a hard floor, it will fall– then bounce back upward, gently violating our hypothesis that things will fall when dropped. Broadly speaking, our model of reality is still correct: the tennis ball does indeed fall when dropped, but we failed to account for the ball bouncing back upward, so we have to revise our hypothesis to explain the bounce. Once we have dropped the tennis ball a few more times to ensure that the first time was not a fluke, we may then adjust our hypothesis to include the possibility that some items, such as tennis balls, will bounce back up before falling again.

Of course, this hypothesis adjustment regarding tennis balls is quite naive, as it assigns the property of bouncing to certain objects rather than to a generalized phenomenon of object motion and collision. The ultimate objective of the scientific process is to resolve vague hypotheses into perfect models of the world which can account for every possible state of affairs.

Hypotheses are vague and broad when first formed. Violations of the broad statements allow for clarification of the hypothesis and add detail to the model. As experiments continue to fill in the details of the hypothesis, our knowledge of reality deepens. Once our understanding of reality reaches a high enough level, we can propose matured hypotheses that can actually predict the way that reality will behave under certain conditions– this is one of the holy grails of scientific inquiry. Importantly, a prediction about the state of reality is just another type of scientific question. There is a critical caveat which I have not yet discussed, however.

Hypotheses must be testable by experimentation in order to be scientific. We will also say that hypotheses must be falsifiable. If the hypothesis states that the tennis ball bounces because of magic, it is not scientific or scientifically useful because there is no conceivable experiment which will tell us that “magic” is not the cause. We cannot interrogate more detail out of the concept of “magic” because it is immutable and mysterious by default.

Rather than filling in holes in our understanding of why tennis balls bounce, introducing the concept of magic as an explanation merely forces us to re-state the original question, “how does a tennis ball bouncing work?” In other words, introducing the concept of “magic” does not help us to add details which explain the phenomenon of tennis balls bouncing, and ends up returning us to a search for more details. In general, hypotheses are better served by only introducing new concepts or terminology when necessary to label the relation of previously established data points to each other. The same could be said for the coining of a new term.

Now that we are on the same page regarding the purpose of scientific questions– adding detail to hypotheses by testing their statements– we can get into the guts of actually posing them. It’s okay if the scientific question is broad at first, so long as increasing levels of understanding allow for more specific inquiry. The best way to practice asking a basic scientific question is to imagine a physical phenomenon that fascinates you, then ask how it works and why. Answering the scientific question “why” is usually performed by catching up with previously performed research. Answering “how” will likely involve the same, although it may encounter the limit of human knowledge and require new experimentation to know definitively. I am fascinated by my dog’s penchant for heavily shedding hair. Why does my dog shed so much hair, and how does she know when to shed?

There are actually a number of scientific questions here, and we must isolate them from each other and identify the most abstract question we have first. We look for the most abstract question first in order to give a sort of conceptual location for our inquiry; once we know what the largest headline of our topic is, we know where on the paper we can try to squint and resolve the fine print. In actual practice, finding the most abstract question directs us to the proper body of already performed research.

Our most abstract question will always start with “why”. Answering “why” will always require a more comprehensive understanding of general instances that govern the phenomenon in question, whereas “what” or “how” typically refers to an understanding that is limited to fewer instances. So, our most abstract question here is, “Why does my dog shed so much?”

A complete scientific explanation of why the dog sheds will include a subsection which describes how the dog knows when to shed. Generally speaking, asking “why” brings you to the larger and more comprehensively established hypothesis, whereas asking “how” brings you to the more narrow, less detailed, and more mechanistic hypothesis. Answering new questions of “why” in a scientific fashion will require answering many questions of “how” and synthesizing the results. When our previously held understanding of why is completely up-ended by some new explanation of how, we call it a scientific revolution.

At this point in human history, for every question we can have about the physical world, there is already a general hypothesis which our scientific questions will fall under. This is why it is important to orient our more specific scientific questions of “how” properly; we don’t want to be looking for our answer in the wrong place. In this case, we can say that dogs shed in order to regulate their temperature.

Temperature regulation is an already established general hypothesis which falls under the even more general hypothesis of homeostasis. So, when we ask how does the dog know when to shed, we understand that whatever the mechanistic details may be, the result of the sum of these details will be homeostasis of the dog via regulated temperature.

Understanding the integration between scientific whys and hows is a core concept in asking a good scientific question. Now that we have clarified the general “why” by catching up with previously established research, let’s think about our question of “how” for a moment. What level of detail are we looking for? Do we want to know about the hair shedding of dogs at the molecular level, the population level, or something in between? Once we decide, we should clarify our question accordingly to ensure that we conduct the proper experiment or look for the proper information.

When we clarify our scientific question, we need to phrase it in a way such that the information we are asking for is specific. A good way of doing this is simply rephrasing the question to ask for detailed information. Instead of asking, “how does the dog know when to shed”, ask, “what is the mechanism that causes dogs to shed at some times and not others.”

Asking for the mechanism means that you are asking for a detailed factual account. Indicating that you are interested in the aspect of the mechanism that makes dogs shed at some times but not other times clarifies the exact aspect of the mechanism of shedding that you are interested in. Asking “what is” can be the more precise way of asking “how.”

The question of the mechanism of shedding timing would be resolved even further into even more specific questions of sub-mechanisms if we were in the laboratory. Typically, scientific questions beget more scientific questions as details are uncovered by experiments which attempt to answer the original question.

As it turns out, we know from previous research that dog shedding periods are regulated by day length, which influences melatonin levels, which influences the hair growth cycle. Keen observers will note that there are many unstated scientific questions which filled in the details where I simplified using the word “influences”.

Now that you have an example of how to work through a proper scientific question from hypothesis to request for details, try it out for yourself. Asking a chain of scientific questions and researching the answers is one of the best ways to develop a sense of wonder for the complexity of our universe!

I hope you enjoyed this article. I’ve wanted to get these thoughts onto paper for quite a long time, and I assume I’ll revisit various portions of this piece later on because of how critical it is. If you want more content like this, check out my Twitter @cryoshon and my Patreon!

How To Write Systematically in 11.5 bites

After a few years of working in biomedical research and earning a philosophy degree in college, I know a few things about writing and thinking systematically. Unfortunately, I see a lot of people stumbling in their writing when they try to create complex abstract or technical materials– writing is tough, and accurate, succinct, detailed, and logical writing is even harder.

To me, systematic writing is a method of writing which seeks to transmute the complex relationships between raw or parsed data into a coherent, readable narrative that can be effectively understood and analyzed by someone who is generally knowledgeable on the topic, but who didn’t gather or prepare the data. Systematic writing is part of a greater family of writing that includes scientific writing, technical writing, and financial writing, along with other types I probably haven’t even thought of.

While this definition may seem overly abstract, I’d like to point out that most of our received and sent communications are not systematic; a news anchor is not relaying systematically prepared information to the public, even though the reporters have gone through the trouble of parsing raw data (events that happened) into a narrative (what the anchor says). The quantity of technical detail and data referencing in a news report is slim, as news reports are designed for a very wide audience who have little previous context for the event that happened (the data). An email we send to a colleague referencing data or analysis is not necessarily systematic writing, as it’s entirely possible for a certain context to be inferred between two people; systematic writing provides its own context and content explicitly to the audience.

Systematic writing is typically intended for a small, already-savvy audience, and should only offer the minimal viable context. A reader with general knowledge on the topic of the piece should be able to acquaint himself with a systematically written piece in short order, but a layman should not, because establishing the amount of context required for a layman would involve a lot of background information which falls outside of the scope of a particular instance of systematic writing. We don’t want our systematic writing to sprawl, because systematic writing is intensely purposeful and detail-heavy writing, and lots of background information and tangents dilute the factual details we’re trying to communicate.

So, the title promises 11.5 bites describing the process of writing systematically, and without further ado here’s a primer on how to write and think systematically:

Define your goal. What kind of narrative do you want to make, and what data are you planning on using? Who is going to read the report, and how much context will be required?
Put on your white thinking hat. To use the terminology of the fantastic thought guide Six Thinking Hats, the white thinking hat is purely unbiased and factual thinking used for establishing a common ground among readers. If you’re going to be writing a systematic document which refers to data, you need to make sure that you don’t take any liberties with the data without explicitly qualifying them as speculation or partially supported. No spin!
Assemble your data. You can’t write systematically without having data. Ensure that your data is collated/parsed/charted in a non-deceptive and easy to understand way– the only person you’re trying to inform at this step is yourself, so it behooves you to be honest about the quality of your data and what knowledge we can actually extract in analysis. If there are computations or manipulations required of your data, now is the time to do them.
Determine the limits of what your data can tell you. Soon, we’ll analyze our data, but first, we need to vaccinate ourselves against narrative mistakes. Though it seems simple, it’s easy to slip up and attribute facts to your data that aren’t actually there. Explicitly state the variables which your data depicts (sales, months). Remember that going forward, all of your statements should be in terms of the variables which you outline here. If you’re not talking about information within the purview of the data that your variables describe, you’re not being systematic.
Extract verbal information from your data. Write down simple statements to these effects, such as, “the data for November showed 42 sales.” If you computed averages or other values in your data assembly step, now is the time to introduce them as simple phrases. If you expect that handling the data in this way will be confusing, document your process simply and clearly so that your audience will understand. Do not introduce any explanation at this point; merely state what the data say, and, if necessary, state how the data were processed. Remember not to speculate; the point of this step is to establish purely factual statements.
Analyze your data at a basic level. Now that you have a series of simple statements depicting your data in an unbiased way, comparisons between data statements can begin. Are the sales from November higher than the sales from October? Write that comparison down if it’s relevant to your originally stated goal, and make sure to directly reference the values in your new synthesis statements. The point of this step is to explicitly state simple relationships of the data, independent of any narrative.
Analyze your data deeply. Stay focused on your original goal during this step. What questions can your impartial data statements answer explicitly? Implicitly? What trends in your data are noteworthy? What points of data are outliers? Can you explain the outliers? In this step, writing more complex statements is necessary. “The sales data from November (42 sales) are higher than October (30 sales), following the upward trend of the fall season. These data tell us that the fall season is our strongest selling period, despite the high sales in December.” Don’t try to speculate or hypothesize about “why” yet; just tease out the more complex relationships in your data, and write them down in a clear way. As always, reference your data directly in order to build context for your audience and keep them on the same page. Don’t worry about over-analyzing at this point; we’ll prune our findings later.
Ask Why. Why did we see the data that we saw in our analysis? What are the general principles governing our data? Address each piece of relevant data with this question, and be sure to answer it briefly. The outliers that were previously identified need special attention at this point. Keep explanations of your data concise and factual, though remember that your explanations are not actually within your data set, so you should draw in outside proof to support your explanations if necessary. It’s okay to hypothesize if you don’t know exactly why certain data turned out the way that they did, but be sure to explicitly label speculation.
Build a narrative using your data, analyses, and explanation. Consider your starting goal, and how to marshal the data, analyses, and explanations in order to accomplish that goal. Your narrative should proceed first with the data, then with a simple factual explanation of the data, then with a more complex analysis of the data, and finish off with an explanation of the data if it’s required. The narrative step of systematic writing is where you put all of the pieces together into one attractive package for your audience. Don’t neglect graceful segues between different portions of the data set. The final product of this step can be considered a first draft of your systematic writing effort, and may take the form of a PowerPoint presentation, meeting agenda, technical report, or formal paper.
Anticipate questions and comments from your audience. Look for areas in which your explanation, analysis, or data prompt a response, and plan accordingly. Questions regarding your narrative are typically the easiest to address by clarifying what you’ve already written explaining why your data appears the way it does. Questions regarding your analysis can get a bit technical depending on the audience, and so you should be prepared to refer back to the source data in your responses. Questions regarding the data itself or the parsing of the data are the most difficult; typically, the outliers will be under the most scrutiny, and their data quality may be called into question. I find that it helps to get out in front of questions regarding outliers, addressing them to your audience before taking questions.
Prune non-critical information. This is the step where most of the data-statements and analysis statements meet their demise. Which analyses, explanations, and narrative elements aren’t strictly serving your original goal? Remove extraneous information to create a hardened product. Ensure that the relevant context and core data analysis remains, and don’t build a misleading narrative by omitting contradictory relevant data.
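
The "extract verbal information" and "analyze your data at a basic level" steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary and both function names are hypothetical, and the only figures used are the October (30) and November (42) sales quoted in the examples above.

```python
# Hypothetical data set: the variables are "month" and "sales".
# The two values are the ones quoted in the example statements above.
sales_by_month = {"October": 30, "November": 42}


def data_statements(sales):
    """Turn each data point into one plain factual sentence, with no explanation."""
    return [
        f"The data for {month} showed {count} sales."
        for month, count in sales.items()
    ]


def compare_months(sales, earlier, later):
    """State a simple relationship between two months, citing both values directly."""
    earlier_count = sales[earlier]
    later_count = sales[later]
    if later_count > earlier_count:
        relation = "higher than"
    elif later_count < earlier_count:
        relation = "lower than"
    else:
        relation = "equal to"
    return (
        f"Sales in {later} ({later_count} sales) were {relation} "
        f"sales in {earlier} ({earlier_count} sales)."
    )


for statement in data_statements(sales_by_month):
    print(statement)
print(compare_months(sales_by_month, "October", "November"))
```

Every sentence the sketch produces is phrased only in terms of the declared variables and cites the underlying values, which is the discipline those steps ask for. Explanation and speculation are deliberately left out, since they belong to the later "Ask Why" step.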

The final half-step is, of course, crossing the t’s and dotting the i’s for your final draft– and make sure it’s perfect! A missed detail on something not mission-critical will still distract your audience from your data and analysis.

I hope that my readers have a better idea of how to write and perhaps think systematically after reading this piece. I think that many non-technical people struggle with systematic writing because of how data-centric it is; communicating in the style of referencing data and withholding speculation can be quite difficult for people accustomed to relating written concepts intuitively and emotionally.

If you have any questions, leave ’em in the comments and I’ll respond. I know that the 21st century will have the highest demand yet for systematic thinkers and writers, so I’m also considering forming a consultancy in order to help organizations with training their employees and executives to think and communicate in systematic ways, so expect more on topics like this in the future.

As always, follow me on Twitter @cryoshon, re-post my articles to social media, and subscribe to the mailing list on the right!
