Can I get a dump of Hooktheory data for research?

This post came in from a user named Bas, and I want to make sure it makes it to the forum.

Dear Dave/Hooktheory’s support team,

First of all, I want to thank you for the great work you have put into Hooktheory! I have enjoyed using both Hookpad and Theorytab for personal music projects, and they have been a very valuable resource.

I’m writing you to inquire if you are open to sharing the data that Hooktheory has accumulated over the years, for research purposes. A brief search on the forums showed me that you have done this in the past, and I am hoping you would consider doing this again.

About me and what I would use the data for: I am pursuing a master’s degree in Artificial Intelligence and exploring options for a thesis/research project at the intersection of AI and music. I do not have a specific research question in mind yet, but I think the data could be valuable for automatic pattern/structure discovery, chord extraction, harmonic analysis and more. I would

In short: what is your current stance on sharing Hooktheory’s dataset for research purposes?

Hi Bas,

We are supportive of the idea of using our data for research. We created an API that exposes the chord progression data that powers our Trends / search by chord progression page that you are welcome to explore.

We very much hope to fully open our data for research in the future (and provide more related support), but right now we are 100% focused on products. The data exposed by the API is all that is available as of 2015.03.20.

If things change, we will update this thread.


Is a full data exposure option no longer being considered for this project?

Chris, the API you mentioned is not what the user asked for. Given that you are supportive of sharing the database, why not open it up now?

Access through the API is currently all that we are able to make available. We aren’t necessarily opposed to making more available at some point, and we have added endpoints over time, but we’re not currently making the entire raw database available for download.


I’m not sure what ‘access through the API’ means. The API you’re referring to does not expose song data; it exposes an analysis of the data, which is not what we’re asking for.
This isn’t a feature request either. Over the years, you guys responded to askers that you are ‘not opposed to sharing the complete data’, followed by some reasoning or other for not making it available.
I’m wondering if this is in bad faith, considering this is data you’ve received from thousands of people in your community who put their time into it, and considering you’ve said ‘we are not opposed’ yet provided no access to any of the song data (again, not referring to the ‘Trends’ API).
This feels closer to attempts at sweeping this under the rug.