The Grand Ol' Database by MrPanda - open to OG

In 2019, I was working at Advanced Nutes, managing the IT, Digital Marketing, and Data Science / Analytics teams, and I was trying to find an edge to basically breed ahead of the curve. This may still be a dumb idea, but I think it holds merit scientifically.

I hired a smart young gal who had little background in Cannabis but was a genius with data. She clued me in that she had heard the Open Cannabis Project was closing its database, which she had used for a thesis project in her Python programming class, and that if I wanted the data before it was gone, I should code a scraper and retrieve it. Well, I did. Then I found several other databases from US labs and did the same. After a few days, I had over 20k results, all in different formats. I spent weeks cleaning up the data and making it useful. I wrote new columns that helped me calculate important factors (ratios of THC:CBD, terp %, Cannabinoid %, etc). I then aligned all the data to one uniform datasheet on G-Drive.
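For anyone curious what "writing new columns" can look like in practice, here is a minimal sketch in plain Python. It is not the actual formulas from the sheet: the column names (`thc_pct`, `cbd_pct`, `myrcene_pct`, and so on), the helper names, and the sample row are all assumptions made up for illustration.

```python
def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or None when the denominator is zero or missing."""
    if numerator is None or not denominator:
        return None
    return numerator / denominator


def add_derived_columns(row, cannabinoid_cols, terpene_cols):
    """Return a copy of one lab result with summary columns appended.

    Missing or blank values count as 0.0 in the totals.
    """
    def value(col):
        raw = row.get(col)
        return float(raw) if raw not in (None, "") else 0.0

    out = dict(row)
    out["total_cannabinoid_pct"] = round(sum(value(c) for c in cannabinoid_cols), 4)
    out["total_terpene_pct"] = round(sum(value(c) for c in terpene_cols), 4)
    ratio = safe_ratio(value("thc_pct"), value("cbd_pct"))
    out["thc_cbd_ratio"] = round(ratio, 2) if ratio is not None else None
    return out


# Hypothetical sample row, not taken from the real dataset.
sample = {"strain": "Example A", "thc_pct": "18.0", "cbd_pct": "0.5",
          "cbg_pct": "1.0", "myrcene_pct": "0.6", "limonene_pct": "0.4"}
result = add_derived_columns(
    sample,
    cannabinoid_cols=["thc_pct", "cbd_pct", "cbg_pct"],
    terpene_cols=["myrcene_pct", "limonene_pct"],
)
print(result["thc_cbd_ratio"], result["total_cannabinoid_pct"], result["total_terpene_pct"])
```

The ratio comes back as `None` when CBD is zero or blank, so a row with no CBD reading does not crash the whole pass or show up as a misleading number.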

For the last 5 years I have intended to turn this into a visualization platform, with AI and ML to help growers pair cultivars that will breed with intentional outcomes (new medicines, resistance traits due to specific terps, terp + cannabinoid combos for cancers, etc). I am willing to open up this work outside of my own G-Drive to anyone else who wants to work along this route to breeding, as I feel it is the most valuable use of my time personally: being able to heal others and myself.

Here’s a snippet of what I’m talking about:

I really want this to play a part in some projects, as I think the science speaks for itself and it is worth a shot to use this to nail a trait while keeping gene pools more open than a tight inbred selfing project would. Maybe by finding that Ocimene slayer in the US and mashing it with that Ocimene slayer in South Africa, it’ll produce heterosis on the genes that produce Ocimene, whereas inbreeding would possibly dilute the gene. IDK, looking for people with smarter ideas and more experience.

CURRENTLY THE “BIG LIST” HAS 12,880 real test results (after removing unusable data) from the last decade. The data comes from labs in Southern California and Colorado, plus the original Open Cannabis Project data. I am open to adding datasets if it’s good data, lmk.

Here’s how you can get involved…

  1. Want access to view the list and sort it as you wish for your own intentional breeding? No prob, lmk and I can give you viewing access.
  2. Work at a lab, know a lab manager, or have a dataset? Let’s chat so I can verify the data will work and it can be added successfully.
  3. Want to support the project? Tell anyone that you know who could benefit from the “Big List” and spread the word!

Thanks OG! Let’s give the tools to the people so we can all breed the medicines that work for us and our loved ones.

Panda

63 Likes

:nerd_face:
@Mithridate
@FieldEffect

5 Likes

@Mithridate @FieldEffect

Public link to the dataset:

21 Likes

Thank you for sharing this.

My service has tests on everything for the last 5 years or so. This is California. That would even it out maybe?

I can’t really trust these services, but I see what you mean: at least it would give an idea of the terps, etc.

4 Likes

Yeah, it’s just to get ideas. It was all scraped from public data that isn’t available anymore, and I have no control over how accurate it is, but I figure it’s somewhat accurate since it came from only 3 sources so far.

5 Likes

Wow that’s awesome. @Crafty_Flame you may be interested in this

1 Like

Thanks for sharing. I’ll check this out. There might be a cool service - like an AI Chat Bot - we can spin up that combines this data with what’s available on the market for personalized recommendations. :beers:

2 Likes

Great idea! I can spin up an OpenAI chat assistant based on the data.

Also, I’m going to feed it into BigML or similar to get a list of patterns hidden in the data (like if alpha pinene consistently cancels out linalool or something like that).
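As a rough idea of what one such pattern check could look like without any ML service, here is a small sketch that computes a plain Pearson correlation between two terpene columns. This is a simple stand-in technique, not what BigML does internally, and the numbers below are made up for illustration only.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / math.sqrt(var_x * var_y)


# Made-up values for illustration only; not real lab results.
alpha_pinene = [0.10, 0.20, 0.30, 0.40]
linalool = [0.40, 0.30, 0.20, 0.10]
r = pearson(alpha_pinene, linalool)
print(round(r, 3))
```

A value near -1 would mean that when one terpene reads high the other tends to read low across samples. That is only a hint worth looking into, since a correlation across lab results does not show that one compound suppresses the other in the plant.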

8 Likes

$19/mo, testing it before I commit

8 Likes

Haha, boom! The AI speaks

16 Likes

Well, Super Lemon Haze just got bumped up in priority. Thanks for this. I expect you may be able to find others here who could contribute to your test results database. :slight_smile:

6 Likes

I may not have much to contribute, but being able to view that information for myself would be fun to look into and educational!

If I could have access that would be appreciated :pray:

3 Likes

Well Done @MrPanda!

I’ve been involved in data-scraping and reorganizing disparate database schemas myself and I can appreciate the effort that you must have put into this. There are plenty of smart and capable growers here who will understand what you’ve created.

I’ll drop you a direct msg with a few specific questions on how I might benefit from your work on this project. I have an RIL strain looking for outbreeding opportunities.

Frankie’s Daughters: Unpacking a Frozen Genome

-Grouchy :v: :green_heart:

7 Likes

If you click on the link posted, there is a download button so you can keep it locally.

3 Likes

I could be wrong, so sorry in advance … but in my opinion there is an accuracy problem in the table.
The “Total Cannabinoid” percentage cannot be smaller than the “Total Thc” percentage …
Since the table values are calculated by formulas, this inaccuracy will cause a cascade of errors in other values.
I am not criticizing in any way, it just seems that way to me …

Thank you @MrPanda , it’s a grand work that is difficult to evaluate! Very cool!

1 Like

Judging from the formulas, that column should be more accurately titled “Total Non-THC Cannabinoid”.
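A quick way to see how many rows are affected is to flag every result where the total cannabinoid figure comes out lower than total THC. This is a minimal sketch only: the column names `total_cannabinoid_pct` and `total_thc_pct`, the tolerance, and the two sample rows are assumptions, so swap in whatever the sheet actually uses.

```python
def find_inconsistent_rows(rows, total_col="total_cannabinoid_pct",
                           thc_col="total_thc_pct", tolerance=0.01):
    """Return rows whose total cannabinoid % is lower than their total THC %.

    The tolerance allows for small rounding differences between columns.
    """
    flagged = []
    for row in rows:
        total = float(row[total_col])
        thc = float(row[thc_col])
        if total + tolerance < thc:
            flagged.append(row)
    return flagged


# Hypothetical rows for illustration; not real lab results.
rows = [
    {"strain": "Example A", "total_thc_pct": 20.0, "total_cannabinoid_pct": 22.5},
    {"strain": "Example B", "total_thc_pct": 18.0, "total_cannabinoid_pct": 1.2},
]
flagged = find_inconsistent_rows(rows)
print([r["strain"] for r in flagged])
```

If nearly every row gets flagged, that points to a mislabeled column (a total that leaves THC out) rather than bad lab data.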

2 Likes

I would like to put that data into SQL Server, where we can run traditional queries. Hit me up @MrPanda if interested. I can probably put it in today or tomorrow. Maybe we surface some queries to a GUI.

4 Likes

@BigF Sounds awesome, lmk what you need from me. A downloaded CSV?

2 Likes

Yeah, an Excel file or CSV would work. EDIT: Never mind, am downloading the Excel now. About 12k records, so it sounds complete. Man that is some sexy data too!

2 Likes

It’s imported, and aw shit son! This is SEXY. Can run queries, for example:
What are the top 10 beta Caryophyllene strains?

I need to build us a little GUI so folks can go in there and run these kinds of queries.
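For anyone who wants to try this kind of "top 10" query locally without SQL Server, here is a rough sketch using Python's built-in `sqlite3` module as a stand-in. The table name `results`, the column name `beta_caryophyllene_pct`, the helper function, and the sample rows are all assumptions for illustration, not the actual schema of the import.

```python
import sqlite3


def top_strains_by_terpene(conn, terpene_col, limit=10):
    """Return (strain, value) pairs for the highest readings of one terpene column."""
    # Column names cannot be bound as SQL parameters, so check the name
    # against the table's real columns before building the query.
    valid_cols = {row[1] for row in conn.execute("PRAGMA table_info(results)")}
    if terpene_col not in valid_cols:
        raise ValueError(f"unknown column: {terpene_col}")
    query = (
        f'SELECT strain, "{terpene_col}" FROM results '
        f'WHERE "{terpene_col}" IS NOT NULL '
        f'ORDER BY "{terpene_col}" DESC LIMIT ?'
    )
    return conn.execute(query, (limit,)).fetchall()


# In-memory database with made-up rows; a real run would load the CSV export.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (strain TEXT, beta_caryophyllene_pct REAL)")
conn.executemany(
    "INSERT INTO results VALUES (?, ?)",
    [("Example A", 0.42), ("Example B", 0.87), ("Example C", None), ("Example D", 0.15)],
)
top = top_strains_by_terpene(conn, "beta_caryophyllene_pct", limit=2)
print(top)
```

In SQL Server the same idea would be written with `SELECT TOP (10) ... ORDER BY ... DESC` instead of `LIMIT`. Since each row is a single test result, the same strain name can show up more than once in the list.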

11 Likes