This must be pre cookies influx.
It would be fun to also build a table of parents for each strain. Then one could potentially find parents that tend to have certain characteristics in the offspring.
What program are you using to run that?
Keep it coming!
Ronzo
That is a tool for accessing SQL Server data, called SSMS. I will be building a couple basic web pages where everyone can run some basic queries like that one.
Thank you, I appreciate it.
Ronzo
Yeah, I started collecting the data when I was dead set on starting a big genetics company, but you know, I dream bigger than my pocketbook, so after 5-6 years with this list, I'm sharing it.
My initial goal was to breed for things like super high THCV, find THCa strains that fell under delta-9 regs here in NY, and build the most complete set of terps in a staple cultivar.
Now I realize that requires many growers, and if we give them a tool, we can breed more intentionally. Yes, the data is from older databases, pre-cookies, pre-inflated test results, and should be relatively honest data.
The data started very nasty, with concentrate tests ruining the flower test calculations, and weird stuff, and that took a lot to resolve, removing around 8k results in that process. When looking at it, you may see pheno #'s and selections, which is cool.
I tried removing tests for trim, prerolls, hash, wax, crumble, etc.
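For anyone curious what that kind of cleanup can look like, here is a minimal sketch, assuming the product type shows up somewhere in the sample name. The keyword list and the sample names below are made up for illustration and are not the actual columns or rules used on this dataset.

```python
# Hypothetical keyword list; the real cleanup rules may differ.
NON_FLOWER_KEYWORDS = ("trim", "preroll", "pre-roll", "hash", "wax", "crumble")


def is_flower(sample_name: str) -> bool:
    """Return True when no non-flower keyword appears in the sample name."""
    name = sample_name.lower()
    return not any(keyword in name for keyword in NON_FLOWER_KEYWORDS)


# Illustrative sample names only, not rows from the real dataset.
samples = ["Tangie B#555", "Blue Dream Wax", "OG Kush Trim", "Hawaiian Snow"]
flower_only = [s for s in samples if is_flower(s)]
print(flower_only)
```

One catch with plain substring matching: a flower strain with a keyword in its name (anything with "hash" in it, for example) would get dropped too, so a real pass would probably need a manual review list.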
There may be some things still left for improvement, but I find it useful when running queries through an AI trained on the data.
Looking at this screenshot, the THC calc may need to be shifted by a decimal place; maybe the highest THC was actually 29% at that time, not 2.9%.
I will go ahead and shift the decimal over one on the THC scores; that does sound right.
Yeah, I have been referencing the delta 9 thc column as being more accurate for that. I did a query for any results over 40%, thinking anything above that would be left over concentrate results.
So, my AI queries are something like this:
Give me the top 10 samples that are high in delta-9 THC, Ocimene, and Limonene…
-or-
What 20 samples are high in delta-9 THC, and have the largest number of total compounds…
-or-
What 2 samples when combined give the largest spread of unique compounds…
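The last query is basically "find the pair whose combined compound list is the biggest". A rough sketch of that idea, assuming each sample can be reduced to a set of detected compounds. The sample names and compound sets here are invented for illustration, and this is not necessarily how the AI tool answers it.

```python
from itertools import combinations

# Hypothetical samples mapped to the compounds detected in each one.
detected = {
    "sample A": {"limonene", "myrcene", "ocimene"},
    "sample B": {"limonene", "linalool"},
    "sample C": {"pinene", "caryophyllene", "linalool"},
}


def best_pair(samples: dict[str, set[str]]) -> tuple[str, str]:
    """Return the two samples whose combined set of compounds is largest."""
    return max(
        combinations(sorted(samples), 2),
        key=lambda pair: len(samples[pair[0]] | samples[pair[1]]),
    )


pair = best_pair(detected)
print(pair, len(detected[pair[0]] | detected[pair[1]]))
```

This checks every pair, which is fine for a few thousand samples but grows quadratically, so a bigger dataset might need a smarter shortlist first.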
I ended up downloading Zing Data. It has a mobile app, and you can upload the dataset via a Google Sheets link or CSV; both work. After installing the app, on the free account, you can run all the queries using AI natural language (no SQL knowledge needed).
Here we go!
I added the DB to an AI Assistant and embedded it in a webpage to be tested out by the community.
The goal is to stage improvements of data, add recent data, and make the tool more robust.
Thanks everyone for the support, let's grow this and make a useful tool.
Thank you for setting this up.
Interesting that Hawaiian Snow has 9.67% CBGA
I was not expecting that.
Awesome, this works well. And it is quick as well.
of the outdoor strains you said will grow outdoors in southern Alberta, which has the shortest grow time and highest THC?
Tangie B#555
38.87
That's cool…
I asked it for the 'top ten most common terpenes'
and it chewed on it for a bit and barfed out the numbers… all neatly organized and that. was. it… ok…
'add the names' and now… a perfect reply
I need to beat on this some more…
Cheers
G
I got the same result using your quoted stuff, it's pretty cool.
Is there a way for this interface to post relevant links and maybe pictures?
Ronzo
best plant says - urban poison, highest terpene content = durban poison x nl
We can append any data, such as relevant links, etc.
I would like to have it direct to an AI search for seed banks that carry those seeds (not just sponsors or listed vendors, but anyone anywhere, small or large). Or possibly "mentions": since many heirlooms don't exist in seed banks, it could reference an Instagram account that tagged it, or similar. Bridging the gap on sourcing your medicine.
I am suspicious of that result. There are possibly still some rows that are actually concentrates, and that may be one of them. It's hard to understand what each column represents, but the delta-9 THC-A column in that row is 68.34, which would mean total THC of around 65%. Since @MrPanda filtered out concentrate rows based on delta-9 THC only, this row would not be excluded. It may be more effective to keep only rows where total cannabinoids are below 40%.
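To show why a delta-9-only filter misses a row like that, here is a small sketch using the standard 0.877 conversion factor for THCA to THC. The field names and the delta-9 values are assumptions for illustration (only the 68.34 THCA figure comes from the row discussed above), and it uses total THC as a stand-in for total cannabinoids.

```python
THCA_TO_THC = 0.877  # mass ratio of THC to THCA after decarboxylation


def total_thc(delta9_thc: float, thca: float) -> float:
    """Total potential THC in percent, from delta-9 THC and THCA percentages."""
    return delta9_thc + THCA_TO_THC * thca


# Hypothetical rows; the delta-9 values are made up for the example.
rows = [
    {"name": "sample A", "delta9_thc": 1.2, "thca": 22.0},
    {"name": "sample B", "delta9_thc": 5.0, "thca": 68.34},
]

# A delta-9-only filter keeps both rows, since neither exceeds 40%.
kept_by_delta9 = [r["name"] for r in rows if r["delta9_thc"] < 40]

# Filtering on total THC drops the row that looks like a concentrate.
kept_by_total = [
    r["name"] for r in rows if total_thc(r["delta9_thc"], r["thca"]) < 40
]
print(kept_by_delta9, kept_by_total)
```

With a delta-9 value of 5.0 the second row works out to roughly 65% total THC, which lines up with the estimate above.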
Or perhaps those specific cannabinoid columns are in mg/g instead of %, in which case it would mean 0.967% CBGA, which seems reasonable.
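The unit conversion behind that is just a divide by ten, since 1 g is 1000 mg and a percent is parts per hundred. A quick sketch, assuming the column really is in mg/g (which nobody has confirmed yet):

```python
def mg_per_g_to_percent(mg_per_g: float) -> float:
    """Convert a concentration in mg/g to percent by weight."""
    return mg_per_g / 10.0


# 9.67 is the CBGA figure reported for Hawaiian Snow above.
cbga_percent = mg_per_g_to_percent(9.67)
print(round(cbga_percent, 3))
```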
I have a plant that has up to 18% CBGA but it was made in a University lab.