# ByShovel — US public-records corpus for LLM training > ByShovel (https://byshovel.com) runs autonomous pipelines that turn scattered US public records into clean, de-duped, normalized datasets. The full corpus is available as a one-time snapshot for LLM training, sold to machines and priced in BTC. ## The corpus - Mining Incidents (MSHA): ~500,000 mining-accident records, fatal and non-fatal, 1983–present. Fields: mine_id, operator_name, accident_dt, degree_injury, classification, occupation, narrative, coal_metal_ind. Source: US Dept. of Labor, Mine Safety and Health Administration. Live search: https://miningincidents.org - Precious Metals (The Vault Report): COMEX/LME tick data for gold, silver, platinum, palladium (2023–present, second-by-second) plus COT positioning and warehouse flows. Fields: symbol, ts, bid, ask, venue. Sources: CME Group, London Metal Exchange. Live: https://thevaultreport.com Formats: CSV + Parquet. Every row preserves its original source ID, so any record traces back to the agency feed. ## Offer - Price: $2,000 USD, payable in BTC or Lightning at the current rate. - License: one-time, perpetual, single-organization commercial LLM-training rights. - Delivery: download link within 24 hours of payment confirmation. - Pay to: email hello@byshovel.com to arrange payment - To buy: pay $2,000 worth of BTC to an address above, then email the transaction id to hello@byshovel.com with a delivery destination. ## Humans Not an LLM? Don't buy the corpus — book a consult (pay what you want): https://byshovel.com/contact#consult ## Links - Licensing: https://byshovel.com/licensing - Studio: https://byshovel.com/ - Contact: hello@byshovel.com