Petals is a decentralized platform for running large language models such as BLOOM-176B. Each participant loads only a small part of the model and teams up with others to run inference or fine-tuning. Single-batch inference runs at roughly 1 second per step (token), and parallel inference can reach hundreds of tokens per second. Petals offers more than a classic language model API: users can apply custom fine-tuning and sampling methods, execute custom paths through the model, and inspect its hidden states, all through a flexible PyTorch-based API. Petals is part of the BigScience research workshop project.
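To make the workflow concrete, below is a minimal sketch of generating text over the swarm, modeled on the usage shown in the Petals project's early documentation for BLOOM; the `DistributedBloomForCausalLM` class and the `bigscience/bloom-petals` checkpoint name are assumptions that may differ across library versions.

```python
# Minimal Petals inference sketch (assumed API from early Petals releases).
from transformers import BloomTokenizerFast
from petals import DistributedBloomForCausalLM

MODEL_NAME = "bigscience/bloom-petals"  # assumed checkpoint identifier

tokenizer = BloomTokenizerFast.from_pretrained(MODEL_NAME)
# Only the embeddings and LM head load locally; transformer blocks
# are served by remote peers in the swarm.
model = DistributedBloomForCausalLM.from_pretrained(MODEL_NAME)

inputs = tokenizer("A cat sat on", return_tensors="pt")["input_ids"]
# Each generated token costs roughly one second of swarm round-trips.
outputs = model.generate(inputs, max_new_tokens=5)
print(tokenizer.decode(outputs[0]))
```

Because the returned object behaves like an ordinary PyTorch model, the same handle can be used for fine-tuning or for reading intermediate activations, which is what enables the custom paths and hidden-state inspection described above.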