scrapeghost
jamesturk.github.io
Summary
Scrapeghost is an experimental Python library for automated web scraping with OpenAI's GPT models. Instead of writing page-specific parsing code, users define a schema describing the data they want, and the library extracts matching structured data from the HTML. It preprocesses the HTML, sends it to the GPT API, and then post-processes and validates the results.
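The three stages described above (preprocess the HTML, send it to the model along with a schema, then validate the reply) can be sketched in plain Python. This is not Scrapeghost's code: the function names (`preprocess`, `build_prompt`, `validate`, `scrape`), the flat `{"field": "string"}` schema, and the prompt wording are all hypothetical. The GPT call is also replaced by a stub so that the example runs offline.

```python
import json
import re
from typing import Any, Callable

# Hypothetical flat schema: field name -> expected type (only "string" is handled here).
SCHEMA = {"name": "string", "party": "string", "district": "string"}


def preprocess(html: str) -> str:
    """Shrink the page before it is sent to the model: drop <script>/<style>
    blocks and HTML comments, then collapse runs of whitespace."""
    html = re.sub(r"<(script|style)\b.*?</\1>", "", html, flags=re.S | re.I)
    html = re.sub(r"<!--.*?-->", "", html, flags=re.S)
    return re.sub(r"\s+", " ", html).strip()


def build_prompt(schema: dict[str, str], html: str) -> str:
    """Ask for JSON that matches the schema (wording is illustrative only)."""
    return (
        "Convert the following HTML to JSON matching this schema: "
        + json.dumps(schema)
        + "\nRespond with JSON only.\n\n"
        + html
    )


def validate(data: Any, schema: dict[str, str]) -> dict[str, Any]:
    """Post-process the reply: require an object containing every schema field
    as a string, and drop any keys the schema did not ask for."""
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = schema.keys() - data.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    for key in schema:
        if not isinstance(data[key], str):
            raise ValueError(f"field {key!r} should be a string")
    return {key: data[key] for key in schema}


def scrape(html: str, schema: dict[str, str], ask_model: Callable[[str], str]) -> dict[str, Any]:
    prompt = build_prompt(schema, preprocess(html))
    raw = ask_model(prompt)  # in Scrapeghost this step is a call to the GPT API
    return validate(json.loads(raw), schema)


def fake_model(prompt: str) -> str:
    # Offline stand-in for the GPT call; returns a canned JSON reply.
    return '{"name": "Jane Doe", "party": "Independent", "district": "7", "extra": "x"}'


page = """<html><head><style>p {color: red}</style></head>
<body><!-- nav --><h1>Jane Doe</h1><p>Independent, District 7</p>
<script>track()</script></body></html>"""

print(scrape(page, SCHEMA, fake_model))
# {'name': 'Jane Doe', 'party': 'Independent', 'district': '7'}
```

Removing scripts, styles, and comments before the call matters because the model is both billed and limited by tokens. The validation step then ensures that a free-form model reply has the expected shape before any code relies on it.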
Features (2 of 13)
Must Have (2 of 5)
Conversational AI
API Access
Safety & Alignment Framework
Fine-Tuning & Custom Models
Enterprise Solutions
Other (0 of 8)
Image Generation
Code Generation
Multimodal AI
Research & Publications
Security & Red Teaming
Synthetic Media Provenance
Threat Intelligence Reporting
Global Affairs & Policy
Pricing: Usage-based (OpenAI API token charges)
gpt-3.5-turbo
- 4,096 token limit
gpt-3.5-turbo-16k
- 16,384 token limit
gpt-4
- 8,192 token limit
gpt-4-32k
- 32,768 token limit
gpt-3.5-turbo-0613
- 4,096 token limit
gpt-3.5-turbo-16k-0613
- 16,384 token limit
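Each model's token limit covers the prompt and the reply together, so a scraper has to check whether a page fits before sending it. The following is a minimal sketch of choosing a model by context window. It assumes a rough 4-characters-per-token estimate in place of a real tokenizer and uses a hypothetical `pick_model` helper that tries models in a caller-supplied order. Scrapeghost's own handling of oversized pages may differ.

```python
# Context windows (prompt + reply, in tokens) from the list above.
CONTEXT_LIMITS = {
    "gpt-3.5-turbo": 4_096,
    "gpt-3.5-turbo-0613": 4_096,
    "gpt-3.5-turbo-16k": 16_384,
    "gpt-3.5-turbo-16k-0613": 16_384,
    "gpt-4": 8_192,
    "gpt-4-32k": 32_768,
}


def estimate_tokens(text: str) -> int:
    # Rough heuristic (~4 characters per token, rounded up); a real tokenizer
    # such as tiktoken gives exact counts.
    return (len(text) + 3) // 4


def pick_model(prompt: str, candidates: list[str], reserve_for_reply: int = 1_000) -> str:
    """Return the first candidate, in the caller's order, whose context window
    holds the estimated prompt plus room for the reply."""
    needed = estimate_tokens(prompt) + reserve_for_reply
    for model in candidates:
        if needed <= CONTEXT_LIMITS[model]:
            return model
    raise ValueError(f"about {needed} tokens exceeds every candidate's limit")


order = ["gpt-3.5-turbo", "gpt-4", "gpt-3.5-turbo-16k", "gpt-4-32k"]
print(pick_model("x" * 8_000, order))   # gpt-3.5-turbo (~3,000 tokens needed)
print(pick_model("x" * 40_000, order))  # gpt-3.5-turbo-16k (~11,000 tokens needed)
```

Reserving room for the reply is the key design choice here. A prompt that exactly fills a 4,096-token window leaves the model no space to answer.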
Rationale
Scrapeghost is an experimental library that uses OpenAI's GPT API for automated web scraping. It depends explicitly on the OpenAI API and uses GPT models to extract structured data from HTML. This matches the 'API Access' feature directly and the 'Conversational AI' feature more loosely: Scrapeghost offers no conversational interface of its own, but its core functionality is built on the same GPT models that power conversational AI. It does not provide enterprise solutions, fine-tuning, or a safety framework itself. Where these are available, they come from the underlying OpenAI API rather than from Scrapeghost.
