Beginning of API Access Changes
With Reddit's announcement that it will restrict access to its API, we may be seeing the beginning of a broader trend of sites restricting API access. By restricting access, companies like Reddit can control which models are trained on their massive amounts of textual, visual, and contextual data. Restricting APIs may also help large community sites like Reddit monetize access, especially when those APIs deliver vast volumes of data on which to train new generative models.
How might this impact your business
Will Reddit's stance be embraced by other large community sites? Time will tell. Either way, preparing for these changes now, whether you create APIs or consume them, will leave you better positioned in a rapidly changing technology environment.
New Monetization Opportunities
With the growing interest in (and prevalence of) generative AI systems, there will continue to be opportunities for sites which have
- a large amount of training data (text, images, conversations, etc.), OR
- a smaller but highly specific amount of training data (highly specialized domains such as Finance, Capital Markets, Healthcare, etc.)
to monetize access to their potential training data.
Differentiators in Access and Delivery
When evaluating these monetization opportunities, and the new use case of giving others access for model training, consider where your platform or site can differentiate itself.
- Initial Discovery: Build into the API a mechanism for prospective buyers to quickly discover features of the data available via your API. For instance, which data is provided (text, video, still images, etc.) and useful statistics (token count, disk/stream size, real-time vs. static vs. batch, etc.) for each. We recommend that this be built programmatically into the product to:
  - Reduce Sales Friction: Prospects and buyers can quickly determine whether your dataset is useful for their model training.
  - Decrease Stale Data: With many APIs, Marketing or Sales makes ad hoc requests for these descriptive statistics and pastes them into a presentation. The numbers can easily go stale, and waiting on them can delay getting deal-closing information to the team. When the statistics are built into the API, internal and external users can quickly refresh their marketing, sales, or internal assets so that they present the most accurate and effective picture of your API's value.
- Privacy Protections
  - Data De-identification: Build your API so that it only provides de-identified (anonymized if possible) data. This could reduce both legal risk and the risk of negative user perception. It will also better prepare your company for any future regulation of AI training data.
  - User Options: Give your users, clients, or participants the ability to explicitly opt out of having their data included in any API access. The risk is that too many users opt out, reducing the usefulness of the data for training. The upside is that you may grow market share by offering users privacy options. In addition, regulation may eventually mandate such controls as LLMs become even more pervasive, and your company will have been a leader by implementing them early in the cycle.
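To make the discovery and privacy ideas above more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a reference design: the `Record` shape, the function names, the salt handling, and the email-only redaction are all hypothetical, and a salted hash is pseudonymization rather than full anonymization. The sketch skips users who opted out, strips direct identifiers from what remains, and then computes the kind of per-media-type statistics a discovery endpoint might return.

```python
import hashlib
import json
import re
from dataclasses import dataclass

# Hypothetical, deliberately simple email pattern; real de-identification
# would need to cover far more than email addresses.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


@dataclass
class Record:
    """Assumed shape of one piece of user-generated content."""
    user_id: str
    media_type: str  # e.g. "text" or "image"
    body: str
    opted_out: bool = False


def pseudonymize(user_id: str, salt: str) -> str:
    """Replace a user ID with a truncated salted SHA-256 hash."""
    digest = hashlib.sha256((salt + user_id).encode("utf-8")).hexdigest()
    return digest[:16]


def export_records(records, salt):
    """Skip opted-out users, then strip direct identifiers from the rest."""
    exported = []
    for record in records:
        if record.opted_out:
            continue  # honor the user's explicit opt-out
        exported.append({
            "author": pseudonymize(record.user_id, salt),
            "media_type": record.media_type,
            "body": EMAIL_RE.sub("[email]", record.body),
        })
    return exported


def describe(exported):
    """Per-media-type statistics that a discovery endpoint could serve."""
    stats = {}
    for row in exported:
        entry = stats.setdefault(
            row["media_type"], {"records": 0, "approx_tokens": 0, "bytes": 0}
        )
        entry["records"] += 1
        # Whitespace splitting is only a rough proxy for a model's token count.
        entry["approx_tokens"] += len(row["body"].split())
        entry["bytes"] += len(row["body"].encode("utf-8"))
    return stats


if __name__ == "__main__":
    sample = [
        Record("alice", "text", "Contact me at alice@example.com for details"),
        Record("bob", "text", "Rates rose again today", opted_out=True),
    ]
    print(json.dumps(describe(export_records(sample, salt="demo-salt")), indent=2))
```

Because the statistics are computed from the same filtered, de-identified records the API would serve, the numbers a prospect sees stay in step with the data they would actually receive. In a real product the salt would be a managed secret, and the token count would come from whichever tokenizer your buyers care about.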
Need help designing your API? Assessing your data's value in the new AI paradigm? Contact us today and we will work with you to align (or extend) your existing capabilities into this rapidly changing environment of generative AI.