As I've been exploring the brave new world of AI developments, I keep coming across different tools and platforms for each type of AI. I'm grouping them by category: text-to-video models, text-based LLMs, image-to-video models, etc.
I started keeping a list in my Notes app, but everything changes so damn fast in this space that I couldn't keep up!
So instead, I'll keep updating this page with what, as far as I know, "the community" considers the latest and greatest.
First up are cloud providers where you can run various models without needing to set them up locally. I find these tools useful when I just want to test something out quickly, but they can be limiting if the model you are looking for is not available.
Anytime I want to quickly test out a new model I read about, I usually head to Replicate.
My favourite features of this site are:
they offer a UI for running lots of models through a simple form for prompts and settings
they have an API available, and plenty of existing SDKs if I want to run tests in bulk from my local machine (still without having to install models locally)
pricing is very low: I can run any experiments I like and the bill never amounts to more than a dollar, usually much less
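To give a feel for the "tests in bulk" idea, here is a minimal sketch using Replicate's Python client. It assumes the `replicate` package is installed and a `REPLICATE_API_TOKEN` environment variable is set. The `run_bulk` helper and the `run_fn` parameter are my own invention for illustration, and the `"prompt"` input key is an assumption: each model defines its own input schema, so check the model's page first.

```python
from typing import Any, Callable, Dict, Iterable, List, Optional


def run_bulk(
    model: str,
    prompts: Iterable[str],
    run_fn: Optional[Callable[..., Any]] = None,
    extra_input: Optional[Dict[str, Any]] = None,
) -> List[Dict[str, Any]]:
    """Run one hosted model over many prompts and collect the results.

    `run_fn` is a hypothetical hook: it defaults to `replicate.run`, but a
    stub can be passed in to try the loop without spending anything.
    """
    if run_fn is None:
        import replicate  # pip install replicate; reads REPLICATE_API_TOKEN

        run_fn = replicate.run

    results = []
    for prompt in prompts:
        # Assumption: the model takes its text under the "prompt" key.
        model_input = {**(extra_input or {}), "prompt": prompt}
        try:
            output = run_fn(model, input=model_input)
            results.append({"prompt": prompt, "output": output, "error": None})
        except Exception as exc:
            # Record the failure and carry on with the remaining prompts.
            results.append({"prompt": prompt, "output": None, "error": str(exc)})
    return results


# Example call ("owner/model-name" is a placeholder, not a real model):
# results = run_bulk("owner/model-name", ["a red fox", "a blue whale"])
```

Failures are recorded per prompt rather than raised, so one bad input doesn't throw away the rest of a batch.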
There are also some features I haven't used yet, but which I can see being very useful if I wanted to run models at any sort of scale.
Namely:
The ability to run models on various hardware, so I can control the cost vs. speed tradeoff.
The ability to run private models which you upload yourself.
Sometimes a project requires a realistic-sounding voiceover which you don't want to record yourself.
If cost is no problem, then ElevenLabs is probably your best option.
It excels in having:
the most realistic, human-sounding voices
the widest range of language options
the ability to upload your own voice
The main downside is cost, though only once you exceed the generous free plan. ElevenLabs uses a monthly subscription pricing model, so if your needs don't quite fit within one of its plans, it isn't the most cost-effective option.
I usually use Amazon Polly as a fallback.
The voices aren't as realistic and the range of languages can be limiting, but I love the usage-based pricing: I pay only for what I use.
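The subscription vs. usage-based tradeoff is easy to put into numbers. Below is a rough sketch that assumes the pay-as-you-go service bills per character and that each subscription plan is a flat monthly fee with a character quota. Every price and quota in it is a made-up placeholder, not a real ElevenLabs or Polly figure, so plug in the current numbers from each provider's pricing page.

```python
from typing import List, Optional, Tuple


def usage_cost(chars: int, rate_per_million: float) -> float:
    """Pay-as-you-go cost for `chars` characters at a per-million rate."""
    return chars / 1_000_000 * rate_per_million


def subscription_cost(chars: int, plans: List[Tuple[float, int]]) -> Optional[float]:
    """Fee of the cheapest plan whose quota covers `chars`, or None if none does.

    Each plan is a (monthly_fee, monthly_character_quota) pair.
    """
    fitting = [fee for fee, quota in plans if quota >= chars]
    return min(fitting) if fitting else None


# Placeholder numbers for illustration only (NOT real prices).
PLANS = [(0.0, 10_000), (10.0, 50_000), (30.0, 200_000)]
RATE_PER_MILLION = 20.0

for chars in (5_000, 60_000):
    sub = subscription_cost(chars, PLANS)
    pay = usage_cost(chars, RATE_PER_MILLION)
    print(f"{chars} chars: subscription={sub}, usage-based={pay:.2f}")
```

The pattern to look for is a month that lands just past a plan's quota: you pay for the whole next tier, while the usage-based bill only grows by the extra characters.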
I like Hailuo for both image-to-video and text-to-video, finding their output to be the most realistic. I would use them more if I could get access to the API, but this currently seems to be kept behind an application process 😥
Runway offers a host of models which are suitable for generating longer videos, but I find them best for generating mini animated scenes from a reference image.
Here's an example image I uploaded:
And here's the output animation:
The prompt I used was:
Revealing an image from a complete blank white paper, transitioning into the style of seamless ink brushstrokes gradually. The image forms organically, with each element revealing itself in a continuous, graceful transition.
I've heard good things about Kling, but unfortunately I haven't been able to play with it much - it seems to be frequently unavailable due to high usage 😟