It is good to consistently test your own limits. Here is a running list of limits in machine learning engineering.

Data imbalance

If your dataset is imbalanced, machine learning algorithms can trick you by always predicting the majority class. If you are telling cats from dogs but have only 1 cat image and 999 dog images in your dataset, the model can achieve 99.9% accuracy by always predicting a dog.

There’s nothing clever you can do with the model itself, except giving your minority datapoints more weight in the loss, generating synthetic data for the minority class (e.g. with SMOTE), collecting more data, or sabotaging your data-collecting efforts by downsampling your majority class.
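Below is a minimal sketch of the re-weighting option with scikit-learn; the dataset, the roughly 1% minority fraction, and the logistic-regression model are all illustrative assumptions rather than a recipe. Plain accuracy would look great for both models, which is why the sketch reports balanced accuracy instead.

```python
# A minimal sketch of re-weighting the minority class with scikit-learn.
# Dataset, imbalance ratio, and model choice are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split

# A heavily imbalanced binary dataset (~1% minority class).
X, y = make_classification(
    n_samples=10_000, n_features=20, weights=[0.99, 0.01],
    flip_y=0, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Unweighted model: tends to predict the majority class almost everywhere.
plain = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Weighted model: samples are weighted inversely to their class frequency,
# so each class contributes roughly equally to the loss.
weighted = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X_train, y_train)

for name, model in [("plain", plain), ("weighted", weighted)]:
    print(name, balanced_accuracy_score(y_test, model.predict(X_test)))
```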

When data imbalance occurs, remember this: there is always a real-world reason you ended up collecting imbalanced data. We live in a world that contains both the common and the rare. Build a baseline model by downsampling the majority class in the training data, then start researching other tricks.
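Here is a small downsampling helper for such a baseline; the array names X and y, and the convention that label 1 marks the minority class, are assumptions for illustration.

```python
# A minimal sketch of building a downsampled training set, assuming NumPy
# arrays X, y where y == 1 marks the minority class (names are illustrative).
import numpy as np

def downsample_majority(X, y, ratio=1.0, seed=0):
    """Keep every minority sample and about `ratio` times as many majority samples."""
    rng = np.random.default_rng(seed)
    minority_idx = np.flatnonzero(y == 1)
    majority_idx = np.flatnonzero(y == 0)
    n_keep = min(len(majority_idx), int(ratio * len(minority_idx)))
    kept_majority = rng.choice(majority_idx, size=n_keep, replace=False)
    keep = np.concatenate([minority_idx, kept_majority])
    rng.shuffle(keep)  # avoid a block of one class followed by the other
    return X[keep], y[keep]
```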

Develop a better curve-fitting metric than MAE/MSE

The only occasion where you can find a better metric is dynamic time warping (DTW) for time-series data. Even that is not really a better “metric”: DTW is in essence a numerical trick that lets MAE/MSE tolerate small shifts along the time axis.
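For concreteness, here is a bare-bones DTW distance between two 1-D series, using the absolute difference as the local cost. It is a sketch of the idea, not an optimized implementation; in practice a library such as tslearn does this faster.

```python
# A minimal DTW sketch: dynamic programming over alignments of two series,
# with |a_i - b_j| as the local cost (an MAE-style cost).
import numpy as np

def dtw_distance(a, b):
    """Accumulated cost of the best alignment between series a and b."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i, j] = local + min(cost[i - 1, j],      # skip a point in a
                                     cost[i, j - 1],      # skip a point in b
                                     cost[i - 1, j - 1])  # match the two points
    return cost[n, m]

# Two series that differ mostly by a small shift along the time axis:
a = np.sin(np.linspace(0.0, 6.0, 100))
b = np.sin(np.linspace(0.3, 6.3, 100))
print("MAE:", np.mean(np.abs(a - b)))       # penalizes the shift
print("DTW:", dtw_distance(a, b) / len(a))  # largely forgives it
```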

Generative networks, object detection systems, and recommendation systems compose several metrics into one objective. For instance, an object detection system is scored both on whether the object's coordinates are correct and on whether the predicted size of the object is close enough. Combined metrics make these systems finicky to train: ML algorithms are hill-climbing algorithms, looking for that flat point on top of the hill, and if you stack two hills together, there can be more “flat” points before you reach the top.
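As a tiny illustration of such a composite objective, here is a PyTorch sketch of a detection-style loss with one term for box centers and one for box sizes; the [cx, cy, w, h] box format, the smooth-L1 choice, and the weighting are assumptions for illustration, not any particular detector's loss.

```python
# A sketch of a composite detection-style loss: one term for the box center
# coordinates, one for the box size. Shapes, box format, and weights are
# illustrative assumptions.
import torch
import torch.nn.functional as F

def composite_box_loss(pred, target, size_weight=0.5):
    """pred, target: (N, 4) tensors of [cx, cy, w, h] boxes."""
    center_loss = F.smooth_l1_loss(pred[:, :2], target[:, :2])  # coordinate term
    size_loss = F.smooth_l1_loss(pred[:, 2:], target[:, 2:])    # size term
    # The optimizer now climbs the sum of two "hills"; their trade-off is set
    # by size_weight and usually has to be tuned by hand.
    return center_loss + size_weight * size_loss

pred = torch.rand(8, 4, requires_grad=True)
target = torch.rand(8, 4)
loss = composite_box_loss(pred, target)
loss.backward()  # gradients flow through both terms at once
print(float(loss))
```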

Find out when a numerical algorithm has converged

This is a rare problem, and it only has an approximate fix: you can detect possible convergence, but you cannot distinguish true convergence from quasi-convergence.
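A common compromise is a check like the one below, which flags possible convergence once the loss stops moving; the tolerance and patience values are illustrative, and a plateau or saddle point (quasi-convergence) satisfies it just as well as a true minimum.

```python
# A minimal convergence check on a loss history. It detects "the loss has
# stopped changing", which is necessary but not sufficient for true convergence.
def has_possibly_converged(loss_history, tol=1e-6, patience=5):
    """True if the relative change in loss stayed below tol for `patience` steps."""
    if len(loss_history) < patience + 1:
        return False
    recent = loss_history[-(patience + 1):]
    for prev, curr in zip(recent, recent[1:]):
        if abs(curr - prev) > tol * max(abs(prev), 1e-12):
            return False
    return True

# A loss that flattens out triggers the check, whether it is at a minimum
# or merely stuck on a plateau.
losses = [1.0, 0.5, 0.25, 0.125] + [0.1250001] * 6
print(has_possibly_converged(losses))  # True
```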