Beyond the neighbourhood

An important question when analyzing any kind of data is whether the data is representative of that which it is meant to describe. The validity of the conclusions drawn from the analysis of star size distribution discussed in earlier posts (here, here and here) thus hinges on whether the more than 100,000 stars in the HYG database are representative of stars in general. In this post I show that this does not seem to be the case.

One way to gain insight into this question is to plot radius vs. distance for the stars in the HYG database. I did so using a log-log scale and applying the same color to each star as in the Hertzsprung-Russell diagram.

This plot seems to suggest that larger stars are more common at greater distances, whereas smaller stars are only found in our stellar neighbourhood.

With a plot like this, however, any kind of star should appear at greater density the further one looks to the right, for two reasons:

  • The additional volume of space included per unit of distance grows as the square of the distance. This is because the volume of a sphere is $V = \frac{4\pi}{3} R^3$, so the volume $\delta V$ of a thin spherical shell (thickness $\delta R \ll R$) is approximately $\delta V \approx 4 \pi R^2 \delta R$. The density of stars per unit of distance is therefore also $\sim R^2$.
  • The log scale means that each step on the actual axis corresponds to an exponentially larger step in distance.
The result of these two effects is that the apparent density of stars grows dramatically as one moves to the right in the plot.
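
The combined effect of the two points above can be checked numerically. The following is a minimal sketch, assuming a uniform stellar density (the value of 0.1 stars per cubic parsec is an arbitrary illustrative choice, not a figure taken from the HYG data) and using hypothetical helper names. It computes the expected number of stars in distance bins that have equal width on a logarithmic axis:

```python
import math

def expected_count(density, r_inner, r_outer):
    """Expected number of stars between two radii for a uniform density (stars per pc^3)."""
    return density * 4.0 * math.pi / 3.0 * (r_outer**3 - r_inner**3)

def log_bin_edges(r_min, r_max, n_bins):
    """Bin edges that are equally spaced in log10(distance)."""
    step = (math.log10(r_max) - math.log10(r_min)) / n_bins
    return [10 ** (math.log10(r_min) + i * step) for i in range(n_bins + 1)]

density = 0.1  # assumed illustrative value, stars per cubic parsec
edges = log_bin_edges(1.0, 1000.0, 6)  # six bins, each half a decade wide
counts = [expected_count(density, lo, hi) for lo, hi in zip(edges, edges[1:])]

for lo, hi, c in zip(edges, edges[1:], counts):
    print(f"{lo:8.1f} - {hi:8.1f} pc: {c:14.1f} stars")
```

Under these assumptions each bin holds $10^{1.5} \approx 32$ times as many stars as the bin to its left, i.e. the count per logarithmic interval grows as the cube of the distance.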

Whereas this might explain the apparent increasing density of large stars, the density of smaller stars seems to exhibit the opposite behaviour, suggesting that smaller stars are much more common in our stellar neighbourhood than further away. This does not seem natural; one would expect the same proportion of smaller to larger stars in any region.

To investigate this I plotted the distribution of stars by size for different distance intervals, each interval representing the same volume of space as a sphere with a radius of 20 parsec. This avoids the compression effect as one moves along the axis that appeared in the plot above.

Note that the distance intervals corresponding to each plot grow shorter as the distance increases. This is again because of the shell volume effect discussed above; the volume of a spherical shell with inner and outer radius $r$ and $R$, and thickness $\delta R = R-r \ll R$, is $V=\frac{4\pi}{3}\left(R^3 - r^3\right) \approx 4\pi R^2 \delta R$. Thus, the thickness of shells of constant volume decreases as $\delta R \sim \frac{1}{R^2}$.
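
As an illustration, here is a minimal sketch of how such equal-volume intervals could be constructed. It assumes the shells are nested and start at the origin, so that the $k$-th boundary lies at $R_k = R_0 \, k^{1/3}$ with $R_0 = 20$ parsec; the function names are hypothetical and this is not necessarily how the intervals in the plots were generated.

```python
import math

def equal_volume_edges(r0, n_shells):
    """Radii bounding n_shells nested shells, each with the volume of a sphere of radius r0."""
    return [r0 * k ** (1.0 / 3.0) for k in range(n_shells + 1)]

def shell_volume(r_inner, r_outer):
    """Exact volume of a spherical shell."""
    return 4.0 * math.pi / 3.0 * (r_outer**3 - r_inner**3)

edges = equal_volume_edges(20.0, 5)
for lo, hi in zip(edges, edges[1:]):
    print(f"{lo:6.2f} - {hi:6.2f} pc  thickness {hi - lo:5.2f} pc  "
          f"volume {shell_volume(lo, hi):9.1f} pc^3")
```

Every shell has the same volume, while the thickness shrinks from shell to shell as the distance grows.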

In the plot we see that the nearer the distance interval, the greater the number of smaller stars found in the volume, as seen by the left-hand sides of each plot, which differ quite a bit from each other. Conversely, the number of larger stars seems consistent across the intervals, as evinced by the similarity between the right-hand sides of each plot. This shows that the smaller the distance, the greater is the density of faint stars seen, whereas the density of bright stars remains constant.

The explanation for this is fairly simple: faint stars are difficult to spot at great distances, and the nearer the region under investigation, the easier it will be to find faint stars. This leads to an under-representation of faint stars at greater distances, and thus in the dataset as a whole.

This explanation can be refined by looking at a luminosity vs. distance plot like the one shown here:

This plot is similar to the one shown at the beginning of this post. For distances smaller than about 25 parsec (shown with the dashed white line) we see stars with any luminosity, down to the faintest star at $L \approx 3\cdot 10^{-6} \, L_{\odot}$. For greater distances, however, the minimum luminosity increases with distance. The lower bound of luminosities at a given distance is not clearly defined, but seems to approximately follow a straight line in the plot. In a log-log plot like this, straight lines correspond to power laws.

In keeping with the explanation for the absence of small stars at greater distances given above, we might surmise that it is only possible to see stars appearing brighter than a certain threshold. If that is the case then a power law like the one suggested by the plot is to be expected. The apparent brightness of a star is given by the visual magnitude of the star, and the luminosity corresponding to a given visual magnitude increases with distance squared. To see this, consider the expressions for the absolute (bolometric) magnitude $M$ and the visual magnitude $m$, with the distance $d$ in parsec and $\log$ denoting the base-10 logarithm: \begin{align} M &= M_{\odot} - 2.5\cdot\log\left(\frac{L}{L_{\odot}}\right)\,,\\ m &= M + 5\cdot \log(d) - 5\,. \end{align}

These can be combined to obtain \begin{align} \log\left(\frac{L}{L_{\odot}}\right) &= \frac{M_{\odot} - M}{2.5}\\ &= \frac{M_{\odot} - m + 5\cdot \log(d) - 5}{2.5}\\ &= \frac{M_{\odot} - m - 5}{2.5} + 2\cdot \log(d) \,, \end{align} or \begin{align} \frac{L}{L_{\odot}} &= 10^{\frac{M_{\odot} - m - 5}{2.5}} \cdot d^2\,. \end{align}

This means that a straight line in this diagram with the equation $\frac{L}{L_{\odot}} = a\cdot d^2$ corresponds to a constant visual magnitude of $m = -2.5 \cdot\log(a) + M_{\odot} - 5$, where $M_{\odot} = 4.74$ is the bolometric magnitude of the Sun.
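
The relation between the prefactor $a$ and the visual magnitude $m$ can be sketched in a few lines of code. The function names are hypothetical; the formulas are the ones derived above, with $d$ in parsec. As an example, the snippet uses the value $a = 2.1318\cdot 10^{-5}$ quoted for the fitted line in this post.

```python
import math

M_SUN = 4.74  # bolometric magnitude of the Sun, as used in the text

def limiting_magnitude(a):
    """Visual magnitude m corresponding to the line L/L_sun = a * d**2."""
    return -2.5 * math.log10(a) + M_SUN - 5.0

def min_luminosity(m, d):
    """L/L_sun of a star with visual magnitude m at a distance of d parsec."""
    return 10 ** ((M_SUN - m - 5.0) / 2.5) * d**2

a = 2.1318e-5  # prefactor of the fitted line quoted in this post
print(f"m = {limiting_magnitude(a):.1f}")
print(f"L/L_sun at 100 pc for m = 11.4: {min_luminosity(11.4, 100.0):.3f}")
```

The first line of output reproduces the magnitude of $m = 11.4$. The two functions are inverses of each other at $d = 1$ parsec, which is a quick way to check the algebra.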

Such a function is fitted to a set of points representing the 1st permille (1000-quantile) of the luminosity distribution along a series of distance intervals. The 1st permille is used to account for the fuzziness of the lower limit to luminosities. The best fit, shown in the plot as the dashed red line, is obtained with $a = 2.1318\cdot 10^{-5}$. This number translates to a visual magnitude of $m = 11.4$, which is the visual magnitude of stars along the dashed red line. It is thus an approximate upper bound on the visual magnitude, or equivalently, a lower bound on the apparent brightness of distant stars in the dataset.
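
A fit of this kind could look roughly like the sketch below. It is not the code used for the plot: it runs on a synthetic sample (distances and luminosities drawn log-uniformly, then cut at a magnitude of 11.4), and the helper names, the 20 logarithmic distance bins and the use of the geometric bin centre are all assumptions made for illustration. With the slope fixed at 2 in log-log space, the least-squares estimate of $\log(a)$ is simply the mean of $\log(L) - 2\log(d)$ over the quantile points.

```python
import math
import random

M_SUN = 4.74     # solar bolometric magnitude, as in the text
M_LIMIT = 11.4   # magnitude cutoff imposed on the synthetic sample (assumption)

def low_quantile(values, q):
    """Value below which roughly a fraction q of the sample lies."""
    ordered = sorted(values)
    return ordered[int(q * len(ordered))]

def fit_prefactor(distances, luminosities, edges, q=0.001):
    """Least-squares fit of L = a * d**2 (slope fixed at 2 in log-log space)
    to the q-quantile of the luminosity in each distance bin."""
    offsets = []
    for lo, hi in zip(edges, edges[1:]):
        in_bin = [L for d, L in zip(distances, luminosities) if lo <= d < hi]
        if len(in_bin) < 100:
            continue  # too few stars for a meaningful quantile
        centre = math.sqrt(lo * hi)  # geometric centre of the bin
        offsets.append(math.log10(low_quantile(in_bin, q)) - 2.0 * math.log10(centre))
    return 10 ** (sum(offsets) / len(offsets))

# Synthetic, magnitude-limited sample (NOT the HYG data).
random.seed(0)
distances, luminosities = [], []
for _ in range(100_000):
    d = 10 ** random.uniform(math.log10(25.0), 3.0)   # 25 to 1000 parsec
    L = 10 ** random.uniform(-6.0, 4.0)               # in units of L_sun
    m = M_SUN - 2.5 * math.log10(L) + 5.0 * math.log10(d) - 5.0
    if m <= M_LIMIT:
        distances.append(d)
        luminosities.append(L)

edges = [25.0 * (1000.0 / 25.0) ** (i / 20) for i in range(21)]
a_fit = fit_prefactor(distances, luminosities, edges)
m_fit = -2.5 * math.log10(a_fit) + M_SUN - 5.0
print(f"a = {a_fit:.3e}, m = {m_fit:.2f}")
```

Because the lowest luminosities in a bin come from its near edge while the fit uses the bin centre, the recovered magnitude is expected to come out slightly fainter than the imposed cutoff; narrower bins reduce this bias at the cost of noisier quantiles.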

This means that for distances greater than about 25 parsecs, the dataset contains (almost) only stars with a brightness above this lower bound. This can be explained partially by the difficulty of observing such faint objects, and partially as a cutoff imposed to limit the number of entries in the database. Again, the number of objects at a given distance is expected to grow as the distance squared, and so, if it were even possible, including every object at larger distances would make the dataset rather immense.

At any rate, the lack of faint stars at greater distances is hardly a natural feature, as there would be no reason to expect the existence of faint stars to be limited to our own stellar neighbourhood. Instead, it means that bright stars are over-represented in the database. One should keep this in mind when interpreting the distribution of star sizes obtained earlier: faint (and thus small) stars are more common in the Universe than my analysis indicated.

We can of course correct this by limiting ourselves to stars within a distance of e.g. 20 parsec. The distribution of these is shown here:

It is clear from this that the Sun is actually larger than most stars in the 20-parsec neighbourhood. This was not revealed in the earlier analysis, because brighter (and thus larger) stars are over-represented in the dataset as a whole. It seems reasonable to suppose that a similar conclusion would be reached for broader regions of space, if representative data on stars in these regions were available.

The total number of stars within 20 parsec is about 1500, compared to about 108,000 stars (with known distances) in the whole dataset. The cost of obtaining a presumably more representative sample is thus a much smaller sample size.
